Index
Preface
What Is Data Science?
Who Is This Book For?
Why Python?
Python 2 Versus Python 3
Outline of This Book
Using Code Examples
Installation Considerations
Conventions Used in This Book
O’Reilly Safari
How to Contact Us
1. IPython: Beyond Normal Python
Shell or Notebook?
Launching the IPython Shell
Launching the Jupyter Notebook
Help and Documentation in IPython
Accessing Documentation with ?
Accessing Source Code with ??
Exploring Modules with Tab Completion
Tab completion of object contents
Tab completion when importing
Beyond tab completion: Wildcard matching
Keyboard Shortcuts in the IPython Shell
Navigation Shortcuts
Text Entry Shortcuts
Command History Shortcuts
Miscellaneous Shortcuts
IPython Magic Commands
Pasting Code Blocks: %paste and %cpaste
Running External Code: %run
Timing Code Execution: %timeit
Help on Magic Functions: ?, %magic, and %lsmagic
Input and Output History
IPython’s In and Out Objects
Underscore Shortcuts and Previous Outputs
Suppressing Output
Related Magic Commands
IPython and Shell Commands
Quick Introduction to the Shell
Shell Commands in IPython
Passing Values to and from the Shell
Shell-Related Magic Commands
Errors and Debugging
Controlling Exceptions: %xmode
Debugging: When Reading Tracebacks Is Not Enough
Partial list of debugging commands
Profiling and Timing Code
Timing Code Snippets: %timeit and %time
Profiling Full Scripts: %prun
Line-by-Line Profiling with %lprun
Profiling Memory Use: %memit and %mprun
More IPython Resources
Web Resources
Books
2. Introduction to NumPy
Understanding Data Types in Python
A Python Integer Is More Than Just an Integer
A Python List Is More Than Just a List
Fixed-Type Arrays in Python
Creating Arrays from Python Lists
Creating Arrays from Scratch
NumPy Standard Data Types
The Basics of NumPy Arrays
NumPy Array Attributes
Array Indexing: Accessing Single Elements
Array Slicing: Accessing Subarrays
One-dimensional subarrays
Multidimensional subarrays
Accessing array rows and columns
Subarrays as no-copy views
Creating copies of arrays
Reshaping of Arrays
Array Concatenation and Splitting
Concatenation of arrays
Splitting of arrays
Computation on NumPy Arrays: Universal Functions
The Slowness of Loops
Introducing UFuncs
Exploring NumPy’s UFuncs
Array arithmetic
Absolute value
Trigonometric functions
Exponents and logarithms
Specialized ufuncs
Advanced Ufunc Features
Specifying output
Aggregates
Outer products
Ufuncs: Learning More
Aggregations: Min, Max, and Everything in Between
Summing the Values in an Array
Minimum and Maximum
Multidimensional aggregates
Other aggregation functions
Example: What Is the Average Height of US Presidents?
Computation on Arrays: Broadcasting
Introducing Broadcasting
Rules of Broadcasting
Broadcasting example 1
Broadcasting example 2
Broadcasting example 3
Broadcasting in Practice
Centering an array
Plotting a two-dimensional function
Comparisons, Masks, and Boolean Logic
Example: Counting Rainy Days
Digging into the data
Comparison Operators as ufuncs
Working with Boolean Arrays
Counting entries
Boolean operators
Boolean Arrays as Masks
Fancy Indexing
Exploring Fancy Indexing
Combined Indexing
Example: Selecting Random Points
Modifying Values with Fancy Indexing
Example: Binning Data
Sorting Arrays
Fast Sorting in NumPy: np.sort and np.argsort
Sorting along rows or columns
Partial Sorts: Partitioning
Example: k-Nearest Neighbors
Structured Data: NumPy’s Structured Arrays
Creating Structured Arrays
More Advanced Compound Types
RecordArrays: Structured Arrays with a Twist
On to Pandas
3. Data Manipulation with Pandas
Installing and Using Pandas
Introducing Pandas Objects
The Pandas Series Object
Series as generalized NumPy array
Series as specialized dictionary
Constructing Series objects
The Pandas DataFrame Object
DataFrame as a generalized NumPy array
DataFrame as specialized dictionary
Constructing DataFrame objects
From a single Series object
From a list of dicts
From a dictionary of Series objects
From a two-dimensional NumPy array
From a NumPy structured array
The Pandas Index Object
Index as immutable array
Index as ordered set
Data Indexing and Selection
Data Selection in Series
Series as dictionary
Series as one-dimensional array
Indexers: loc, iloc, and ix
Data Selection in DataFrame
DataFrame as a dictionary
DataFrame as two-dimensional array
Additional indexing conventions
Operating on Data in Pandas
Ufuncs: Index Preservation
UFuncs: Index Alignment
Index alignment in Series
Index alignment in DataFrame
Ufuncs: Operations Between DataFrame and Series
Handling Missing Data
Trade-Offs in Missing Data Conventions
Missing Data in Pandas
None: Pythonic missing data
NaN: Missing numerical data
NaN and None in Pandas
Operating on Null Values
Detecting null values
Dropping null values
Filling null values
Hierarchical Indexing
A Multiply Indexed Series
The bad way
The better way: Pandas MultiIndex
MultiIndex as extra dimension
Methods of MultiIndex Creation
Explicit MultiIndex constructors
MultiIndex level names
MultiIndex for columns
Indexing and Slicing a MultiIndex
Multiply indexed Series
Multiply indexed DataFrames
Rearranging Multi-Indices
Sorted and unsorted indices
Stacking and unstacking indices
Index setting and resetting
Data Aggregations on Multi-Indices
Combining Datasets: Concat and Append
Recall: Concatenation of NumPy Arrays
Simple Concatenation with pd.concat
Duplicate indices
Catching the repeats as an error
Ignoring the index
Adding MultiIndex keys
Concatenation with joins
The append() method
Combining Datasets: Merge and Join
Relational Algebra
Categories of Joins
One-to-one joins
Many-to-one joins
Many-to-many joins
Specification of the Merge Key
The on keyword
The left_on and right_on keywords
The left_index and right_index keywords
Specifying Set Arithmetic for Joins
Overlapping Column Names: The suffixes Keyword
Example: US States Data
Aggregation and Grouping
Planets Data
Simple Aggregation in Pandas
GroupBy: Split, Apply, Combine
Split, apply, combine
The GroupBy object
Column indexing
Iteration over groups
Dispatch methods
Aggregate, filter, transform, apply
Aggregation
Filtering
Transformation
The apply() method
Specifying the split key
A list, array, series, or index providing the grouping keys
A dictionary or series mapping index to group
Any Python function
A list of valid keys
Grouping example
Pivot Tables
Motivating Pivot Tables
Pivot Tables by Hand
Pivot Table Syntax
Multilevel pivot tables
Additional pivot table options
Example: Birthrate Data
Further data exploration
Vectorized String Operations
Introducing Pandas String Operations
Tables of Pandas String Methods
Methods similar to Python string methods
Methods using regular expressions
Miscellaneous methods
Vectorized item access and slicing
Indicator variables
Example: Recipe Database
A simple recipe recommender
Going further with recipes
Working with Time Series
Dates and Times in Python
Native Python dates and times: datetime and dateutil
Typed arrays of times: NumPy’s datetime64
Dates and times in Pandas: Best of both worlds
Pandas Time Series: Indexing by Time
Pandas Time Series Data Structures
Regular sequences: pd.date_range()
Frequencies and Offsets
Resampling, Shifting, and Windowing
Resampling and converting frequencies
Time-shifts
Rolling windows
Where to Learn More
Example: Visualizing Seattle Bicycle Counts
Visualizing the data
Digging into the data
High-Performance Pandas: eval() and query()
Motivating query() and eval(): Compound Expressions
pandas.eval() for Efficient Operations
Operations supported by pd.eval()
Arithmetic operators
Comparison operators
Bitwise operators
Object attributes and indices
Other operations
DataFrame.eval() for Column-Wise Operations
Assignment in DataFrame.eval()
Local variables in DataFrame.eval()
DataFrame.query() Method
Performance: When to Use These Functions
Further Resources
4. Visualization with Matplotlib
General Matplotlib Tips
Importing matplotlib
Setting Styles
show() or No show()? How to Display Your Plots
Plotting from a script
Plotting from an IPython shell
Plotting from an IPython notebook
Saving Figures to File
Two Interfaces for the Price of One
MATLAB-style interface
Object-oriented interface
Simple Line Plots
Adjusting the Plot: Line Colors and Styles
Adjusting the Plot: Axes Limits
Labeling Plots
Simple Scatter Plots
Scatter Plots with plt.plot
Scatter Plots with plt.scatter
plot Versus scatter: A Note on Efficiency
Visualizing Errors
Basic Errorbars
Continuous Errors
Density and Contour Plots
Visualizing a Three-Dimensional Function
Histograms, Binnings, and Density
Two-Dimensional Histograms and Binnings
plt.hist2d: Two-dimensional histogram
plt.hexbin: Hexagonal binnings
Kernel density estimation
Customizing Plot Legends
Choosing Elements for the Legend
Legend for Size of Points
Multiple Legends
Customizing Colorbars
Customizing Colorbars
Choosing the colormap
Color limits and extensions
Discrete colorbars
Example: Handwritten Digits
Multiple Subplots
plt.axes: Subplots by Hand
plt.subplot: Simple Grids of Subplots
plt.subplots: The Whole Grid in One Go
plt.GridSpec: More Complicated Arrangements
Text and Annotation
Example: Effect of Holidays on US Births
Transforms and Text Position
Arrows and Annotation
Customizing Ticks
Major and Minor Ticks
Hiding Ticks or Labels
Reducing or Increasing the Number of Ticks
Fancy Tick Formats
Summary of Formatters and Locators
Customizing Matplotlib: Configurations and Stylesheets
Plot Customization by Hand
Changing the Defaults: rcParams
Stylesheets
Default style
FiveThirtyEight style
ggplot
Bayesian Methods for Hackers style
Dark background
Grayscale
Seaborn style
Three-Dimensional Plotting in Matplotlib
Three-Dimensional Points and Lines
Three-Dimensional Contour Plots
Wireframes and Surface Plots
Surface Triangulations
Example: Visualizing a Möbius strip
Geographic Data with Basemap
Map Projections
Cylindrical projections
Pseudo-cylindrical projections
Perspective projections
Conic projections
Other projections
Drawing a Map Background
Plotting Data on Maps
Example: California Cities
Example: Surface Temperature Data
Visualization with Seaborn
Seaborn Versus Matplotlib
Exploring Seaborn Plots
Histograms, KDE, and densities
Pair plots
Faceted histograms
Factor plots
Joint distributions
Bar plots
Example: Exploring Marathon Finishing Times
Further Resources
Matplotlib Resources
Other Python Graphics Libraries
5. Machine Learning
What Is Machine Learning?
Categories of Machine Learning
Qualitative Examples of Machine Learning Applications
Classification: Predicting discrete labels
Regression: Predicting continuous labels
Clustering: Inferring labels on unlabeled data
Dimensionality reduction: Inferring structure of unlabeled data
Summary
Introducing Scikit-Learn
Data Representation in Scikit-Learn
Data as table
Features matrix
Target array
Scikit-Learn’s Estimator API
Basics of the API
Supervised learning example: Simple linear regression
Supervised learning example: Iris classification
Unsupervised learning example: Iris dimensionality
Unsupervised learning: Iris clustering
Application: Exploring Handwritten Digits
Loading and visualizing the digits data
Unsupervised learning: Dimensionality reduction
Classification on digits
Summary
Hyperparameters and Model Validation
Thinking About Model Validation
Model validation the wrong way
Model validation the right way: Holdout sets
Model validation via cross-validation
Selecting the Best Model
The bias–variance trade-off
Validation curves in Scikit-Learn
Learning Curves
Learning curves in Scikit-Learn
Validation in Practice: Grid Search
Summary
Feature Engineering
Categorical Features
Text Features
Image Features
Derived Features
Imputation of Missing Data
Feature Pipelines
In Depth: Naive Bayes Classification
Bayesian Classification
Gaussian Naive Bayes
Multinomial Naive Bayes
Example: Classifying text
When to Use Naive Bayes
In Depth: Linear Regression
Simple Linear Regression
Basis Function Regression
Polynomial basis functions
Gaussian basis functions
Regularization
Ridge regression (L2 regularization)
Lasso regularization (L1)
Example: Predicting Bicycle Traffic
In-Depth: Support Vector Machines
Motivating Support Vector Machines
Support Vector Machines: Maximizing the Margin
Fitting a support vector machine
Beyond linear boundaries: Kernel SVM
Tuning the SVM: Softening margins
Example: Face Recognition
Support Vector Machine Summary
In-Depth: Decision Trees and Random Forests
Motivating Random Forests: Decision Trees
Creating a decision tree
Decision trees and overfitting
Ensembles of Estimators: Random Forests
Random Forest Regression
Example: Random Forest for Classifying Digits
Summary of Random Forests
In Depth: Principal Component Analysis
Introducing Principal Component Analysis
PCA as dimensionality reduction
PCA for visualization: Handwritten digits
What do the components mean?
Choosing the number of components
PCA as Noise Filtering
Example: Eigenfaces
Principal Component Analysis Summary
In-Depth: Manifold Learning
Manifold Learning: “HELLO”
Multidimensional Scaling (MDS)
MDS as Manifold Learning
Nonlinear Embeddings: Where MDS Fails
Nonlinear Manifolds: Locally Linear Embedding
Some Thoughts on Manifold Methods
Example: Isomap on Faces
Example: Visualizing Structure in Digits
In Depth: k-Means Clustering
Introducing k-Means
k-Means Algorithm: Expectation–Maximization
Caveats of expectation–maximization
Examples
Example 1: k-Means on digits
Example 2: k-means for color compression
In Depth: Gaussian Mixture Models
Motivating GMM: Weaknesses of k-Means
Generalizing E–M: Gaussian Mixture Models
Choosing the covariance type
GMM as Density Estimation
How many components?
Example: GMM for Generating New Data
In-Depth: Kernel Density Estimation
Motivating KDE: Histograms
Kernel Density Estimation in Practice
Selecting the bandwidth via cross-validation
Example: KDE on a Sphere
Example: Not-So-Naive Bayes
The anatomy of a custom estimator
Using our custom estimator
Application: A Face Detection Pipeline
HOG Features
HOG in Action: A Simple Face Detector
Caveats and Improvements
Further Machine Learning Resources
Machine Learning in Python
General Machine Learning
Index