
Index
Preface
What Is Data Science?
Who Is This Book For?
Why Python?
Python 2 Versus Python 3
Outline of This Book
Using Code Examples
Installation Considerations
Conventions Used in This Book
O’Reilly Safari
How to Contact Us
1. IPython: Beyond Normal Python
Shell or Notebook?
Launching the IPython Shell
Launching the Jupyter Notebook
Help and Documentation in IPython
Accessing Documentation with ?
Accessing Source Code with ??
Exploring Modules with Tab Completion
Tab completion of object contents
Tab completion when importing
Beyond tab completion: Wildcard matching
Keyboard Shortcuts in the IPython Shell
Navigation Shortcuts
Text Entry Shortcuts
Command History Shortcuts
Miscellaneous Shortcuts
IPython Magic Commands
Pasting Code Blocks: %paste and %cpaste
Running External Code: %run
Timing Code Execution: %timeit
Help on Magic Functions: ?, %magic, and %lsmagic
Input and Output History
IPython’s In and Out Objects
Underscore Shortcuts and Previous Outputs
Suppressing Output
Related Magic Commands
IPython and Shell Commands
Quick Introduction to the Shell
Shell Commands in IPython
Passing Values to and from the Shell
Shell-Related Magic Commands
Errors and Debugging
Controlling Exceptions: %xmode
Debugging: When Reading Tracebacks Is Not Enough
Partial list of debugging commands
Profiling and Timing Code
Timing Code Snippets: %timeit and %time
Profiling Full Scripts: %prun
Line-by-Line Profiling with %lprun
Profiling Memory Use: %memit and %mprun
More IPython Resources
Web Resources
Books
2. Introduction to NumPy
Understanding Data Types in Python
A Python Integer Is More Than Just an Integer
A Python List Is More Than Just a List
Fixed-Type Arrays in Python
Creating Arrays from Python Lists
Creating Arrays from Scratch
NumPy Standard Data Types
The Basics of NumPy Arrays
NumPy Array Attributes
Array Indexing: Accessing Single Elements
Array Slicing: Accessing Subarrays
One-dimensional subarrays
Multidimensional subarrays
Accessing array rows and columns
Subarrays as no-copy views
Creating copies of arrays
Reshaping of Arrays
Array Concatenation and Splitting
Concatenation of arrays
Splitting of arrays
Computation on NumPy Arrays: Universal Functions
The Slowness of Loops
Introducing UFuncs
Exploring NumPy’s UFuncs
Array arithmetic
Absolute value
Trigonometric functions
Exponents and logarithms
Specialized ufuncs
Advanced Ufunc Features
Specifying output
Aggregates
Outer products
Ufuncs: Learning More
Aggregations: Min, Max, and Everything in Between
Summing the Values in an Array
Minimum and Maximum
Multidimensional aggregates
Other aggregation functions
Example: What Is the Average Height of US Presidents?
Computation on Arrays: Broadcasting
Introducing Broadcasting
Rules of Broadcasting
Broadcasting example 1
Broadcasting example 2
Broadcasting example 3
Broadcasting in Practice
Centering an array
Plotting a two-dimensional function
Comparisons, Masks, and Boolean Logic
Example: Counting Rainy Days
Digging into the data
Comparison Operators as ufuncs
Working with Boolean Arrays
Counting entries
Boolean operators
Boolean Arrays as Masks
Fancy Indexing
Exploring Fancy Indexing
Combined Indexing
Example: Selecting Random Points
Modifying Values with Fancy Indexing
Example: Binning Data
Sorting Arrays
Fast Sorting in NumPy: np.sort and np.argsort
Sorting along rows or columns
Partial Sorts: Partitioning
Example: k-Nearest Neighbors
Structured Data: NumPy’s Structured Arrays
Creating Structured Arrays
More Advanced Compound Types
RecordArrays: Structured Arrays with a Twist
On to Pandas
3. Data Manipulation with Pandas
Installing and Using Pandas
Introducing Pandas Objects
The Pandas Series Object
Series as generalized NumPy array
Series as specialized dictionary
Constructing Series objects
The Pandas DataFrame Object
DataFrame as a generalized NumPy array
DataFrame as specialized dictionary
Constructing DataFrame objects
From a single Series object
From a list of dicts
From a dictionary of Series objects
From a two-dimensional NumPy array
From a NumPy structured array
The Pandas Index Object
Index as immutable array
Index as ordered set
Data Indexing and Selection
Data Selection in Series
Series as dictionary
Series as one-dimensional array
Indexers: loc, iloc, and ix
Data Selection in DataFrame
DataFrame as a dictionary
DataFrame as two-dimensional array
Additional indexing conventions
Operating on Data in Pandas
Ufuncs: Index Preservation
UFuncs: Index Alignment
Index alignment in Series
Index alignment in DataFrame
Ufuncs: Operations Between DataFrame and Series
Handling Missing Data
Trade-Offs in Missing Data Conventions
Missing Data in Pandas
None: Pythonic missing data
NaN: Missing numerical data
NaN and None in Pandas
Operating on Null Values
Detecting null values
Dropping null values
Filling null values
Hierarchical Indexing
A Multiply Indexed Series
The bad way
The better way: Pandas MultiIndex
MultiIndex as extra dimension
Methods of MultiIndex Creation
Explicit MultiIndex constructors
MultiIndex level names
MultiIndex for columns
Indexing and Slicing a MultiIndex
Multiply indexed Series
Multiply indexed DataFrames
Rearranging Multi-Indices
Sorted and unsorted indices
Stacking and unstacking indices
Index setting and resetting
Data Aggregations on Multi-Indices
Combining Datasets: Concat and Append
Recall: Concatenation of NumPy Arrays
Simple Concatenation with pd.concat
Duplicate indices
Catching the repeats as an error
Ignoring the index
Adding MultiIndex keys
Concatenation with joins
The append() method
Combining Datasets: Merge and Join
Relational Algebra
Categories of Joins
One-to-one joins
Many-to-one joins
Many-to-many joins
Specification of the Merge Key
The on keyword
The left_on and right_on keywords
The left_index and right_index keywords
Specifying Set Arithmetic for Joins
Overlapping Column Names: The suffixes Keyword
Example: US States Data
Aggregation and Grouping
Planets Data
Simple Aggregation in Pandas
GroupBy: Split, Apply, Combine
Split, apply, combine
The GroupBy object
Column indexing
Iteration over groups
Dispatch methods
Aggregate, filter, transform, apply
Aggregation
Filtering
Transformation
The apply() method
Specifying the split key
A list, array, series, or index providing the grouping keys
A dictionary or series mapping index to group
Any Python function
A list of valid keys
Grouping example
Pivot Tables
Motivating Pivot Tables
Pivot Tables by Hand
Pivot Table Syntax
Multilevel pivot tables
Additional pivot table options
Example: Birthrate Data
Further data exploration
Vectorized String Operations
Introducing Pandas String Operations
Tables of Pandas String Methods
Methods similar to Python string methods
Methods using regular expressions
Miscellaneous methods
Vectorized item access and slicing
Indicator variables
Example: Recipe Database
A simple recipe recommender
Going further with recipes
Working with Time Series
Dates and Times in Python
Native Python dates and times: datetime and dateutil
Typed arrays of times: NumPy’s datetime64
Dates and times in Pandas: Best of both worlds
Pandas Time Series: Indexing by Time
Pandas Time Series Data Structures
Regular sequences: pd.date_range()
Frequencies and Offsets
Resampling, Shifting, and Windowing
Resampling and converting frequencies
Time-shifts
Rolling windows
Where to Learn More
Example: Visualizing Seattle Bicycle Counts
Visualizing the data
Digging into the data
High-Performance Pandas: eval() and query()
Motivating query() and eval(): Compound Expressions
pandas.eval() for Efficient Operations
Operations supported by pd.eval()
Arithmetic operators
Comparison operators
Bitwise operators
Object attributes and indices
Other operations
DataFrame.eval() for Column-Wise Operations
Assignment in DataFrame.eval()
Local variables in DataFrame.eval()
DataFrame.query() Method
Performance: When to Use These Functions
Further Resources
4. Visualization with Matplotlib
General Matplotlib Tips
Importing matplotlib
Setting Styles
show() or No show()? How to Display Your Plots
Plotting from a script
Plotting from an IPython shell
Plotting from an IPython notebook
Saving Figures to File
Two Interfaces for the Price of One
MATLAB-style interface
Object-oriented interface
Simple Line Plots
Adjusting the Plot: Line Colors and Styles
Adjusting the Plot: Axes Limits
Labeling Plots
Simple Scatter Plots
Scatter Plots with plt.plot
Scatter Plots with plt.scatter
plot Versus scatter: A Note on Efficiency
Visualizing Errors
Basic Errorbars
Continuous Errors
Density and Contour Plots
Visualizing a Three-Dimensional Function
Histograms, Binnings, and Density
Two-Dimensional Histograms and Binnings
plt.hist2d: Two-dimensional histogram
plt.hexbin: Hexagonal binnings
Kernel density estimation
Customizing Plot Legends
Choosing Elements for the Legend
Legend for Size of Points
Multiple Legends
Customizing Colorbars
Customizing Colorbars
Choosing the colormap
Color limits and extensions
Discrete colorbars
Example: Handwritten Digits
Multiple Subplots
plt.axes: Subplots by Hand
plt.subplot: Simple Grids of Subplots
plt.subplots: The Whole Grid in One Go
plt.GridSpec: More Complicated Arrangements
Text and Annotation
Example: Effect of Holidays on US Births
Transforms and Text Position
Arrows and Annotation
Customizing Ticks
Major and Minor Ticks
Hiding Ticks or Labels
Reducing or Increasing the Number of Ticks
Fancy Tick Formats
Summary of Formatters and Locators
Customizing Matplotlib: Configurations and Stylesheets
Plot Customization by Hand
Changing the Defaults: rcParams
Stylesheets
Default style
FiveThirtyEight style
ggplot
Bayesian Methods for Hackers style
Dark background
Grayscale
Seaborn style
Three-Dimensional Plotting in Matplotlib
Three-Dimensional Points and Lines
Three-Dimensional Contour Plots
Wireframes and Surface Plots
Surface Triangulations
Example: Visualizing a Möbius strip
Geographic Data with Basemap
Map Projections
Cylindrical projections
Pseudo-cylindrical projections
Perspective projections
Conic projections
Other projections
Drawing a Map Background
Plotting Data on Maps
Example: California Cities
Example: Surface Temperature Data
Visualization with Seaborn
Seaborn Versus Matplotlib
Exploring Seaborn Plots
Histograms, KDE, and densities
Pair plots
Faceted histograms
Factor plots
Joint distributions
Bar plots
Example: Exploring Marathon Finishing Times
Further Resources
Matplotlib Resources
Other Python Graphics Libraries
5. Machine Learning
What Is Machine Learning?
Categories of Machine Learning
Qualitative Examples of Machine Learning Applications
Classification: Predicting discrete labels
Regression: Predicting continuous labels
Clustering: Inferring labels on unlabeled data
Dimensionality reduction: Inferring structure of unlabeled data
Summary
Introducing Scikit-Learn
Data Representation in Scikit-Learn
Data as table
Features matrix
Target array
Scikit-Learn’s Estimator API
Basics of the API
Supervised learning example: Simple linear regression
Supervised learning example: Iris classification
Unsupervised learning example: Iris dimensionality
Unsupervised learning: Iris clustering
Application: Exploring Handwritten Digits
Loading and visualizing the digits data
Unsupervised learning: Dimensionality reduction
Classification on digits
Summary
Hyperparameters and Model Validation
Thinking About Model Validation
Model validation the wrong way
Model validation the right way: Holdout sets
Model validation via cross-validation
Selecting the Best Model
The bias–variance trade-off
Validation curves in Scikit-Learn
Learning Curves
Learning curves in Scikit-Learn
Validation in Practice: Grid Search
Summary
Feature Engineering
Categorical Features
Text Features
Image Features
Derived Features
Imputation of Missing Data
Feature Pipelines
In Depth: Naive Bayes Classification
Bayesian Classification
Gaussian Naive Bayes
Multinomial Naive Bayes
Example: Classifying text
When to Use Naive Bayes
In Depth: Linear Regression
Simple Linear Regression
Basis Function Regression
Polynomial basis functions
Gaussian basis functions
Regularization
Ridge regression (L2 regularization)
Lasso regularization (L1)
Example: Predicting Bicycle Traffic
In-Depth: Support Vector Machines
Motivating Support Vector Machines
Support Vector Machines: Maximizing the Margin
Fitting a support vector machine
Beyond linear boundaries: Kernel SVM
Tuning the SVM: Softening margins
Example: Face Recognition
Support Vector Machine Summary
In-Depth: Decision Trees and Random Forests
Motivating Random Forests: Decision Trees
Creating a decision tree
Decision trees and overfitting
Ensembles of Estimators: Random Forests
Random Forest Regression
Example: Random Forest for Classifying Digits
Summary of Random Forests
In Depth: Principal Component Analysis
Introducing Principal Component Analysis
PCA as dimensionality reduction
PCA for visualization: Handwritten digits
What do the components mean?
Choosing the number of components
PCA as Noise Filtering
Example: Eigenfaces
Principal Component Analysis Summary
In-Depth: Manifold Learning
Manifold Learning: “HELLO”
Multidimensional Scaling (MDS)
MDS as Manifold Learning
Nonlinear Embeddings: Where MDS Fails
Nonlinear Manifolds: Locally Linear Embedding
Some Thoughts on Manifold Methods
Example: Isomap on Faces
Example: Visualizing Structure in Digits
In Depth: k-Means Clustering
Introducing k-Means
k-Means Algorithm: Expectation–Maximization
Caveats of expectation–maximization
Examples
Example 1: k-Means on digits
Example 2: k-means for color compression
In Depth: Gaussian Mixture Models
Motivating GMM: Weaknesses of k-Means
Generalizing E–M: Gaussian Mixture Models
Choosing the covariance type
GMM as Density Estimation
How many components?
Example: GMM for Generating New Data
In-Depth: Kernel Density Estimation
Motivating KDE: Histograms
Kernel Density Estimation in Practice
Selecting the bandwidth via cross-validation
Example: KDE on a Sphere
Example: Not-So-Naive Bayes
The anatomy of a custom estimator
Using our custom estimator
Application: A Face Detection Pipeline
HOG Features
HOG in Action: A Simple Face Detector
Caveats and Improvements
Further Machine Learning Resources
Machine Learning in Python
General Machine Learning
Index