Imperial Library

Index
Preface
  Who Should Read This Book
  Why We Wrote This Book
  Navigating This Book
  Online Resources
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Safari
  How to Contact Us
  Acknowledgments
    From Andreas
    From Sarah
1. Introduction
  Why Machine Learning?
    Problems Machine Learning Can Solve
    Knowing Your Task and Knowing Your Data
  Why Python?
  scikit-learn
    Installing scikit-learn
  Essential Libraries and Tools
    Jupyter Notebook
    NumPy
    SciPy
    matplotlib
    pandas
    mglearn
  Python 2 Versus Python 3
  Versions Used in this Book
  A First Application: Classifying Iris Species
    Meet the Data
    Measuring Success: Training and Testing Data
    First Things First: Look at Your Data
    Building Your First Model: k-Nearest Neighbors
    Making Predictions
    Evaluating the Model
  Summary and Outlook
2. Supervised Learning
  Classification and Regression
  Generalization, Overfitting, and Underfitting
    Relation of Model Complexity to Dataset Size
  Supervised Machine Learning Algorithms
    Some Sample Datasets
    k-Nearest Neighbors
      k-Neighbors classification
      Analyzing KNeighborsClassifier
      k-neighbors regression
      Analyzing KNeighborsRegressor
      Strengths, weaknesses, and parameters
    Linear Models
      Linear models for regression
        Linear regression (aka ordinary least squares)
        Ridge regression
        Lasso
      Linear models for classification
      Linear models for multiclass classification
      Strengths, weaknesses, and parameters
    Naive Bayes Classifiers
      Strengths, weaknesses, and parameters
    Decision Trees
      Building decision trees
      Controlling complexity of decision trees
      Analyzing decision trees
      Feature importance in trees
      Strengths, weaknesses, and parameters
    Ensembles of Decision Trees
      Random forests
        Building random forests
        Analyzing random forests
        Strengths, weaknesses, and parameters
      Gradient boosted regression trees (gradient boosting machines)
        Strengths, weaknesses, and parameters
    Kernelized Support Vector Machines
      Linear models and nonlinear features
      The kernel trick
      Understanding SVMs
      Tuning SVM parameters
      Preprocessing data for SVMs
      Strengths, weaknesses, and parameters
    Neural Networks (Deep Learning)
      The neural network model
      Tuning neural networks
      Strengths, weaknesses, and parameters
        Estimating complexity in neural networks
  Uncertainty Estimates from Classifiers
    The Decision Function
    Predicting Probabilities
    Uncertainty in Multiclass Classification
  Summary and Outlook
3. Unsupervised Learning and Preprocessing
  Types of Unsupervised Learning
  Challenges in Unsupervised Learning
  Preprocessing and Scaling
    Different Kinds of Preprocessing
    Applying Data Transformations
    Scaling Training and Test Data the Same Way
    The Effect of Preprocessing on Supervised Learning
  Dimensionality Reduction, Feature Extraction, and Manifold Learning
    Principal Component Analysis (PCA)
      Applying PCA to the cancer dataset for visualization
      Eigenfaces for feature extraction
    Non-Negative Matrix Factorization (NMF)
      Applying NMF to synthetic data
      Applying NMF to face images
    Manifold Learning with t-SNE
  Clustering
    k-Means Clustering
      Failure cases of k-means
      Vector quantization, or seeing k-means as decomposition
    Agglomerative Clustering
      Hierarchical clustering and dendrograms
    DBSCAN
    Comparing and Evaluating Clustering Algorithms
      Evaluating clustering with ground truth
      Evaluating clustering without ground truth
      Comparing algorithms on the faces dataset
        Analyzing the faces dataset with DBSCAN
        Analyzing the faces dataset with k-means
        Analyzing the faces dataset with agglomerative clustering
    Summary of Clustering Methods
  Summary and Outlook
4. Representing Data and Engineering Features
  Categorical Variables
    One-Hot-Encoding (Dummy Variables)
    Checking string-encoded categorical data
    Numbers Can Encode Categoricals
  Binning, Discretization, Linear Models, and Trees
  Interactions and Polynomials
  Univariate Nonlinear Transformations
  Automatic Feature Selection
    Univariate Statistics
    Model-Based Feature Selection
    Iterative Feature Selection
  Utilizing Expert Knowledge
  Summary and Outlook
5. Model Evaluation and Improvement
  Cross-Validation
    Cross-Validation in scikit-learn
    Benefits of Cross-Validation
    Stratified k-Fold Cross-Validation and Other Strategies
      More control over cross-validation
      Leave-one-out cross-validation
      Shuffle-split cross-validation
      Cross-validation with groups
  Grid Search
    Simple Grid Search
    The Danger of Overfitting the Parameters and the Validation Set
    Grid Search with Cross-Validation
      Analyzing the result of cross-validation
      Search over spaces that are not grids
      Using different cross-validation strategies with grid search
      Nested cross-validation
      Parallelizing cross-validation and grid search
  Evaluation Metrics and Scoring
    Keep the End Goal in Mind
    Metrics for Binary Classification
      Kinds of errors
      Imbalanced datasets
      Confusion matrices
        Relation to accuracy
        Precision, recall, and f-score
      Taking uncertainty into account
      Precision-recall curves and ROC curves
        Receiver operating characteristics (ROC) and AUC
    Metrics for Multiclass Classification
    Regression Metrics
    Using Evaluation Metrics in Model Selection
  Summary and Outlook
6. Algorithm Chains and Pipelines
  Parameter Selection with Preprocessing
  Building Pipelines
  Using Pipelines in Grid Searches
  The General Pipeline Interface
    Convenient Pipeline Creation with make_pipeline
    Accessing Step Attributes
    Accessing Attributes in a Pipeline inside GridSearchCV
  Grid-Searching Preprocessing Steps and Model Parameters
  Grid-Searching Which Model To Use
  Summary and Outlook
7. Working with Text Data
  Types of Data Represented as Strings
  Example Application: Sentiment Analysis of Movie Reviews
  Representing Text Data as a Bag of Words
    Applying Bag-of-Words to a Toy Dataset
    Bag-of-Words for Movie Reviews
  Stopwords
  Rescaling the Data with tf–idf
  Investigating Model Coefficients
  Bag-of-Words with More Than One Word (n-Grams)
  Advanced Tokenization, Stemming, and Lemmatization
  Topic Modeling and Document Clustering
    Latent Dirichlet Allocation
  Summary and Outlook
8. Wrapping Up
  Approaching a Machine Learning Problem
    Humans in the Loop
  From Prototype to Production
  Testing Production Systems
  Building Your Own Estimator
  Where to Go from Here
    Theory
    Other Machine Learning Frameworks and Packages
    Ranking, Recommender Systems, and Other Kinds of Learning
    Probabilistic Modeling, Inference, and Probabilistic Programming
    Neural Networks
    Scaling to Larger Datasets
    Honing Your Skills
  Conclusion
Index
