Imperial Library

Index
1. Introduction
Why machine learning?
Problems that machine learning can solve
Knowing your data
Why Python?
What this book will cover
What this book will not cover
Scikit-learn
Installing Scikit-learn
Essential Libraries and Tools
Jupyter Notebook
NumPy
SciPy
matplotlib
Pandas
Python2 versus Python3
Versions Used in this Book
A First Application: Classifying iris species
Meet the data
Measuring Success: Training and testing data
First things first: Look at your data
Building your first model: k nearest neighbors
Making predictions
Evaluating the model
Summary
2. Supervised Learning
Classification and Regression
Generalization, Overfitting and Underfitting
Supervised Machine Learning Algorithms
k-Nearest Neighbor
k-Neighbors Classification
Analyzing KNeighborsClassifier
k-Neighbors Regression
Analyzing k nearest neighbors regression
Strengths, weaknesses and parameters
Linear models
Linear models for regression
Linear Regression aka Ordinary Least Squares
Ridge regression
Lasso
Linear models for Classification
Linear Models for multiclass classification
Strengths, weaknesses and parameters
Naive Bayes Classifiers
Strengths, weaknesses and parameters
Decision trees
Building Decision Trees
Controlling complexity of Decision Trees
Analyzing Decision Trees
Feature Importance in trees
Strengths, weaknesses and parameters
Ensembles of Decision Trees
Random Forests
Building Random Forests
Analyzing Random Forests
Strengths, weaknesses and parameters
Gradient Boosted Regression Trees (Gradient Boosting Machines)
Strengths, weaknesses and parameters
Kernelized Support Vector Machines
Linear Models and Non-linear Features
The Kernel Trick
Understanding SVMs
Tuning SVM parameters
Preprocessing Data for SVMs
Strengths, weaknesses and parameters
Neural Networks (Deep Learning)
The Neural Network Model
Tuning Neural Networks
Strengths, weaknesses and parameters
Estimating complexity in neural networks
Uncertainty estimates from classifiers
The Decision Function
Predicting probabilities
Uncertainty in multi-class classification
Summary and Outlook
3. Unsupervised Learning and Preprocessing
Types of unsupervised learning
Challenges in unsupervised learning
Preprocessing and Scaling
Different kinds of preprocessing
Applying data transformations
Scaling training and test data the same way
The effect of preprocessing on supervised learning
Dimensionality Reduction, Feature Extraction and Manifold Learning
Principal Component Analysis (PCA)
Applying PCA to the cancer dataset for visualization
Eigenfaces for feature extraction
Non-Negative Matrix Factorization (NMF)
Applying NMF to synthetic data
Applying NMF to face images
Manifold learning with t-SNE
Clustering
k-Means clustering
Failure cases of k-Means
Vector Quantization - Or Seeing k-Means as Decomposition
Agglomerative Clustering
Hierarchical Clustering and Dendrograms
DBSCAN
Comparing and evaluating clustering algorithms
Evaluating clustering with ground truth
Evaluating clustering without ground truth
Comparing algorithms on the faces dataset
Analyzing the faces dataset with DBSCAN
Analyzing the faces dataset with k-Means
Analyzing the faces dataset with agglomerative clustering
Summary of Clustering Methods
Summary and Outlook
4. Summary of scikit-learn methods and usage
The Estimator Interface
Fit resets a model
Method chaining
Shortcuts and efficient alternatives
Important Attributes
Summary and outlook
5. Representing Data and Engineering Features
Categorical Variables
One-Hot-Encoding (Dummy variables)
Checking string-encoded categorical data
Binning, Discretization, Linear Models and Trees
Interactions and Polynomials
Univariate Non-linear transformations
Automatic Feature Selection
Univariate statistics
Model-based Feature Selection
Iterative feature selection
Utilizing Expert Knowledge
Summary and outlook
6. Model evaluation and improvement
Cross-validation
Cross-validation in scikit-learn
Benefits of cross-validation
Stratified K-Fold cross-validation and other strategies
More control over cross-validation
Leave-One-Out cross-validation
Shuffle-Split cross-validation
Cross-validation with groups
Grid Search
Simple Grid-Search
The danger of overfitting the parameters and the validation set
Grid-search with cross-validation
Analyzing the result of cross-validation
Using different cross-validation strategies with grid-search
Nested cross-validation
Parallelizing cross-validation and grid-search
Evaluation Metrics and scoring
Keep the end-goal in mind
Metrics for binary classification
Kinds of Errors
Imbalanced datasets
Confusion matrices
Relation to accuracy
Precision, recall and f-score
Taking uncertainty into account
Precision-Recall curves and ROC curves
Receiver Operating Characteristics (ROC) and AUC
Multi-class classification
Regression metrics
Using evaluation metrics in model selection
Summary and outlook
7. Algorithm Chains and Pipelines
Parameter Selection with Preprocessing
Building Pipelines
Using Pipelines in Grid-searches
The General Pipeline Interface
Convenient Pipeline creation with make_pipeline
Accessing step attributes
Accessing attributes in grid-searched pipeline
Grid-searching preprocessing steps and model parameters
Summary and Outlook
8. Working with Text Data
Types of data represented as strings
Example application: Sentiment analysis of movie reviews
Representing text data as Bag of Words
Applying bag-of-words to a toy dataset
Bag-of-words for movie reviews
Stop-words
Rescaling the data with TFIDF
Investigating model coefficients
Bag of words with more than one word (n-grams)
Advanced tokenization, stemming and lemmatization
Topic Modeling and Document Clustering
Summary and Outlook
