Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Table of Contents
Preface
I Fundamentals
1 Introduction to Machine Learning
1.1 Supervised learning
1.1.1 Regression problems
1.1.2 Classification problems
1.2 Unsupervised learning
1.3 Roadmap
1.4 The data sets
2 Modeling Process
2.1 Prerequisites
2.2 Data splitting
2.2.1 Simple random sampling
2.2.2 Stratified sampling
2.2.3 Class imbalances
2.3 Creating models in R
2.3.1 Many formula interfaces
2.3.2 Many engines
2.4 Resampling methods
2.4.1 k-fold cross-validation
2.4.2 Bootstrapping
2.4.3 Alternatives
2.5 Bias-variance trade-off
2.5.1 Bias
2.5.2 Variance
2.5.3 Hyperparameter tuning
2.6 Model evaluation
2.6.1 Regression models
2.6.2 Classification models
2.7 Putting the processes together
3 Feature & Target Engineering
3.1 Prerequisites
3.2 Target engineering
3.3 Dealing with missingness
3.3.1 Visualizing missing values
3.3.2 Imputation
3.4 Feature filtering
3.5 Numeric feature engineering
3.5.1 Skewness
3.5.2 Standardization
3.6 Categorical feature engineering
3.6.1 Lumping
3.6.2 One-hot & dummy encoding
3.6.3 Label encoding
3.6.4 Alternatives
3.7 Dimension reduction
3.8 Proper implementation
3.8.1 Sequential steps
3.8.2 Data leakage
3.8.3 Putting the process together
II Supervised Learning
4 Linear Regression
4.1 Prerequisites
4.2 Simple linear regression
4.2.1 Estimation
4.2.2 Inference
4.3 Multiple linear regression
4.4 Assessing model accuracy
4.5 Model concerns
4.6 Principal component regression
4.7 Partial least squares
4.8 Feature interpretation
4.9 Final thoughts
5 Logistic Regression
5.1 Prerequisites
5.2 Why logistic regression
5.3 Simple logistic regression
5.4 Multiple logistic regression
5.5 Assessing model accuracy
5.6 Model concerns
5.7 Feature interpretation
5.8 Final thoughts
6 Regularized Regression
6.1 Prerequisites
6.2 Why regularize?
6.2.1 Ridge penalty
6.2.2 Lasso penalty
6.2.3 Elastic nets
6.3 Implementation
6.4 Tuning
6.5 Feature interpretation
6.6 Attrition data
6.7 Final thoughts
7 Multivariate Adaptive Regression Splines
7.1 Prerequisites
7.2 The basic idea
7.2.1 Multivariate adaptive regression splines
7.3 Fitting a basic MARS model
7.4 Tuning
7.5 Feature interpretation
7.6 Attrition data
7.7 Final thoughts
8 K-Nearest Neighbors
8.1 Prerequisites
8.2 Measuring similarity
8.2.1 Distance measures
8.2.2 Preprocessing
8.3 Choosing k
8.4 MNIST example
8.5 Final thoughts
9 Decision Trees
9.1 Prerequisites
9.2 Structure
9.3 Partitioning
9.4 How deep?
9.4.1 Early stopping
9.4.2 Pruning
9.5 Ames housing example
9.6 Feature interpretation
9.7 Final thoughts
10 Bagging
10.1 Prerequisites
10.2 Why and when bagging works
10.3 Implementation
10.4 Easily parallelize
10.5 Feature interpretation
10.6 Final thoughts
11 Random Forests
11.1 Prerequisites
11.2 Extending bagging
11.3 Out-of-the-box performance
11.4 Hyperparameters
11.4.1 Number of trees
11.4.2 mtry
11.4.3 Tree complexity
11.4.4 Sampling scheme
11.4.5 Split rule
11.5 Tuning strategies
11.6 Feature interpretation
11.7 Final thoughts
12 Gradient Boosting
12.1 Prerequisites
12.2 How boosting works
12.2.1 A sequential ensemble approach
12.2.2 Gradient descent
12.3 Basic GBM
12.3.1 Hyperparameters
12.3.2 Implementation
12.3.3 General tuning strategy
12.4 Stochastic GBMs
12.4.1 Stochastic hyperparameters
12.4.2 Implementation
12.5 XGBoost
12.5.1 XGBoost hyperparameters
12.5.2 Tuning strategy
12.6 Feature interpretation
12.7 Final thoughts
13 Deep Learning
13.1 Prerequisites
13.2 Why deep learning
13.3 Feedforward DNNs
13.4 Network architecture
13.4.1 Layers and nodes
13.4.2 Activation
13.5 Backpropagation
13.6 Model training
13.7 Model tuning
13.7.1 Model capacity
13.7.2 Batch normalization
13.7.3 Regularization
13.7.4 Adjust learning rate
13.8 Grid search
13.9 Final thoughts
14 Support Vector Machines
14.1 Prerequisites
14.2 Optimal separating hyperplanes
14.2.1 The hard margin classifier
14.2.2 The soft margin classifier
14.3 The support vector machine
14.3.1 More than two classes
14.3.2 Support vector regression
14.4 Job attrition example
14.4.1 Class weights
14.4.2 Class probabilities
14.5 Feature interpretation
14.6 Final thoughts
15 Stacked Models
15.1 Prerequisites
15.2 The idea
15.2.1 Common ensemble methods
15.2.2 Super learner algorithm
15.2.3 Available packages
15.3 Stacking existing models
15.4 Stacking a grid search
15.5 Automated machine learning
16 Interpretable Machine Learning
16.1 Prerequisites
16.2 The idea
16.2.1 Global interpretation
16.2.2 Local interpretation
16.2.3 Model-specific vs. model-agnostic
16.3 Permutation-based feature importance
16.3.1 Concept
16.3.2 Implementation
16.4 Partial dependence
16.4.1 Concept
16.4.2 Implementation
16.4.3 Alternative uses
16.5 Individual conditional expectation
16.5.1 Concept
16.5.2 Implementation
16.6 Feature interactions
16.6.1 Concept
16.6.2 Implementation
16.6.3 Alternatives
16.7 Local interpretable model-agnostic explanations
16.7.1 Concept
16.7.2 Implementation
16.7.3 Tuning
16.7.4 Alternative uses
16.8 Shapley values
16.8.1 Concept
16.8.2 Implementation
16.8.3 XGBoost and built-in Shapley values
16.9 Localized step-wise procedure
16.9.1 Concept
16.9.2 Implementation
16.10 Final thoughts
III Dimension Reduction
17 Principal Components Analysis
17.1 Prerequisites
17.2 The idea
17.3 Finding principal components
17.4 Performing PCA in R
17.5 Selecting the number of principal components
17.5.1 Eigenvalue criterion
17.5.2 Proportion of variance explained criterion
17.5.3 Scree plot criterion
17.6 Final thoughts
18 Generalized Low Rank Models
18.1 Prerequisites
18.2 The idea
18.3 Finding the lower ranks
18.3.1 Alternating minimization
18.3.2 Loss functions
18.3.3 Regularization
18.3.4 Selecting k
18.4 Fitting GLRMs in R
18.4.1 Basic GLRM model
18.4.2 Tuning to optimize for unseen data
18.5 Final thoughts
19 Autoencoders
19.1 Prerequisites
19.2 Undercomplete autoencoders
19.2.1 Comparing PCA to an autoencoder
19.2.2 Stacked autoencoders
19.2.3 Visualizing the reconstruction
19.3 Sparse autoencoders
19.4 Denoising autoencoders
19.5 Anomaly detection
19.6 Final thoughts
IV Clustering
20 K-means Clustering
20.1 Prerequisites
20.2 Distance measures
20.3 Defining clusters
20.4 k-means algorithm
20.5 Clustering digits
20.6 How many clusters?
20.7 Clustering with mixed data
20.8 Alternative partitioning methods
20.9 Final thoughts
21 Hierarchical Clustering
21.1 Prerequisites
21.2 Hierarchical clustering algorithms
21.3 Hierarchical clustering in R
21.3.1 Agglomerative hierarchical clustering
21.3.2 Divisive hierarchical clustering
21.4 Determining optimal clusters
21.5 Working with dendrograms
21.6 Final thoughts
22 Model-based Clustering
22.1 Prerequisites
22.2 Measuring probability and uncertainty
22.3 Covariance types
22.4 Model selection
22.5 My basket example
22.6 Final thoughts
Bibliography
Index