Cover
Half Title
Series Page
Title Page
Copyright Page
Dedication
Table of Contents
Preface
I Fundamentals
1 Introduction to Machine Learning
1.1 Supervised learning
1.1.1 Regression problems
1.1.2 Classification problems
1.2 Unsupervised learning
1.3 Roadmap
1.4 The data sets
2 Modeling Process
2.1 Prerequisites
2.2 Data splitting
2.2.1 Simple random sampling
2.2.2 Stratified sampling
2.2.3 Class imbalances
2.3 Creating models in R
2.3.1 Many formula interfaces
2.3.2 Many engines
2.4 Resampling methods
2.4.1 k-fold cross-validation
2.4.2 Bootstrapping
2.4.3 Alternatives
2.5 Bias-variance trade-off
2.5.1 Bias
2.5.2 Variance
2.5.3 Hyperparameter tuning
2.6 Model evaluation
2.6.1 Regression models
2.6.2 Classification models
2.7 Putting the processes together
3 Feature & Target Engineering
3.1 Prerequisites
3.2 Target engineering
3.3 Dealing with missingness
3.3.1 Visualizing missing values
3.3.2 Imputation
3.4 Feature filtering
3.5 Numeric feature engineering
3.5.1 Skewness
3.5.2 Standardization
3.6 Categorical feature engineering
3.6.1 Lumping
3.6.2 One-hot & dummy encoding
3.6.3 Label encoding
3.6.4 Alternatives
3.7 Dimension reduction
3.8 Proper implementation
3.8.1 Sequential steps
3.8.2 Data leakage
3.8.3 Putting the process together
II Supervised Learning
4 Linear Regression
4.1 Prerequisites
4.2 Simple linear regression
4.2.1 Estimation
4.2.2 Inference
4.3 Multiple linear regression
4.4 Assessing model accuracy
4.5 Model concerns
4.6 Principal component regression
4.7 Partial least squares
4.8 Feature interpretation
4.9 Final thoughts
5 Logistic Regression
5.1 Prerequisites
5.2 Why logistic regression
5.3 Simple logistic regression
5.4 Multiple logistic regression
5.5 Assessing model accuracy
5.6 Model concerns
5.7 Feature interpretation
5.8 Final thoughts
6 Regularized Regression
6.1 Prerequisites
6.2 Why regularize?
6.2.1 Ridge penalty
6.2.2 Lasso penalty
6.2.3 Elastic nets
6.3 Implementation
6.4 Tuning
6.5 Feature interpretation
6.6 Attrition data
6.7 Final thoughts
7 Multivariate Adaptive Regression Splines
7.1 Prerequisites
7.2 The basic idea
7.2.1 Multivariate adaptive regression splines
7.3 Fitting a basic MARS model
7.4 Tuning
7.5 Feature interpretation
7.6 Attrition data
7.7 Final thoughts
8 K-Nearest Neighbors
8.1 Prerequisites
8.2 Measuring similarity
8.2.1 Distance measures
8.2.2 Preprocessing
8.3 Choosing k
8.4 MNIST example
8.5 Final thoughts
9 Decision Trees
9.1 Prerequisites
9.2 Structure
9.3 Partitioning
9.4 How deep?
9.4.1 Early stopping
9.4.2 Pruning
9.5 Ames housing example
9.6 Feature interpretation
9.7 Final thoughts
10 Bagging
10.1 Prerequisites
10.2 Why and when bagging works
10.3 Implementation
10.4 Easily parallelize
10.5 Feature interpretation
10.6 Final thoughts
11 Random Forests
11.1 Prerequisites
11.2 Extending bagging
11.3 Out-of-the-box performance
11.4 Hyperparameters
11.4.1 Number of trees
11.4.2 mtry
11.4.3 Tree complexity
11.4.4 Sampling scheme
11.4.5 Split rule
11.5 Tuning strategies
11.6 Feature interpretation
11.7 Final thoughts
12 Gradient Boosting
12.1 Prerequisites
12.2 How boosting works
12.2.1 A sequential ensemble approach
12.2.2 Gradient descent
12.3 Basic GBM
12.3.1 Hyperparameters
12.3.2 Implementation
12.3.3 General tuning strategy
12.4 Stochastic GBMs
12.4.1 Stochastic hyperparameters
12.4.2 Implementation
12.5 XGBoost
12.5.1 XGBoost hyperparameters
12.5.2 Tuning strategy
12.6 Feature interpretation
12.7 Final thoughts
13 Deep Learning
13.1 Prerequisites
13.2 Why deep learning
13.3 Feedforward DNNs
13.4 Network architecture
13.4.1 Layers and nodes
13.4.2 Activation
13.5 Backpropagation
13.6 Model training
13.7 Model tuning
13.7.1 Model capacity
13.7.2 Batch normalization
13.7.3 Regularization
13.7.4 Adjust learning rate
13.8 Grid search
13.9 Final thoughts
14 Support Vector Machines
14.1 Prerequisites
14.2 Optimal separating hyperplanes
14.2.1 The hard margin classifier
14.2.2 The soft margin classifier
14.3 The support vector machine
14.3.1 More than two classes
14.3.2 Support vector regression
14.4 Job attrition example
14.4.1 Class weights
14.4.2 Class probabilities
14.5 Feature interpretation
14.6 Final thoughts
15 Stacked Models
15.1 Prerequisites
15.2 The idea
15.2.1 Common ensemble methods
15.2.2 Super learner algorithm
15.2.3 Available packages
15.3 Stacking existing models
15.4 Stacking a grid search
15.5 Automated machine learning
16 Interpretable Machine Learning
16.1 Prerequisites
16.2 The idea
16.2.1 Global interpretation
16.2.2 Local interpretation
16.2.3 Model-specific vs. model-agnostic
16.3 Permutation-based feature importance
16.3.1 Concept
16.3.2 Implementation
16.4 Partial dependence
16.4.1 Concept
16.4.2 Implementation
16.4.3 Alternative uses
16.5 Individual conditional expectation
16.5.1 Concept
16.5.2 Implementation
16.6 Feature interactions
16.6.1 Concept
16.6.2 Implementation
16.6.3 Alternatives
16.7 Local interpretable model-agnostic explanations
16.7.1 Concept
16.7.2 Implementation
16.7.3 Tuning
16.7.4 Alternative uses
16.8 Shapley values
16.8.1 Concept
16.8.2 Implementation
16.8.3 XGBoost and built-in Shapley values
16.9 Localized step-wise procedure
16.9.1 Concept
16.9.2 Implementation
16.10 Final thoughts
III Dimension Reduction
17 Principal Components Analysis
17.1 Prerequisites
17.2 The idea
17.3 Finding principal components
17.4 Performing PCA in R
17.5 Selecting the number of principal components
17.5.1 Eigenvalue criterion
17.5.2 Proportion of variance explained criterion
17.5.3 Scree plot criterion
17.6 Final thoughts
18 Generalized Low Rank Models
18.1 Prerequisites
18.2 The idea
18.3 Finding the lower ranks
18.3.1 Alternating minimization
18.3.2 Loss functions
18.3.3 Regularization
18.3.4 Selecting k
18.4 Fitting GLRMs in R
18.4.1 Basic GLRM model
18.4.2 Tuning to optimize for unseen data
18.5 Final thoughts
19 Autoencoders
19.1 Prerequisites
19.2 Undercomplete autoencoders
19.2.1 Comparing PCA to an autoencoder
19.2.2 Stacked autoencoders
19.2.3 Visualizing the reconstruction
19.3 Sparse autoencoders
19.4 Denoising autoencoders
19.5 Anomaly detection
19.6 Final thoughts
IV Clustering
20 K-means Clustering
20.1 Prerequisites
20.2 Distance measures
20.3 Defining clusters
20.4 k-means algorithm
20.5 Clustering digits
20.6 How many clusters?
20.7 Clustering with mixed data
20.8 Alternative partitioning methods
20.9 Final thoughts
21 Hierarchical Clustering
21.1 Prerequisites
21.2 Hierarchical clustering algorithms
21.3 Hierarchical clustering in R
21.3.1 Agglomerative hierarchical clustering
21.3.2 Divisive hierarchical clustering
21.4 Determining optimal clusters
21.5 Working with dendrograms
21.6 Final thoughts
22 Model-based Clustering
22.1 Prerequisites
22.2 Measuring probability and uncertainty
22.3 Covariance types
22.4 Model selection
22.5 My basket example
22.6 Final thoughts
Bibliography
Index