Index

ϵ-insensitive loss regression, 280

accuracy, 34

activation function, 254

agglomerative clustering, 418

agglomerative coefficient, 421

alternating minimization, 362

anomaly detection, 377, 391

archetypes, 361

area under the curve, 35

autocorrelation, 93

autoencoder, 377

automated machine learning, 301

AutoML, 301

average linkage, 419

backpropagation, 255

bagging, 191

base learner, 192

base learners, 222, 291

bias, 28

bias-variance trade-off, 28

binary recursive partitioning, 177

bootstrap aggregating, 191

bootstrapping, 23

Box-Cox transformation, 44

branches, 176

centroid linkage, 420

centroid update, 403

classification, 4

classification and regression tree, 176

cluster assignment step, 403

clustering, 6, 399

codings, 377

collinearity, 94

complete linkage, 419

components, 430

confusion matrix, 34

constant variance, 91

convex hull, 274

cosine distance, 400

cost complexity parameter, 181

cross-entropy, 34

curse of dimensionality, 408

data leakage, 13, 68

decoder function, 378

deep autoencoders, 380

deep neural networks, 247

dendrogram, 417

denoising autoencoder, 390

deviance, 33

Dice coefficient, 412

dimension reduction, 6, 350

divisive coefficient, 423

divisive hierarchical clustering, 418

down-sampling, 19

dropout, 263

dummy encoding, 61

early stopping, 180

eigen decomposition, 348

eigenvalue, 348

eigenvalue criterion, 354

eigenvector, 348

elastic net, 126

elbow method, 408

encoder function, 378

ensemble, 191

epoch, 257

Euclidean distance, 159, 400

extreme gradient boosting, 237

feature effects, 307

feature importance, 307, 311

feature selection, 123

features, 3

forward pass, 255

full Cartesian grid search, 211

Gaussian mixture model, 430

generalizability, 15

generalized low rank models, 359

generative modeling, 377

Gini index, 34

global interpretability, 307

Gower distance, 400

gradient boosting machines, 221

gradient descent, 224

greedy local optimum, 403

grid search, 31

hard margin classifier, 273

Hartigan-Wong algorithm, 401

hidden layer, 250

hidden layers, 252

hierarchical clustering, 417

hyperparameters, 29

hyperplane, 122, 271

imputation, 49

individual conditional expectation curves, 317

informative missingness, 46

input layer, 250

interpretable machine learning, 305

k-fold cross validation, 23

k-means clustering, 399

k-nearest neighbor, 157

kernel functions, 277

kernel trick, 271

Kullback-Leibler divergence, 385

label encoding, 62

Lasso penalty, 125

learners, 3

learning rate, 225, 256

least squares, 80

leaves, 176

likelihood function, 108

linear regression, 79

loading vector, 348

local interpretability, 307

local interpretable model-agnostic explanations, 325

log transformation, 43

log-odds, 108

logistic regression, 105

logit transformation, 106

loss functions, 32

Manhattan distance, 159, 400

maximum likelihood, 82

mean absolute error, 33

mean per class error, 34

mean squared error, 32, 83

memorization capacity, 259

mini-batch stochastic gradient descent, 256

misclassification, 33

missingness at random, 46

model-agnostic, 102, 309

model-based clustering, 429

monotonic linear relationship, 101

multicollinearity, 94

multivariate adaptive regression splines, 141

near-zero variance, 54

no free lunch, 13

one-hot encoding, 61

ordinal encoding, 64

orthogonal projection, 348

out-of-bag, 193

output layer, 250

overcomplete autoencoder, 389

partial dependence, 313

partial dependence plot, 314

partial least squares, 99

partitioning around medoids, 413

penalized models, 123

permutation-based feature importance, 312

precision, 35

predictive model, 3

principal component regression, 96

principal components, 347

principal components analysis, 67, 345

probabilistic cluster assignment, 431

proportion of variance explained, 355

prune, 181

quadratic loss, 362

R squared, 33

random forest, 203

random grid search, 212

regression, 4

regularization, 121

resampling methods, 23

residual sum of squares, 80

residuals, 81

ridge penalty, 124

root mean squared error, 32

root mean squared logarithmic error, 33

root node, 176

scree plot, 356

sensitivity, 35

Shapley values, 331

shrinkage methods, 123

simple random sampling, 16

single linkage, 419

soft assignment, 429

soft margin classifier, 276

sparse autoencoders, 384

sparsity parameter, 385

specificity, 35

spectral clustering, 402

split-variable randomization, 204

splitting rules, 175

stacked autoencoders, 380

stacked regressions, 292

stacking, 291

standard error, 83

standardize, 57

stochastic gradient boosting, 233

stochastic gradient descent, 225

stratified sampling, 16

super learner, 8, 291

supervised learning, 4

target encoding, 65

terminal nodes, 176

tree correlation, 201

tree diagrams, 175

tree-based models, 175

undercomplete autoencoder, 378

unsupervised learning, 6

up-sampling, 19

validation, 23

variance, 28

Ward’s minimum variance, 420

weak model, 222

weight decay, 261

within-cluster variation, 401

XGBoost, 237

zero variance, 54