ϵ-insensitive loss regression, 280
accuracy, 34
activation function, 254
agglomerative clustering, 418
agglomerative coefficient, 421
alternating minimization, 362
archetypes, 361
area under the curve, 35
autocorrelation, 93
autoencoder, 377
automated machine learning, 301
AutoML, 301
average linkage, 419
backpropagation, 255
bagging, 191
base learner, 192
bias, 28
bias-variance trade-off, 28
binary recursive partitioning, 177
bootstrap aggregating, 191
bootstrapping, 23
Box-Cox transformation, 44
branches, 176
centroid linkage, 420
centroid update, 403
classification, 4
classification and regression tree, 176
cluster assignment step, 403
codings, 377
collinearity, 94
complete linkage, 419
components, 430
confusion matrix, 34
constant variance, 91
convex hull, 274
cosine distance, 400
cost complexity parameter, 181
cross-entropy, 34
curse of dimensionality, 408
decoder function, 378
deep autoencoders, 380
deep neural networks, 247
dendrogram, 417
denoising autoencoder, 390
deviance, 33
Dice coefficient, 412
divisive coefficient, 423
divisive hierarchical clustering, 418
down-sampling, 19
dropout, 263
dummy encoding, 61
early stopping, 180
eigen decomposition, 348
eigenvalue, 348
eigenvalue criterion, 354
eigenvector, 348
elastic net, 126
elbow method, 408
encoder function, 378
ensemble, 191
epoch, 257
extreme gradient boosting, 237
feature effects, 307
feature selection, 123
features, 3
forward pass, 255
full Cartesian grid search, 211
Gaussian mixture model, 430
generalizability, 15
generalized low rank models, 359
generative modeling, 377
Gini index, 34
global interpretability, 307
Gower distance, 400
gradient boosting machines, 221
gradient descent, 224
greedy local optimum, 403
grid search, 31
hard margin classifier, 273
Hartigan-Wong algorithm, 401
hidden layer, 250, 252
hierarchical clustering, 417
hyperparameters, 29
imputation, 49
individual conditional expectation curves, 317
informative missingness, 46
input layer, 250
interpretable machine learning, 305
k-fold cross validation, 23
k-means clustering, 399
k-nearest neighbor, 157
kernel functions, 277
kernel trick, 271
Kullback-Leibler divergence, 385
label encoding, 62
Lasso penalty, 125
learners, 3
least squares, 80
leaves, 176
likelihood function, 108
linear regression, 79
loading vector, 348
local interpretability, 307
local interpretable model-agnostic explanations, 325
log transformation, 43
log-odds, 108
logistic regression, 105
logit transformation, 106
loss functions, 32
maximum likelihood, 82
mean absolute error, 33
mean per class error, 34
mean squared error, 32, 83
memorization capacity, 259
mini-batch stochastic gradient descent, 256
misclassification, 33
missingness at random, 46
model-agnostic, 102, 309
model-based clustering, 429
monotonic linear relationship, 101
multicollinearity, 94
multivariate adaptive regression splines, 141
near-zero variance, 54
no free lunch, 13
one-hot encoding, 61
ordinal encoding, 64
orthogonal projection, 348
out-of-bag, 193
output layer, 250
overcomplete autoencoder, 389
partial dependence, 313
partial dependence plot, 314
partial least squares, 99
partitioning around medians, 413
penalized models, 123
permutation-based feature importance, 312
precision, 35
predictive model, 3
principal component regression, 96
principal components, 347
principal components analysis, 67, 345
probabilistic cluster assignment, 431
proportion of variance explained, 355
prune, 181
quadratic loss, 362
R squared, 33
random forest, 203
random grid search, 212
regression, 4
regularization, 121
resampling methods, 23
residual sum of squares, 80
residuals, 81
ridge penalty, 124
root mean squared error, 32
root mean squared logarithmic error, 33
root node, 176
scree plot, 356
sensitivity, 35
Shapley values, 331
shrinkage methods, 123
simple random sampling, 16
single linkage, 419
soft assignment, 429
soft margin classifier, 276
sparse autoencoders, 384
sparsity parameter, 385
specificity, 35
spectral clustering, 402
split-variable randomization, 204
splitting rules, 175
stacked autoencoders, 380
stacked regressions, 292
stacking, 291
standard error, 83
standardize, 57
stochastic gradient boosting, 233
stochastic gradient descent, 225
stratified sampling, 16
supervised learning, 4
target encoding, 65
terminal nodes, 176
tree correlation, 201
tree diagrams, 175
tree-based models, 175
undercomplete autoencoder, 378
unsupervised learning, 6
up-sampling, 19
validation, 23
variance, 28
Ward’s minimum variance, 420
weak model, 222
weight decay, 261
within-cluster variation, 401
XGBoost, 237
zero variance, 54