χ2 pruning, 159
χ2 statistic, 452
ε0 bootstrap, 411
absence-presence, 215
absolute rarity, 502
ABT, see analytics base table
AdaBoost, 169
aggregate features, 35
analytics base table, 14, 21, 27, 47, 52, 53, 55, 100, 104, 467
Anscombe’s quartet, 90
anti-discrimination legislation, 41
area under the curve, 429, 460
arithmetic mean, 416, 418, 446, 461, 525
artificial intelligence, 315
artificial neural networks, 387
astronomy, 483
AUC, see area under the curve
average class accuracy, 416, 417, 421, 446, 455, 461, 477
backward sequential selection, 233
bagging, 163, 165, 169, 515, 518
balanced sample, 472
basis functions, 366, 381, 385
batch gradient descent, 341
Bayes’ Theorem, 247, 249, 252, 513
Bayes, Thomas, 252
Bayesian information criterion, 302
Bayesian MAP prediction model, 260
Bayesian network, 247, 272, 293, 315, 514, 515, 518
Bayesian optimal classifier, 259
Bayesian score, 303
bimodal distribution, 63
BIC, see Bayesian information criterion
binary data, 34
binary logarithm, 125
binary tree, 196
binomial distribution, 177
bins, 94
bits, 126
black box model, 522
Blondlot, René, 397
boosting, 163, 164, 169, 515, 518
bootstrap aggregating, see bagging
bootstrap samples, 165
bootstrapping, 411
branches, 121
brute-force search, 331
burn-in time, 311
business problem, 21
Business Understanding, 13, 17, 22, 27, 48, 49, 463, 483, 512
C4.5, 167
calculus, 551
capacity for model retraining, 521
Cardano, Gerolamo, 247
CART, 167
case-based reasoning, 238
categorical data, 34
causal graphs, 303
CBR, see case-based reasoning
central tendency, 56, 416, 525, 526
chain rule (differentiation), 324, 338, 360, 551, 554
chain rule (probability), 249, 256, 263, 300, 541, 549, 554
Chebyshev distance, 185
chessboard distance, 185
child node, 295
citizen science, 489
classification accuracy, 404, 409, 417
Cohen’s kappa, 509
collection limitation principle, 42
comparative experiments, 453
complete case analysis, 73, 494
component, 282
composite function, 554
concept drift, 236, 447, 448, 509
condensed nearest neighbor, 238
conditional independence, 261, 262, 267, 313
conditional maximum entropy model, 373
conditional probability, 249, 251, 256, 541, 544, 548
conditional probability table, 294
conditionally independent, 293, 297
confidence factor, 164
confounding feature, 92
confusion matrix, 402, 420, 440, 461
consistent model, 4, 5, 121, 143
constrained quadratic optimization problem, 379
constraints, 379
continuous data, 34
continuous function, 552
control group, 453
convergence, 336
convex surface, 331
coordinate system, 181
correlation, 86, 88, 100, 110, 226
correlation matrix, 89
Corruption Perception Index, 242, 304
cosine, 218
cosine similarity, 218, 226, 235, 241
CPI, see Corruption Perception Index
CPT, see conditional probability table
CRISP-DM, see Cross Industry Standard Process for Data Mining
critical value pruning, 159
CRM, see customer relationship management
Cross Industry Standard Process for Data Mining, 12, 20, 27, 48, 55, 100, 398, 511
cross-sell model, 438
crowdsourcing, 489
cubic function, 552
cumulative gain, 433, 435, 479, 480
cumulative gain chart, 436, 438, 461
cumulative lift, 433, 437, 479, 480
cumulative lift chart, 438
cumulative lift curve, 438
curse of dimensionality, 228, 236, 260, 268, 523, 548
customer churn, 50
customer relationship management, 438
d-separation, 300
data, 1
data analytics, 1
data availability, 33
data management tools, 43
data manipulation, 43
data manipulation tools, 43
data mining, 12
Data Preparation, 14, 17, 27, 48, 49, 55, 92, 100, 101, 400, 471, 495, 512
data protection legislation, 41
data quality issues, 55, 66, 100
data quality report, 55, 56, 100, 105, 112, 472, 492
data subject, 41
Data Understanding, 14, 17, 27, 48, 49, 55, 100, 467, 488, 512
data-driven decisions, 17
database management systems, 43
dataset, 3
de Fermat, Pierre, 247
decision boundary, 188, 353, 376, 518
decision surface, 356
decision tree, 117, 121, 167, 406, 421, 423, 514, 515, 518, 519
decisions, 1
deep learning, 524
degrees of freedom, 279
delta value, 336
density curve, 64
density histogram, 537
Deployment, 15, 18, 482, 509, 512
derivative, 551
descriptive features, 3, 17, 21, 27, 467
diagnosis, 2
discontinuous function, 356
discriminative model, 515
disease diagnosis, 403
distance weighted k nearest neighbor, 193
distributions, 109
document classification, 2, 226
domain concept, 21, 29, 48, 467, 488
domain concepts, 468
domain representation, 514
domain subconcept, 32
dosage prediction, 1
eager learner, 236
early stopping criteria, 155, 159
ecological modeling, 137
edges, 294
EEG, see electroencephalography pattern recognition
electroencephalography pattern recognition, 369
email classification, 401
ensembles, see model ensemble
entropy, 45, 117, 120, 125, 154, 170, 172, 513
equal-frequency binning, 94, 97, 109, 289, 304, 320
equal-width binning, 94, 96, 109, 289
equation of a line, 325
ergodic, 310
error function, 323, 327, 383
error rate, 160
error-based learning, 17, 323
ethics, 49
ETL, see extract-transform-load
Euclidean coordinate space, 183
Euclidean distance, 183, 200, 235, 241, 444, 513
Euclidean norm, 380
Evaluation, 15, 18, 398, 479, 508, 512
event, 250, 541, 542
experiment, 250, 541, 542
experimental design, 456
exponential distribution, 63, 76, 277, 281
extract-transform-load, 43, 482
F measure, see F1 measure
F score, see F1 measure
F1 measure, 414, 416, 418
F1 score, see F1 measure
factors, 264
false alarms, 402
false negative rate, 413
false positive rate, 413
fat tails, 279
feature selection, 101, 179, 230, 236, 503, 523
feature subset space, 231
features, 48
filters, 230
flag features, 36
FN, see false negative
FNR, see false negative rate
folds, 408
forward reasoning, 252
forward sequential selection, 233
FP, see false positive
FPR, see false positive rate
frequency counts, 530
frequency histogram, 536
frequency table, 531
full joint probability distribution, 292, 313, 547
galaxy morphology, 484
Galaxy Zoo, 489
gamma function, 278
Gapminder, 242
Gauss, Carl Friedrich, 330
Gauss-Jordan elimination, 222
Gaussian distribution, 64
Gaussian radial basis kernel, 382
Generalized Bayes’ Theorem, 256
generative model, 515
Gibbs sampling, 309
Gini coefficient, 304, 430, 455, 460
Gini index, 148, 167, 172, 430
global minimum, 331
Goldilocks model, 12
gradient descent, 279, 282, 283, 331, 332, 334, 384, 406
greedy local search problem, 231
group think, 163
Hamming distance, 245
Hand, David, 456
harmonic mean, 416, 418, 421, 446, 455, 477
heating load prediction, 388
heterogeneity, 126
hidden features, 547
histogram, 56, 525, 534, 535
hits, 402
hold-out test set, 397, 400, 405, 448, 500
hypersphere, 220
ID3, see Iterative Dichotomizer 3
identity matrix, 222
ill-posed problem, 6, 9, 17, 19, 20, 148
imbalanced data, 192, 417, 472
independent features, 88
inductive bias, 10, 17, 20, 123, 144, 341, 372, 377, 511, 518
information, 121
information gain, 117, 121, 129, 131, 134, 136, 170, 172, 231, 499
information gain ratio, 144, 172
information-based learning, 17, 117
insights, 1
integration, 284
inter-annotator agreement, 508, 509
inter-quartile range, 75, 530, 538
interacting features, 230
interaction effect, 168
interaction term, 371
interior nodes, 121
interpolate, 529
interpretability of models, 522
interval data, 34
interval size, 284
invariant distribution, 310
inverse covariance matrix, 222
inverse reasoning, 252
IQR, see inter-quartile range
irregular cardinality, 66, 68, 100
irrelevant features, 230
Iterative Dichotomizer 3, 10, 117, 134, 167, 171, 175, 406, 513
J48, 167
jackknifing, 411
joint probability, 251, 256, 544
joint probability distribution, 251, 546
k nearest neighbor, 179, 191, 235, 400, 421, 443, 500, 513, 515, 518
k-d tree, 179, 196, 214, 236, 246
k-fold cross validation, 408
k-NN, see k nearest neighbor
K-S chart, see Kolmogorov-Smirnov chart
K-S statistic, see Kolmogorov-Smirnov statistic
K2 score, 303
knowledge elicitation, 30
Kolmogorov-Smirnov chart, 431
Kolmogorov-Smirnov statistic, 431, 452
Kolmogorov-Smirnov test, 280
labeled dataset, 7
Lagrange multipliers, 378
lazy learner, 236
leaf nodes, 121
learning rate decay, 349
least squares optimization, 331
leave-one-out cross validation, 411
left skew, 63
levels, 34
lift chart, 438
light tails, 279
linear function, 552
linear kernel, 382
linear relationship, 325, 365, 385
linear separator, 353
linearly separable, 353, 370, 381
local models, 188
locality sensitive hashing, 238
location parameter, 278
location-scale family of distributions, 278
logarithm, 124
logistic function, 357
logistic regression, 323, 353, 357, 384, 423, 500, 514, 515, 518, 519
LogitBoost, 169
long tails, 63
longevity, 33
loss functions, 327
loss given default, 421
LU decomposition, 222
machine learning algorithm, 5, 17
MAE, see mean absolute error
Mahalanobis distance, 221, 226, 235
Manhattan distance, 183, 235, 241, 444
MAP, see maximum a posteriori
margin, 377
margin of error, 532
marginalization, 547
Markov blanket, 297
Markov chain Monte Carlo, 309, 515
MaxEnt model, 373
maximum a posteriori, 259, 267, 423
maximum entropy model, 373
maximum likelihood, 313
MCMC, see Markov chain Monte Carlo
mean imputation, 392
mean squared error, 443
measures of similarity, 179
median, 56, 74, 416, 526, 526, 527, 530, 538
minimum description length principle, 302
misclassification rate, 397, 401, 403, 404, 416, 417
misses, 402
missing indicator feature, 73
missing values, 66, 67, 100, 235, 472
mixing time, 311
mixture of Gaussians distribution, 277, 281
mode imputation, 392
model ensemble, 163, 168, 177, 515
model parameters, 327
Modeling, 14, 17, 92, 477, 500, 512
Monte Carlo methods, 309
MSE, see mean squared error
multi-label classification, 524
multimodal distribution, 63, 282
multinomial logistic regression, 373, 394
multinomial model, 323, 385, 440
multivariable linear regression, 332, 443
multivariable linear regression with gradient descent, 10, 323, 513
N rays, 397
naive Bayes model, 247, 267, 292, 320, 321, 423, 513, 514, 518, 519
natural language processing, 238
natural logarithm, 449
nearest neighbor, 315, 514, 519
nearest neighbor algorithm, 179, 186, 235
negative level, 402
negatively covariant, 79
neural networks, 387
next-best-offer model, 39
No Free Lunch Theorem, 11, 518
nodes, 294
noise dampening mechanism, 163
non-linear model, 323
non-linear relationship, 385
non-negativity criterion, 183, 213
non-parametric model, 514
normal distribution, 62, 64, 75, 83, 277, 424
normalization, 93, 179, 206, 235, 343, 346, 361
normalization constant, 256
null hypothesis, 347
numeric data, 34
ongoing model validation, 447, 482
one-class classification, 239
one-row-per-subject, 29
one-versus-all model, 373, 383, 385
ordinal data, 34
other features, 36
out-of-time sampling, 412, 456
outcome, 541
outlier detection, 239
outliers, 66, 69, 93, 98, 100, 473, 526, 538
over-sampling, 99
overfitting, 11, 158, 163, 192, 261, 272, 406
overlap metric, 245
p-value, 348
paradox of the false positive, 259
parameterized model, 323, 324, 327
parametric model, 514
parent node, 295
Pareto charts, 534
partial derivative, 324, 331, 551, 555
Pascal, Blaise, 247
PDF, see probability density function
Pearson correlation, 226
Pearson product-moment correlation coefficient, 88
Pearson, Karl, 88
peeking, 400
perceptron learning rule, 357
personal data, 41
placebo, 453
polynomial functions, 552
polynomial kernel, 382
polynomial relationship, 367
population, 532
population mean, 64
population parameters, 533
population standard deviation, 64
positive level, 402
positively covariant, 79
posterior probability, 544
posterior probability distribution, 255
prediction, 2
prediction speed, 521
prediction subject, 21, 28, 467, 488
predictive data analytics, 1, 19
predictive features, 230
preference bias, 10
presence-absence, 215
price prediction, 1
probability density function, 64, 250, 277, 543
probability distribution, 61, 100, 251, 534, 546
probability function, 250, 543
probability mass function, 250, 543
probability-based learning, 17, 247
product rule, 249, 254, 541, 549
profit matrix, 420
propensity modeling, 2, 37, 468
proportions, 530
pruning dataset, 160
purpose specification principle, 42
R-trees, 238
random sampling, 98
random sampling with replacement, 99
random sampling without replacement, 100
random variable, 250, 541, 542
range normalization, 93, 108, 207, 335, 355, 358, 392, 393
rank and prune, 230
rate parameter, 281
ratio features, 36
receiver operating characteristic curve, 425, 459
receiver operating characteristic index, 425, 429, 456
receiver operating characteristic space, 427
reduced error pruning, 160, 173, 478
redundant features, 230
regression task, 153
regression tree, 153
reinforcement learning, 3
relative frequency, 249, 541, 543
relative rarity, 502
replicated training set, 164
residual, 328
right skew, 62
risk assessment, 1
RMSE, see root mean squared error
ROC curve, see receiver operating characteristic curve
ROC index, see receiver operating characteristic index
ROC space, see receiver operating characteristic space
root mean squared error, 444, 446
root node, 121
sabermetrics, 181
sample covariance, 86
sample mean, 525
sample space, 250, 541, 542
sampling, 92, 98
sampling density, 227
sampling variance, 158
sampling with replacement, 165
sampling without replacement, 165
scale parameter, 278
scatter plot matrix, 79, 89, 110
SDSS, see Sloan Digital Sky Survey
second mode, 532
second order polynomial function, 367, 552
semi-supervised learning, 3
separating hyperplane, 377
Shannon, Claude, 513
similarity measure, 179
similarity-based learning, 17, 179
simple linear regression, 514, 518
simple linear regression model, 326
simple multivariable linear regression, 383
simple random sample, 533
situational fluency, 22, 50, 464, 486
skew, 62
Sloan Digital Sky Survey, 483
social science, 303
soft margin, 383
Sokal-Michener index, 216, 235
spam filtering, 268
sparse data, 217, 226, 228, 241, 268
SPLOM, see scatter plot matrix
stability index, 449, 460, 510
stacked bar plot, 82
stale model, 447, 449, 452, 482
standard error, 348
standard normal distribution, 64
stationarity assumption, 238
stationary distribution, 310
statistical inference, 533
statistical significance, 455
statistical significance test, 347
statistics, 385
step-wise sequential search, 503, 505
stochastic gradient descent, 341
stratification feature, 99
Student-t distribution, 278, 280
stunted trees, 480
subagging, 165
subjective estimate, 541
subset generation, 231
subset selection, 231
subspace sampling, 165
sum of squared errors, 327, 383, 443, 446, 513
summary statistics, 103
support vector machine, 323, 346, 376, 386, 390, 500, 514, 515, 518
support vectors, 378
SVM, see support vector machine
symmetry criterion, 183, 213
t-test, 348
Tanimoto similarity, 226
target hypersphere, 201
target level imbalance, 501
taxi-cab distance, 183
termination condition, 232
test statistic, 347
text analytics, 268
textual data, 34
Theorem of Total Probability, 249, 253, 255, 256, 541, 549, 550
thinning, 311
third order polynomial function, 552
timing, 33
TN, see true negative
TNR, see true negative rate
tolerance, 336
top sampling, 98
total sum of squares, 446
TP, see true positive
TPR, see true positive rate
training instance, 4
Transparency International, 242
trapezoidal method, 430
treatment group, 453
triangular inequality criterion, 183, 213
true negative, 402
true positive, 402
true positive rate, 413, 415, 425
type I errors, 402
type II errors, 402
unbiased estimate, 534
unconditional probability, 544
under-sampled training set, 502
uniform distribution, 61
unimodal distribution, 62, 280
unit hypercube, 227
unsupervised learning, 3
use limitation principle, 42
validation, 500
variable elimination, 309
variable selection, 230
variance, 154, 206, 222, 528, 529, 534
variation, 56, 525, 527
vectors, 218
Voronoi region, 187
Voronoi tessellation, 187, 235
weight space, 330, 334, 352, 385
weight update rule, 340
weighted dataset, 164
weighted k nearest neighbor, 193, 211, 241–243
weighted variance, 154
weights, 327
Western Electric rules, 448
whiskers, 538
Wilcoxon-Mann-Whitney statistic, 430
wrapper-based feature selection, 232, 406, 503
z-score, 94
z-transform, 94