Index

68−95−99.7 rule, 64, 75

art, xvi

χ² pruning, 159

χ² statistic, 452

ε₀ bootstrap, 411

 

absence-presence, 215

absolute rarity, 502

ABT, see analytics base table

AdaBoost, 169

aggregate features, 35

analytics base table, 14, 21, 27, 47, 52, 53, 55, 100, 104, 467

analytics solution, 21

ANOVA test, 92, 102

Anscombe’s quartet, 90

anti-discrimination legislation, 41

area under the curve, 429, 460

arithmetic mean, 416, 418, 446, 461, 525

artificial intelligence, 315

artificial neural networks, 387

astronomy, 483

AUC, see area under the curve

average class accuracy, 416, 417, 421, 446, 455, 461, 477

 

backward sequential selection, 233

bag-of-words, 226, 240

bagging, 163, 165, 169, 515, 518

balanced sample, 472

bar plot, 56, 525, 534

basis functions, 366, 381, 385

batch gradient descent, 341

Bayes’ Theorem, 247, 249, 252, 513

Bayes, Thomas, 252

Bayesian information criterion, 302

Bayesian MAP prediction model, 260

Bayesian network, 247, 272, 293, 315, 514, 515, 518

Bayesian optimal classifier, 259

Bayesian score, 303

bimodal distribution, 63

bias, 98, 533

BIC, see Bayesian information criterion

binary data, 34

binary logarithm, 125

binary tree, 196

binning, 94, 109, 150, 289

binomial distribution, 177

bins, 94

bits, 126

black box model, 522

Blondlot, René, 397

boosting, 163, 164, 169, 515, 518

bootstrap aggregating, see bagging

bootstrap samples, 165

bootstrapping, 411

box plot, 56, 525, 534, 538

branches, 121

brute-force search, 331

burn-in time, 311

business problem, 21

Business Understanding, 13, 17, 22, 27, 48, 49, 463, 483, 512

 

C4.5, 167

calculus, 551

capacity for model retraining, 521

Cardano, Gerolamo, 247

cardinality, 56, 472

CART, 167

case-based reasoning, 238

categorical data, 34

causal graphs, 303

CBR, see case-based reasoning

central tendency, 56, 416, 525, 526

chain rule (differentiation), 324, 338, 360, 551, 554

chain rule (probability), 249, 256, 263, 300, 541, 549, 554

Chebyshev distance, 185

chessboard distance, 185

child node, 295

churn prediction, 453, 463

citizen science, 489

clamp transformation, 74, 499

classification accuracy, 404, 409, 417

co-absence, 214, 215

co-presence, 214, 215

Cohen’s kappa, 509

collection limitation principle, 42

comparative experiments, 453

complete case analysis, 73, 494

component, 282

composite function, 554

concept drift, 236, 447, 448, 509

condensed nearest neighbor, 238

conditional independence, 261, 262, 267, 313

conditional maximum entropy model, 373

conditional probability, 249, 251, 256, 541, 544, 548

conditional probability table, 294

conditionally independent, 293, 297

confidence factor, 164

confounding feature, 92

confusion matrix, 402, 420, 440, 461

consistent model, 4, 5, 121, 143

constrained quadratic optimization problem, 379

constraints, 379

continuous data, 34

continuous function, 552

control group, 453

convergence, 336

convex surface, 331

coordinate system, 181

correlation, 86, 88, 100, 110, 226

correlation matrix, 89

Corruption Perception Index, 242, 304

cosine, 218

cosine similarity, 218, 226, 235, 241

covariance, 86, 221

covariance matrix, 88, 222

CPI, see Corruption Perception Index

CPT, see conditional probability table

credit scoring, 403, 420, 430

CRISP-DM, see Cross Industry Standard Process for Data Mining

critical value pruning, 159

CRM, see customer relationship management

Cross Industry Standard Process for Data Mining, 12, 20, 27, 48, 55, 100, 398, 511

cross-sell model, 438

crowdsourcing, 489

cubic function, 552

cumulative gain, 433, 435, 479, 480

cumulative gain chart, 436, 438, 461

cumulative lift, 433, 437, 479, 480

cumulative lift chart, 438

cumulative lift curve, 438

curse of dimensionality, 228, 236, 260, 268, 523, 548

customer churn, 50

customer relationship management, 438

 

d-separation, 300

data, 1

data analytics, 1

data availability, 33

data exploration, 33, 55, 100

data fragmentation, 260, 268

data management tools, 43

data manipulation, 43

data manipulation tools, 43

data mining, 12

Data Preparation, 14, 17, 27, 48, 49, 55, 92, 100, 101, 400, 471, 495, 512

data protection legislation, 41

data quality issues, 55, 66, 100

data quality plan, 66, 100

data quality report, 55, 56, 100, 105, 112, 472, 492

data subject, 41

Data Understanding, 14, 17, 27, 48, 49, 55, 100, 467, 488, 512

data visualization, 106, 534

data-driven decisions, 17

database management systems, 43

dataset, 3

de Fermat, Pierre, 247

deciles, 434, 452

decision boundary, 188, 353, 376, 518

decision surface, 356

decision tree, 117, 121, 167, 406, 421, 423, 514, 515, 518, 519

decisions, 1

deep learning, 524

degrees of freedom, 279

delta value, 336

density, 534, 537

density curve, 64

density histogram, 537

Deployment, 15, 18, 482, 509, 512

derivative, 551

derived features, 34, 43, 48

descriptive features, 3, 17, 21, 27, 467

diagnosis, 2

differentiation, 386, 551

discontinuous function, 356

discriminative model, 515

disease diagnosis, 403

distance metric, 183, 235

distance weighted k nearest neighbor, 193

distributions, 109

document classification, 2, 226

domain, 34, 541

domain concept, 21, 29, 48, 467, 488

domain concepts, 468

domain representation, 514

domain subconcept, 32

dosage prediction, 1

dot product, 218, 333, 357

 

eager learner, 236

early stopping criteria, 155, 159

ecological modeling, 137

edges, 294

EEG, see electroencephalography pattern recognition

electroencephalography pattern recognition, 369

email classification, 401

ensembles, see model ensemble

entropy, 45, 117, 120, 125, 154, 170, 172, 513

equal-frequency binning, 94, 97, 109, 289, 304, 320

equal-width binning, 94, 96, 109, 289

equation of a line, 325

ergodic, 310

error function, 323, 327, 383

error rate, 160

error surface, 323, 330

error-based learning, 17, 323

ethics, 49

ETL, see extract-transform-load

Euclidean coordinate space, 183

Euclidean distance, 183, 200, 235, 241, 444, 513

Euclidean norm, 380

Euler’s number, 357, 449

Evaluation, 15, 18, 398, 479, 508, 512

event, 250, 541, 542

experiment, 250, 541, 542

experimental design, 456

exponential distribution, 63, 76, 277, 281

extract-transform-load, 43, 482

 

F measure, see F1 measure

F score, see F1 measure

F1 measure, 414, 416, 418

F1 score, see F1 measure

factorization, 261, 313

factors, 264

false alarms, 402

false negative, 402, 422

false negative rate, 413

false positive, 259, 402, 422

false positive rate, 413

fat tails, 279

feature selection, 101, 179, 230, 236, 503, 523

feature space, 179, 181, 235

feature subset space, 231

features, 48

filters, 230

fit, 327, 383

flag features, 36

FN, see false negative

FNR, see false negative rate

folds, 408

forward reasoning, 252

forward sequential selection, 233

FP, see false positive

FPR, see false positive rate

fraud detection, 269, 403

frequency counts, 530

frequency histogram, 536

frequency table, 531

full joint probability distribution, 292, 313, 547

 

gain, 433, 434, 436

gain chart, 435, 438

galaxy morphology, 484

Galaxy Zoo, 489

gamma function, 278

Gapminder, 242

Gauss, Carl Friedrich, 330

Gauss-Jordan elimination, 222

Gaussian distribution, 64

Gaussian radial basis kernel, 382

generalization, 9, 12, 400

Generalized Bayes’ Theorem, 256

generative model, 515

Gibbs sampling, 309

Gini coefficient, 304, 430, 455, 460

Gini index, 148, 167, 172, 430

global minimum, 331

Goldilocks model, 12

gradient, 334, 384

gradient descent, 279, 282, 283, 331, 332, 334, 384, 406

graphical models, 315, 524

greedy local search problem, 231

group think, 163

guided search, 282, 331, 334

 

Hamming distance, 245

Hand, David, 456

harmonic mean, 416, 418, 421, 446, 455, 477

heating load prediction, 388

heterogeneity, 126

hidden features, 547

histogram, 56, 525, 534, 535

hits, 402

hold-out sampling, 406, 412

hold-out test set, 397, 400, 405, 448, 500

hyperplane, 196, 196, 377

hypersphere, 220

 

ID3, see Iterative Dichotomizer 3

identity criterion, 183, 213

identity matrix, 222

ill-posed problem, 6, 9, 17, 19, 20, 148

imbalanced data, 192, 417, 472

imputation, 74, 307

independence, 262, 313

independent features, 88

index, 213, 214

inductive bias, 10, 17, 20, 123, 144, 341, 372, 377, 511, 518

inductive learning, 10, 511

information, 121

information gain, 117, 121, 129, 131, 134, 136, 170, 172, 231, 499

information gain ratio, 144, 172

information theory, 117, 126

information-based learning, 17, 117

insights, 1

instance, 3, 28

integration, 284

inter-annotator agreement, 508, 509

inter-quartile range, 75, 530, 538

interacting features, 230

interaction effect, 168

interaction term, 371

interior nodes, 121

interpolate, 529

interpretability of models, 522

interval data, 34

interval size, 284

invalid data, 66, 100

invalid outliers, 69, 71, 499

invariant distribution, 310

inverse covariance matrix, 222

inverse reasoning, 252

IQR, see inter-quartile range

irregular cardinality, 66, 68, 100

irrelevant features, 230

Iterative Dichotomizer 3, 10, 117, 134, 167, 171, 175, 406, 513

 

J48, 167

Jaccard index, 217, 235

jackknifing, 411

joint probability, 251, 256, 544

joint probability distribution, 251, 546

 

k nearest neighbor, 179, 191, 235, 400, 421, 443, 500, 513, 515, 518

k-d tree, 179, 196, 214, 236, 246

k-fold cross validation, 408

k-NN, see k nearest neighbor

K-S chart, see Kolmogorov-Smirnov chart

K-S statistic, see Kolmogorov-Smirnov statistic

K2 score, 303

kernel function, 382, 390

kernel trick, 382, 390

knowledge elicitation, 30

Kolmogorov-Smirnov chart, 431

Kolmogorov-Smirnov statistic, 431, 452

Kolmogorov-Smirnov test, 280

Kronecker delta, 191, 194

 

labeled dataset, 7

Lagrange multipliers, 378

Laplace smoothing, 274, 321

lazy learner, 236

leaf nodes, 121

learning rate, 337, 346

learning rate decay, 349

least squares optimization, 331

leave-one-out cross validation, 411

left skew, 63

levels, 34

lift, 433, 436, 479, 480

lift chart, 438

light tails, 279

linear function, 552

linear kernel, 382

linear relationship, 325, 365, 385

linear separator, 353

linearly separable, 353, 370, 381

local models, 188

locality sensitive hashing, 238

location parameter, 278

location-scale family of distributions, 278

logarithm, 124

logistic function, 357

logistic regression, 323, 353, 357, 384, 423, 500, 514, 515, 518, 519

LogitBoost, 169

long tails, 63

longevity, 33

loss functions, 327

loss given default, 421

lower quartile, 530, 538

LU decomposition, 222

lucky split, 408, 456

 

machine learning, 2, 3

machine learning algorithm, 5, 17

MAE, see mean absolute error

Mahalanobis distance, 221, 226, 235

Manhattan distance, 183, 235, 241, 444

MAP, see maximum a posteriori

mapping features, 36, 68

margin, 377

margin extents, 377, 380

margin of error, 532

marginalization, 547

Markov blanket, 297

Markov chain, 309

Markov chain Monte Carlo, 309, 515

MaxEnt model, 373

maximum a posteriori, 259, 267, 423

maximum entropy model, 373

maximum likelihood, 313

MCMC, see Markov chain Monte Carlo

mean, 56, 74, 525, 526

mean absolute error, 444, 446

mean imputation, 392

mean squared error, 443

measures of similarity, 179

median, 56, 74, 416, 526, 527, 530, 538

metric, 183, 213

minimum description length principle, 302

Minkowski distance, 184

misclassification rate, 397, 401, 403, 404, 416, 417

misses, 402

missing indicator feature, 73

missing values, 66, 67, 100, 235, 472

mixing time, 311

mixture of Gaussians distribution, 277, 281

mode, 56, 74, 527, 531

mode imputation, 392

model ensemble, 163, 168, 177, 515

model parameters, 327

Modeling, 14, 17, 92, 477, 500, 512

Monte Carlo methods, 309

MSE, see mean squared error

multi-label classification, 524

multimodal distribution, 63, 282

multinomial logistic regression, 373, 394

multinomial model, 323, 385, 440

multivariable linear regression, 332, 443

multivariable linear regression with gradient descent, 10, 323, 513

 

N rays, 397

naive Bayes model, 247, 267, 292, 320, 321, 423, 513, 514, 518, 519

natural language processing, 238

natural logarithm, 449

nearest neighbor, 315, 514, 519

nearest neighbor algorithm, 179, 186, 235

negative level, 402

negatively covariant, 79

neural networks, 387

next-best-offer model, 39

No Free Lunch Theorem, 11, 518

nodes, 294

noise, 6, 69, 73, 190

noise dampening mechanism, 163

non-linear model, 323

non-linear relationship, 385

non-negativity criterion, 183, 213

non-parametric model, 514

normal distribution, 62, 64, 75, 83, 277, 424

normalization, 93, 179, 206, 235, 343, 346, 361

normalization constant, 256

null hypothesis, 347

numeric data, 34

 

observation period, 37, 468

Occam’s razor, 123, 302

on-going model validation, 447, 482

one-class classification, 239

one-row-per-subject, 29

one-versus-all model, 373, 383, 385

ordinal data, 34

other features, 36

out-of-time sampling, 412, 456

outcome, 541

outcome period, 37, 468

outlier detection, 239

outliers, 66, 69, 93, 98, 100, 473, 526, 538

over-sampling, 99

overfitting, 11, 158, 163, 192, 261, 272, 406

overlap metric, 245

 

p-value, 348

paradox of the false positive, 259

parameterized model, 323, 324, 327

parametric model, 514

parent node, 295

Pareto charts, 534

partial derivative, 324, 331, 551, 555

Pascal, Blaise, 247

PDF, see probability density function

Pearson correlation, 226

Pearson product-moment correlation coefficient, 88

Pearson, Karl, 88

peeking, 400

percentiles, 56, 97, 434, 529

perceptron learning rule, 357

performance measure, 400, 404

personal data, 41

placebo, 453

polynomial functions, 552

polynomial kernel, 382

polynomial relationship, 367

population, 532

population mean, 64

population parameters, 533

population standard deviation, 64

positive level, 402

positively covariant, 79

post-pruning, 159, 478

posterior probability, 544

posterior probability distribution, 255

pre-pruning, 159, 168

precision, 414, 415, 440

prediction, 2

prediction model, 1, 17

prediction score, 423, 442

prediction speed, 521

prediction subject, 21, 28, 467, 488

predictive data analytics, 1, 19

predictive features, 230

preference bias, 10

presence-absence, 215

price prediction, 1

prior probability, 256, 544

probability density function, 64, 250, 277, 543

probability distribution, 61, 100, 251, 534, 546

probability function, 250, 543

probability mass, 273, 543

probability mass function, 250, 543

probability theory, 247, 541

probability-based learning, 17, 247

product rule, 249, 254, 541, 549

profit matrix, 420

propensity modeling, 2, 37, 468

proportions, 530

proxy features, 32, 36

pruning, 117, 168

pruning dataset, 160

purpose specification principle, 42

 

quadratic function, 368, 552

 

R, 225, 283

R², 446, 455

R-trees, 238

random forest, 165, 169, 174

random sampling, 98

random sampling with replacement, 99

random sampling without replacement, 100

random variable, 250, 541, 542

range, 528

range normalization, 93, 108, 207, 335, 355, 358, 392, 393

rank and prune, 230

rate parameter, 281

ratio features, 36

raw features, 34, 43, 48

recall, 414, 415, 417, 440

receiver operating characteristic curve, 425, 459

receiver operating characteristic index, 425, 429, 456

receiver operating characteristic space, 427

reduced error pruning, 160, 173, 478

redundant features, 230

regression task, 153

regression tree, 153

reinforcement learning, 3

relative frequency, 249, 541, 543

relative rarity, 502

replicated training set, 164

residual, 328

restriction bias, 10, 372

right skew, 62

risk assessment, 1

RMSE, see root mean squared error

ROC curve, see receiver operating characteristic curve

ROC index, see receiver operating characteristic index

ROC space, see receiver operating characteristic space

root mean squared error, 444, 446

root node, 121

Russell-Rao index, 215, 235

 

sabermetrics, 181

sample, 406, 525, 532

sample covariance, 86

sample mean, 525

sample space, 250, 541, 542

sampling, 92, 98

sampling density, 227

sampling method, 405, 411

sampling variance, 158

sampling with replacement, 165

sampling without replacement, 165

scale parameter, 278

scatter plot, 78, 181

scatter plot matrix, 79, 89, 110

SDSS, see Sloan Digital Sky Survey

second mode, 532

second order polynomial function, 367, 552

semi-supervised learning, 3

sensitivity, 414, 427

separating hyperplane, 377

Shannon, Claude, 513

similarity index, 214, 235

similarity measure, 179

similarity-based learning, 17, 179

simple linear regression, 514, 518

simple linear regression model, 326

simple multivariable linear regression, 383

simple random sample, 533

situational fluency, 22, 50, 464, 486

skew, 62

Sloan Digital Sky Survey, 483

slope of a line, 325, 553

small multiples, 80, 83

smoothing, 247, 272, 273, 291

social science, 303

soft margin, 383

Sokal-Michener index, 216, 235

spam filtering, 268

sparse data, 217, 226, 228, 241, 268

specificity, 414, 427

SPLOM, see scatter plot matrix

stability index, 449, 460, 510

stacked bar plot, 82

stale model, 447, 449, 452, 482

standard deviation, 56, 529

standard error, 348

standard normal distribution, 64

standard scores, 94, 499

standardization, 93, 108

stationarity assumption, 238

stationary distribution, 310

statistical inference, 533

statistical significance, 455

statistical significance test, 347

statistics, 385

step-wise sequential search, 503, 505

stochastic gradient descent, 341

stratification feature, 99

stratified sampling, 99, 492

Student-t distribution, 278, 280

stunted trees, 480

subagging, 165

subjective estimate, 541

subset generation, 231

subset selection, 231

subspace sampling, 165

sum of squared errors, 327, 383, 443, 446, 513

summary statistics, 103

summing out, 252, 547, 549

supervised learning, 3, 19

support vector machine, 323, 346, 376, 386, 390, 500, 514, 515, 518

support vectors, 378

SVM, see support vector machine

symmetry criterion, 183, 213

 

t-test, 348

Tanimoto similarity, 226

target feature, 3, 17, 27

target hypersphere, 201

target level imbalance, 501

taxi-cab distance, 183

termination condition, 232

test set, 400, 406

test-statistic, 347

text analytics, 268

textual data, 34

Theorem of Total Probability, 249, 253, 255, 256, 541, 549, 550

thinning, 311

third order polynomial function, 552

timing, 33

TN, see true negative

TNR, see true negative rate

tolerance, 336

top sampling, 98

total sum of squares, 446

TP, see true positive

TPR, see true positive rate

training instance, 4

training set, 4, 406, 500

Transparency International, 242

trapezoidal method, 430

treatment group, 453

tree pruning, 159, 167

triangular inequality criterion, 183, 213

true negative, 402

true negative rate, 413, 425

true positive, 402

true positive rate, 413, 415, 425

two-stage model, 505, 507

type I errors, 402

type II errors, 402

 

unbiased estimate, 534

unconditional probability, 544

under-sampled training set, 502

under-sampling, 99, 502

underfitting, 11, 192

uniform distribution, 61

unimodal distribution, 62, 280

unit hypercube, 227

unsupervised learning, 3

upper quartile, 530, 538

upsell model, 214, 438

use limitation principle, 42

 

valid data, 66, 100

valid outliers, 69, 72, 499

validation, 500

validation set, 160, 406

variable elimination, 309

variable selection, 230

variance, 154, 206, 222, 528, 529, 534

variation, 56, 525, 527

vectors, 218

Voronoi region, 187

Voronoi tessellation, 187, 235

 

weight space, 330, 334, 352, 385

weight update rule, 340

weighted dataset, 164

weighted k nearest neighbor, 193, 211, 241–243

weighted variance, 154

weights, 327

Western Electric rules, 448

whiskers, 538

Wilcoxon-Mann-Whitney statistic, 430

wrapper-based feature selection, 232, 406, 503

 

z-score, 94

z-transform, 94