Index
A
- abstraction / Abstraction
- activation function / From biological to artificial neurons
- active-user / Result interpretation
- AdaBoost
- AdaBoost.M1 algorithm / Boosting
- adaptive boosting
- Adaptive Boosting (AdaBoost) / Boosting
- Akaike's Information Criterion (AIC) / Modeling and evaluation
- algorithm flowchart
- algorithms
- allocation function / Understanding ensembles
- American Diabetes Association (ADA)
- analytics
- Apache Hadoop
- Application Programming Interfaces (APIs)
- apply function / apply
- Apriori
- Apriori algorithm / Apriori algorithm
- apriori algorithms
- Apriori principle
- Area under curve (AUC) / Evaluating predictive models
- Area Under the Curve (AUC)
- arrays and matrices
- Artificial Neural Network (ANN)
- Artificial Neural Networks (ANNs)
- arules / Mining Association Rules and Frequent Itemsets
- Association for Computational Linguistics (ACL) / Feature extraction
- association rule mining
- association rules
- Augmented Dickey-Fuller (ADF) test
- Autocorrelation Function (ACF)
- automated parameter tuning
- Autoregressive Integrated Moving Average (ARIMA) models
- axon
B
C
D
E
F
G
H
I
J
K
L
M
- machine learning
- machine learning, in practice
- machine learning, process
- machine learning algorithms
- magrittr package
- Mallows' Cp (Cp) / Modeling and evaluation
- mapply function / mapply
- MapReduce
- margin
- marginal likelihood
- market basket analysis
- market basket analysis example (see the sketch after these entries)
    - data, collecting / Step 1 – collecting data
    - data, preparing / Step 2 – exploring and preparing the data
    - data, exploring / Step 2 – exploring and preparing the data
    - sparse matrix, creating for transaction data / Data preparation – creating a sparse matrix for transaction data
    - item support, visualizing / Visualizing item support – item frequency plots
    - transaction data, visualizing / Visualizing the transaction data – plotting the sparse matrix
    - model, training on data / Step 3 – training a model on the data
    - model performance, evaluating / Step 4 – evaluating model performance
    - model performance, improving / Step 5 – improving model performance
    - set of association rules, sorting / Sorting the set of association rules
    - subsets of association rules, taking / Taking subsets of association rules
    - association rules, saving to file / Saving association rules to a file or data frame
    - association rules, saving to data frame / Saving association rules to a file or data frame
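The entries above outline the full market basket analysis workflow. Below is a minimal R sketch of that flow using the arules package listed in this index; the file name `groceries.csv`, the item name `"berries"`, and the support/confidence thresholds are illustrative assumptions, not the chapter's exact values.

```r
library(arules)

# read transactional data into a sparse matrix (one row per transaction)
groceries <- read.transactions("groceries.csv", sep = ",")

# visualize item support and a sample of the sparse matrix
itemFrequencyPlot(groceries, topN = 20)
image(sample(groceries, 100))

# train association rules with illustrative support/confidence thresholds
rules <- apriori(groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2))

# evaluate and improve: sort by lift, take a subset, save to file and data frame
inspect(sort(rules, by = "lift")[1:5])
inspect(subset(rules, items %in% "berries"))   # "berries" is a placeholder item
write(rules, file = "rules.csv", sep = ",", quote = TRUE, row.names = FALSE)
rules_df <- as(rules, "data.frame")
```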
- matrices
- matrix / Matrixes and arrays
- matrix factorization / Building a recommender engine, Matrix factorization
- matrix notation / Multiple linear regression
- matrix operations
- maximum margin hyperplane (MMH) / Classification with hyperplanes
- mean / Measuring the central tendency – mean and median
- mean absolute error (MAE) / Model evaluation, Measuring performance with the mean absolute error
- mean squared error (MSE) / Model evaluation
- medical expenses, predicting with linear regression (see the sketch after these entries)
    - about / Example – predicting medical expenses using linear regression
    - data, collecting / Step 1 – collecting data
    - data, preparing / Step 2 – exploring and preparing the data
    - data, exploring / Step 2 – exploring and preparing the data
    - correlation matrix / Exploring relationships among features – the correlation matrix
    - relationships, visualizing among features / Visualizing relationships among features – the scatterplot matrix
    - scatterplot matrix / Visualizing relationships among features – the scatterplot matrix
    - model, training on data / Step 3 – training a model on the data
    - model performance, evaluating / Step 4 – evaluating model performance
    - model performance, improving / Step 5 – improving model performance, Model specification – adding non-linear relationships, Transformation – converting a numeric variable to a binary indicator, Model specification – adding interaction effects, Putting it all together – an improved regression model
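A minimal R sketch of the regression workflow indexed above. It assumes a CSV named `insurance.csv` with columns such as `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `expenses`; the column names and engineered terms are illustrative, not a transcription of the chapter's code.

```r
# load the data and explore relationships among numeric features
insurance <- read.csv("insurance.csv", stringsAsFactors = TRUE)
cor(insurance[c("age", "bmi", "children", "expenses")])     # correlation matrix
pairs(insurance[c("age", "bmi", "children", "expenses")])   # scatterplot matrix

# train a multiple linear regression model on the data
ins_model <- lm(expenses ~ ., data = insurance)
summary(ins_model)   # residuals, coefficients, multiple R-squared

# improve the model: non-linear term, binary indicator, interaction effect
insurance$age2  <- insurance$age^2
insurance$bmi30 <- ifelse(insurance$bmi >= 30, 1, 0)
ins_model2 <- lm(expenses ~ age + age2 + children + sex + bmi +
                   bmi30 * smoker + region, data = insurance)
summary(ins_model2)
```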
- medoid
- message-passing interface (MPI)
- meta-learners / Types of machine learning algorithms
- meta-learning methods
- min-max normalization / Preparing data for use with k-NN
- miss rate / Evaluating predictive models
- mobile phone spam
- mobile phone spam example (see the sketch after these entries)
    - data, collecting / Step 1 – collecting data
    - data collection URL / Step 1 – collecting data
    - data, preparing / Step 2 – exploring and preparing the data
    - data, exploring / Step 2 – exploring and preparing the data
    - text data, cleaning / Data preparation – cleaning and standardizing text data
    - text data, standardizing / Data preparation – cleaning and standardizing text data
    - text documents, splitting into words / Data preparation – splitting text documents into words
    - training datasets, creating / Data preparation – creating training and test datasets
    - test datasets, creating / Data preparation – creating training and test datasets
    - text data, visualizing / Visualizing text data – word clouds
    - indicator features, creating for frequent words / Data preparation – creating indicator features for frequent words
    - model, training on data / Step 3 – training a model on the data
    - model performance, evaluating / Step 4 – evaluating model performance
    - model performance, improving / Step 5 – improving model performance
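A minimal R sketch of the text-preparation and classification flow these entries point to, assuming the tm and e1071 packages and a placeholder file `sms_spam.csv` with `type` and `text` columns; a Naive Bayes classifier stands in as the learner, and the 4,000-message split is an arbitrary choice.

```r
library(tm)
library(e1071)

sms_raw <- read.csv("sms_spam.csv", stringsAsFactors = FALSE)
sms_raw$type <- factor(sms_raw$type)

# clean and standardize the text data
corpus <- VCorpus(VectorSource(sms_raw$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords())
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# split text documents into words via a document-term matrix
dtm <- DocumentTermMatrix(corpus)

# create training and test sets, keep frequent words, recode counts as indicators
train_dtm <- dtm[1:4000, ]
test_dtm  <- dtm[4001:nrow(dtm), ]
freq_words <- findFreqTerms(train_dtm, 5)
to_indicator <- function(x) ifelse(x > 0, "Yes", "No")
train_m <- apply(train_dtm[, freq_words], 2, to_indicator)
test_m  <- apply(test_dtm[, freq_words], 2, to_indicator)

# train a Naive Bayes classifier and evaluate it on the held-out messages
model <- naiveBayes(train_m, sms_raw$type[1:4000])
pred  <- predict(model, test_m)
table(pred, sms_raw$type[4001:nrow(dtm)])
```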
- model
- modeling
- modeling process
- model performance
- model performance, breast cancer example
- model trees / Understanding regression trees and model trees
- multicore package
- multilayer network
- Multilayer Perceptron (MLP)
- multimodal / Measuring the central tendency – the mode
- multinomial logistic regression / Understanding regression
- multiple linear regression / Understanding regression
- multiple R-squared value (coefficient of determination) / Step 4 – evaluating model performance
- Multivariate Adaptive Regression Splines (MARS) / Linear regression
- multivariate linear regression
- multivariate relationships
N
O
P
- 2^p models
- parallel cloud computing
- parallel computing
- parameter tuning
- Partial Autocorrelation Function (PACF)
- Partitioning Around Medoids (PAM)
- Parts of Speech (POS) / Feature extraction
- pattern discovery / Types of machine learning algorithms
- Pearson's correlation coefficient / Correlations
- Pearson correlation / Similarity
- Pearson Correlation Coefficient
- perceptron / Perceptron
- performance
- performance measures
- performance tradeoffs
- pixel-oriented maps / Pixel-oriented maps
- poisonous mushrooms
- poisonous mushrooms example, with rule learners
- Poisson regression
- Polarity / Polarity analysis
- polynomial kernel / Using kernels for non-linear spaces
- Porter stemming algorithm
- positive predictive value / Evaluating predictive models, Precision and recall
- posterior probability
- post-pruning
- pre-pruning
- precision / Evaluating predictive models, Precision and recall
- Prediction Error Sum of Squares (PRESS) / Modeling and evaluation
- prediction operation / Core concepts and definitions
- predictive analytics / Types of analytics, Our next challenge
- predictive model / Types of machine learning algorithms
- predictive modeling
- predictive models, building
- predictive models, evaluating
- prescriptive analytics / Types of analytics
- Principal Component Analysis (PCA) / Model preparation and prediction
- principal components
- Principal Components Analysis (PCA)
- principal components analysis (PCA)
- prior probability
- probability
- product contingency matrix
- proprietary files (see the sketch after these entries)
    - about / Working with proprietary files and databases
    - Microsoft Excel files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - Microsoft Excel files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - SAS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - SAS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - SPSS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - SPSS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - Stata files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
    - Stata files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
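A minimal sketch of reading and writing these proprietary formats with the rio package listed in this index; the file names are placeholders, and rio delegates to format-specific back ends such as readxl and haven.

```r
library(rio)

dat <- import("dataset.xlsx")        # read a Microsoft Excel file
sas <- import("dataset.sas7bdat")    # read a SAS file

export(dat, "dataset.sav")           # write an SPSS file
export(dat, "dataset.dta")           # write a Stata file
export(dat, "dataset.xlsx")          # write a Microsoft Excel file
```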
- proprietary microarray
- pure / Choosing the best split
- purity / Choosing the best split
Q
R
- 1R algorithm / The 1R algorithm
- R
- R, performance improvement
- R-squared value / Step 4 – evaluating model performance
- radial basis function (RBF) / Modeling using support vector machines
- Radial Basis Function (RBF) network
- radial basis kernel (RBF) / Support Vector Machines
- radical
- random forest
- random forest classification
- random forest regression
- random forests
- rate of descent / Matrix factorization
- ratings matrix / Core concepts and definitions
- RCurl
- Read-Evaluate-Print Loop (REPL) / Delving into the basics of R
- area under the ROC curve (AUC) / ROC curves
- Receiver Operating Characteristic (ROC)
- Receiver Operating Characteristic (ROC) curve
- Receiver Operating Characteristic Curves (ROC)
- Receiver Operating Characteristic (ROC) curve / Evaluating predictive models
- recommendation engine (see the sketch after these entries)
    - overview / An overview of a recommendation engine
    - collaborative filtering / An overview of a recommendation engine
    - business understanding / Business understanding and recommendations
    - data, understanding / Data understanding, preparation, and recommendations
    - data, preparing / Data understanding, preparation, and recommendations
    - modeling / Modeling, evaluation, and recommendations
    - evaluation / Modeling, evaluation, and recommendations
    - recommendations / Modeling, evaluation, and recommendations
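A minimal sketch of a user-based collaborative filtering workflow with the recommenderlab package listed in this index, using its bundled MovieLense ratings matrix purely for illustration; the split, `given`, and `goodRating` values are arbitrary choices.

```r
library(recommenderlab)
data(MovieLense)   # a realRatingMatrix of user-movie ratings

# hold out ratings for evaluation: 80% of users for training,
# 10 known ratings per test user, ratings >= 4 counted as "good"
scheme <- evaluationScheme(MovieLense, method = "split", train = 0.8,
                           given = 10, goodRating = 4)

# train a user-based collaborative filtering recommender
rec <- Recommender(getData(scheme, "train"), method = "UBCF")

# predict ratings for the test users and evaluate with RMSE/MSE/MAE
pred <- predict(rec, getData(scheme, "known"), type = "ratings")
calcPredictionAccuracy(pred, getData(scheme, "unknown"))

# produce top-5 recommendations for the first test user
topN <- predict(rec, getData(scheme, "known")[1, ], n = 5)
as(topN, "list")
```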
- recommendation systems
- recommendation systems, issues
- recommendation systems, product ready
- recommendation systems, types
- recommender engine
- recommender engines
- recommenderlab / Production ready recommender engines
- recommenderlab library
- recommend operation / Core concepts and definitions
- recurrent network
- recursive partitioning
- references / References
- regression
- regression analysis
- regression equations
- regression models
- regression trees
- regularization / Matrix factorization
- regularization, modeling
- relationships
- Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm / The RIPPER algorithm
- residuals / Ordinary least squares estimation
- Residual Sum of Squares (RSS) / Univariate linear regression
- Residuals vs Leverage plot / Business understanding
- Restricted Boltzmann Machine
- resubstitution error / Estimating future performance
- Revolution Analytics
- RHadoop
- RHIPE package
- ridge regression
- right-hand side (RHS) / Core concepts and definitions
- rio package
- RIPPER algorithm
- risky bank loans
- root mean squared error (RMSE) / Model evaluation
- Root Mean Square Error (RMSE)
- Root Mean Square Error (RMSE) / Cross-validation
- rotational estimation / Cross-validation
- rote learning
- R packages
- rpart.plot
- RStudio
- RTextTools / Cross-validation
- rudimentary ANNs / Understanding neural networks
- rvest package
S
T
U
V
W
- web pages
- web scraping
- weighted average / Predictions
- whiskers
- wine quality estimation, with regression trees
- word cloud
- wordcloud package
- word clouds / Word clouds
X
- xml2 GitHub
- XML documents
- XML package
Z