
Index
R: Unleash Machine Learning Techniques
Table of Contents
R: Unleash Machine Learning Techniques
Credits
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
I. Module 1
1. Getting Started with R and Machine Learning
Delving into the basics of R
Using R as a scientific calculator
Operating on vectors
Special values
Data structures in R
Vectors
Creating vectors
Indexing and naming vectors
Arrays and matrices
Creating arrays and matrices
Names and dimensions
Matrix operations
Lists
Creating and indexing lists
Combining and converting lists
Data frames
Creating data frames
Operating on data frames
Working with functions
Built-in functions
User-defined functions
Passing functions as arguments
Controlling code flow
Working with if, if-else, and ifelse
Working with switch
Loops
Advanced constructs
lapply and sapply
apply
tapply
mapply
Next steps with R
Getting help
Handling packages
Machine learning basics
Machine learning – what does it really mean?
Machine learning – how is it used in the world?
Types of machine learning algorithms
Supervised machine learning algorithms
Unsupervised machine learning algorithms
Popular machine learning packages in R
Summary
2. Let's Help Machines Learn
Understanding machine learning
Algorithms in machine learning
Perceptron
Families of algorithms
Supervised learning algorithms
Linear regression
K-Nearest Neighbors (KNN)
Collecting and exploring data
Normalizing data
Creating training and test data sets
Learning from data/training the model
Evaluating the model
Unsupervised learning algorithms
Apriori algorithm
K-Means
Summary
3. Predicting Customer Shopping Trends with Market Basket Analysis
Detecting and predicting trends
Market basket analysis
What does market basket analysis actually mean?
Core concepts and definitions
Techniques used for analysis
Making data driven decisions
Evaluating a product contingency matrix
Getting the data
Analyzing and visualizing the data
Global recommendations
Advanced contingency matrices
Frequent itemset generation
Getting started
Data retrieval and transformation
Building an itemset association matrix
Creating a frequent itemsets generation workflow
Detecting shopping trends
Association rule mining
Loading dependencies and data
Exploratory analysis
Detecting and predicting shopping trends
Visualizing association rules
Summary
4. Building a Product Recommendation System
Understanding recommendation systems
Issues with recommendation systems
Collaborative filters
Core concepts and definitions
The collaborative filtering algorithm
Predictions
Recommendations
Similarity
Building a recommender engine
Matrix factorization
Implementation
Result interpretation
Production ready recommender engines
Extract, transform, and analyze
Model preparation and prediction
Model evaluation
Summary
5. Credit Risk Detection and Prediction – Descriptive Analytics
Types of analytics
Our next challenge
What is credit risk?
Getting the data
Data preprocessing
Dealing with missing values
Datatype conversions
Data analysis and transformation
Building analysis utilities
Analyzing the dataset
Saving the transformed dataset
Next steps
Feature sets
Machine learning algorithms
Summary
6. Credit Risk Detection and Prediction – Predictive Analytics
Predictive analytics
How to predict credit risk
Important concepts in predictive modeling
Preparing the data
Building predictive models
Evaluating predictive models
Getting the data
Data preprocessing
Feature selection
Modeling using logistic regression
Modeling using support vector machines
Modeling using decision trees
Modeling using random forests
Modeling using neural networks
Model comparison and selection
Summary
7. Social Media Analysis – Analyzing Twitter Data
Social networks (Twitter)
Data mining @social networks
Mining social network data
Data and visualization
Word clouds
Treemaps
Pixel-oriented maps
Other visualizations
Getting started with Twitter APIs
Overview
Registering the application
Connect/authenticate
Extracting sample tweets
Twitter data mining
Frequent words and associations
Popular devices
Hierarchical clustering
Topic modeling
Challenges with social network data mining
References
Summary
8. Sentiment Analysis of Twitter Data
Understanding Sentiment Analysis
Key concepts of sentiment analysis
Subjectivity
Sentiment polarity
Opinion summarization
Feature extraction
Approaches
Applications
Challenges
Sentiment analysis upon Tweets
Polarity analysis
Classification-based algorithms
Labeled dataset
Support Vector Machines
Ensemble methods
Boosting
Cross-validation
Summary
II. Module 2
1. Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
Machine learning successes
The limits of machine learning
Machine learning ethics
How machines learn
Data storage
Abstraction
Generalization
Evaluation
Machine learning in practice
Types of input data
Types of machine learning algorithms
Matching input data to algorithms
Machine learning with R
Installing R packages
Loading and unloading R packages
Summary
2. Managing and Understanding Data
R data structures
Vectors
Factors
Lists
Data frames
Matrixes and arrays
Managing data with R
Saving, loading, and removing R data structures
Importing and saving data from CSV files
Exploring and understanding data
Exploring the structure of data
Exploring numeric variables
Measuring the central tendency – mean and median
Measuring spread – quartiles and the five-number summary
Visualizing numeric variables – boxplots
Visualizing numeric variables – histograms
Understanding numeric data – uniform and normal distributions
Measuring spread – variance and standard deviation
Exploring categorical variables
Measuring the central tendency – the mode
Exploring relationships between variables
Visualizing relationships – scatterplots
Examining relationships – two-way cross-tabulations
Summary
3. Lazy Learning – Classification Using Nearest Neighbors
Understanding nearest neighbor classification
The k-NN algorithm
Measuring similarity with distance
Choosing an appropriate k
Preparing data for use with k-NN
Why is the k-NN algorithm lazy?
Example – diagnosing breast cancer with the k-NN algorithm
Step 1 – collecting data
Step 2 – exploring and preparing the data
Transformation – normalizing numeric data
Data preparation – creating training and test datasets
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Transformation – z-score standardization
Testing alternative values of k
Summary
4. Probabilistic Learning – Classification Using Naive Bayes
Understanding Naive Bayes
Basic concepts of Bayesian methods
Understanding probability
Understanding joint probability
Computing conditional probability with Bayes' theorem
The Naive Bayes algorithm
Classification with Naive Bayes
The Laplace estimator
Using numeric features with Naive Bayes
Example – filtering mobile phone spam with the Naive Bayes algorithm
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – cleaning and standardizing text data
Data preparation – splitting text documents into words
Data preparation – creating training and test datasets
Visualizing text data – word clouds
Data preparation – creating indicator features for frequent words
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
5. Divide and Conquer – Classification Using Decision Trees and Rules
Understanding decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
Example – identifying risky bank loans using C5.0 decision trees
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – creating random training and test datasets
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Boosting the accuracy of decision trees
Making some mistakes costlier than others
Understanding classification rules
Separate and conquer
The 1R algorithm
The RIPPER algorithm
Rules from decision trees
What makes trees and rules greedy?
Example – identifying poisonous mushrooms with rule learners
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
6. Forecasting Numeric Data – Regression Methods
Understanding regression
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple linear regression
Example – predicting medical expenses using linear regression
Step 1 – collecting data
Step 2 – exploring and preparing the data
Exploring relationships among features – the correlation matrix
Visualizing relationships among features – the scatterplot matrix
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Model specification – adding non-linear relationships
Transformation – converting a numeric variable to a binary indicator
Model specification – adding interaction effects
Putting it all together – an improved regression model
Understanding regression trees and model trees
Adding regression to trees
Example – estimating the quality of wines with regression trees and model trees
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Visualizing decision trees
Step 4 – evaluating model performance
Measuring performance with the mean absolute error
Step 5 – improving model performance
Summary
7. Black Box Methods – Neural Networks and Support Vector Machines
Understanding neural networks
From biological to artificial neurons
Activation functions
Network topology
The number of layers
The direction of information travel
The number of nodes in each layer
Training neural networks with backpropagation
Example – Modeling the strength of concrete with ANNs
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Understanding Support Vector Machines
Classification with hyperplanes
The case of linearly separable data
The case of nonlinearly separable data
Using kernels for non-linear spaces
Example – performing OCR with SVMs
Step 1 – collecting data
Step 2 – exploring and preparing the data
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
8. Finding Patterns – Market Basket Analysis Using Association Rules
Understanding association rules
The Apriori algorithm for association rule learning
Measuring rule interest – support and confidence
Building a set of rules with the Apriori principle
Example – identifying frequently purchased groceries with association rules
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – creating a sparse matrix for transaction data
Visualizing item support – item frequency plots
Visualizing the transaction data – plotting the sparse matrix
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Sorting the set of association rules
Taking subsets of association rules
Saving association rules to a file or data frame
Summary
9. Finding Groups of Data – Clustering with k-means
Understanding clustering
Clustering as a machine learning task
The k-means clustering algorithm
Using distance to assign and update clusters
Choosing the appropriate number of clusters
Example – finding teen market segments using k-means clustering
Step 1 – collecting data
Step 2 – exploring and preparing the data
Data preparation – dummy coding missing values
Data preparation – imputing the missing values
Step 3 – training a model on the data
Step 4 – evaluating model performance
Step 5 – improving model performance
Summary
10. Evaluating Model Performance
Measuring performance for classification
Working with classification prediction data in R
A closer look at confusion matrices
Using confusion matrices to measure performance
Beyond accuracy – other measures of performance
The kappa statistic
Sensitivity and specificity
Precision and recall
The F-measure
Visualizing performance trade-offs
ROC curves
Estimating future performance
The holdout method
Cross-validation
Bootstrap sampling
Summary
11. Improving Model Performance
Tuning stock models for better performance
Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
Summary
12. Specialized Machine Learning Topics
Working with proprietary files and databases
Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
Querying data in SQL databases
Working with online data and services
Downloading the complete text of web pages
Scraping data from web pages
Parsing XML documents
Parsing JSON from web APIs
Working with domain-specific data
Analyzing bioinformatics data
Analyzing and visualizing network data
Improving the performance of R
Managing very large datasets
Generalizing tabular data structures with dplyr
Making data frames faster with data.table
Creating disk-based data frames with ff
Using massive matrices with bigmemory
Learning faster with parallel computing
Measuring execution time
Working in parallel with multicore and snow
Taking advantage of parallel with foreach and doParallel
Parallel cloud computing with MapReduce and Hadoop
GPU computing
Deploying optimized learning algorithms
Building bigger regression models with biglm
Growing bigger and faster random forests with bigrf
Training and evaluating models in parallel with caret
Summary
III. Module 3
1. A Process for Success
The process
Business understanding
Identify the business objective
Assess the situation
Determine the analytical goals
Produce a project plan
Data understanding
Data preparation
Modeling
Evaluation
Deployment
Algorithm flowchart
Summary
2. Linear Regression – The Blocking and Tackling of Machine Learning
Univariate linear regression
Business understanding
Multivariate linear regression
Business understanding
Data understanding and preparation
Modeling and evaluation
Other linear model considerations
Qualitative feature
Interaction term
Summary
3. Logistic Regression and Discriminant Analysis
Classification methods and linear regression
Logistic regression
Business understanding
Data understanding and preparation
Modeling and evaluation
The logistic regression model
Logistic regression with cross-validation
Discriminant analysis overview
Discriminant analysis application
Model selection
Summary
4. Advanced Feature Selection in Linear Models
Regularization in a nutshell
Ridge regression
LASSO
Elastic net
Business case
Business understanding
Data understanding and preparation
Modeling and evaluation
Best subsets
Ridge regression
LASSO
Elastic net
Cross-validation with glmnet
Model selection
Summary
5. More Classification Techniques – K-Nearest Neighbors and Support Vector Machines
K-Nearest Neighbors
Support Vector Machines
Business case
Business understanding
Data understanding and preparation
Modeling and evaluation
KNN modeling
SVM modeling
Model selection
Feature selection for SVMs
Summary
6. Classification and Regression Trees
Introduction
An overview of the techniques
Regression trees
Classification trees
Random forest
Gradient boosting
Business case
Modeling and evaluation
Regression tree
Classification tree
Random forest regression
Random forest classification
Gradient boosting regression
Gradient boosting classification
Model selection
Summary
7. Neural Networks
Neural network
Deep learning, a not-so-deep overview
Business understanding
Data understanding and preparation
Modeling and evaluation
An example of deep learning
H2O background
Data preparation and uploading it to H2O
Create train and test datasets
Modeling
Summary
8. Cluster Analysis
Hierarchical clustering
Distance calculations
K-means clustering
Gower and partitioning around medoids
Gower
PAM
Business understanding
Data understanding and preparation
Modeling and evaluation
Hierarchical clustering
K-means clustering
Clustering with mixed data
Summary
9. Principal Components Analysis
An overview of the principal components
Rotation
Business understanding
Data understanding and preparation
Modeling and evaluation
Component extraction
Orthogonal rotation and interpretation
Creating factor scores from the components
Regression analysis
Summary
10. Market Basket Analysis and Recommendation Engines
An overview of a market basket analysis
Business understanding
Data understanding and preparation
Modeling and evaluation
An overview of a recommendation engine
User-based collaborative filtering
Item-based collaborative filtering
Singular value decomposition and principal components analysis
Business understanding and recommendations
Data understanding, preparation, and recommendations
Modeling, evaluation, and recommendations
Summary
11. Time Series and Causality
Univariate time series analysis
Bivariate regression
Granger causality
Business understanding
Data understanding and preparation
Modeling and evaluation
Univariate time series forecasting
Time series regression
Examining the causality
Summary
12. Text Mining
Text mining framework and methods
Topic models
Other quantitative analyses
Business understanding
Data understanding and preparation
Modeling and evaluation
Word frequency and topic models
Additional quantitative analysis
Summary
A. R Fundamentals
Introduction
Getting R up and running
Using R
Data frames and matrices
Summary stats
Installing and loading the R packages
Summary
A. Bibliography
Index