Imperial Library
Index
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Acquire and Prepare the Ingredients - Your Data
Introduction
Working with data
Reading data from CSV files
Getting ready
How to do it...
How it works...
There's more...
Handling different column delimiters
Handling column headers/variable names
Handling missing values
Reading strings as characters and not as factors
Reading data directly from a website
Reading XML data
Getting ready
How to do it...
How it works...
There's more...
Extracting HTML table data from a web page
Extracting a single HTML table from a web page
Reading JSON data
Getting ready
How to do it...
How it works...
Reading data from fixed-width formatted files
Getting ready
How to do it...
How it works...
There's more...
Files with headers
Excluding columns from data
Reading data from R files and R libraries
Getting ready
How to do it...
How it works...
There's more...
Saving all objects in a session
Saving objects selectively in a session
Attaching/detaching R data files to an environment
Listing all datasets in loaded packages
Removing cases with missing values
Getting ready
How to do it...
How it works...
There's more...
Eliminating cases with NA for selected variables
Finding cases that have no missing values
Converting specific values to NA
Excluding NA values from computations
Replacing missing values with the mean
Getting ready
How to do it...
How it works...
There's more...
Imputing random values sampled from non-missing values
Removing duplicate cases
Getting ready
How to do it...
How it works...
There's more...
Identifying duplicates without deleting them
Rescaling a variable to specified min-max range
Getting ready
How to do it...
How it works...
There's more...
Rescaling many variables at once
See also
Normalizing or standardizing data in a data frame
Getting ready
How to do it...
How it works...
There's more...
Standardizing several variables simultaneously
See also
Binning numerical data
Getting ready
How to do it...
How it works...
There's more...
Creating a specified number of intervals automatically
Creating dummies for categorical variables
Getting ready
How to do it...
How it works...
There's more...
Choosing which variables to create dummies for
Handling missing data
Getting ready
How to do it...
How it works...
There's more...
Understanding missing data patterns
Correcting data
Getting ready
How to do it...
How it works...
There's more...
Combining multiple columns into a single column
Splitting a single column into multiple columns
Imputing data
Getting ready
How to do it...
How it works...
There's more...
Detecting outliers
Getting ready
How to do it...
How it works...
There's more...
Treating the outliers with mean/median imputation
Handling extreme values with capping
Transforming and binning values
Outlier detection with LOF
What's in There - Exploratory Data Analysis
Introduction
Creating standard data summaries
Getting ready
How to do it...
How it works...
There's more...
Using the str() function for an overview of a data frame
Computing the summary and the str() function for a single variable
Finding other measures
Extracting a subset of a dataset
Getting ready
How to do it...
How it works...
There's more...
Excluding columns
Selecting based on multiple values
Selecting using logical vector
Splitting a dataset
Getting ready
How to do it...
How it works...
Creating random data partitions
Getting ready
How to do it...
Case 1 - Numerical target variable and two partitions
Case 2 - Numerical target variable and three partitions
Case 3 - Categorical target variable and two partitions
Case 4 - Categorical target variable and three partitions
How it works...
There's more...
Using a convenience function for partitioning
Sampling from a set of values
Generating standard plots, such as histograms, boxplots, and scatterplots
Getting ready
How to do it...
Creating histograms
Creating boxplots
Creating scatterplots
Creating scatterplot matrices
How it works...
Histograms
Boxplots
There's more...
Overlay a density plot on a histogram
Overlay a regression line on a scatterplot
Color specific points on a scatterplot
Generating multiple plots on a grid
Getting ready
How to do it...
How it works...
Graphics parameters
Creating plots with the lattice package
Getting ready
How to do it...
How it works...
There's more...
Adding flair to your graphs
See also
Creating charts that facilitate comparisons
Getting ready
How to do it...
Using base plotting system
How it works...
There's more...
Creating beanplots with the beanplot package
See also
Creating charts that help to visualize possible causality
Getting ready
How to do it...
How it works...
See also
Where Does It Belong? Classification
Introduction
Generating error/classification confusion matrices
Getting ready
How to do it...
How it works...
There's more...
Visualizing the error/classification confusion matrix
Comparing the model's performance for different classes
Principal Component Analysis
Getting ready
How to do it...
How it works...
Generating receiver operating characteristic charts
Getting ready
How to do it...
How it works...
There's more...
Using arbitrary class labels
Building, plotting, and evaluating with classification trees
Getting ready
How to do it...
How it works...
There's more...
Computing raw probabilities
Creating the ROC chart
See also
Using random forest models for classification
Getting ready
How to do it...
How it works...
There's more...
Computing raw probabilities
Generating the ROC chart
Specifying cutoffs for classification
See also
Classifying using the support vector machine approach
Getting ready
How to do it...
How it works...
There's more...
Controlling the scaling of variables
Determining the type of SVM model
Assigning weights to the classes
Choosing the cost of SVM
Tuning the SVM
See also
Classifying using the Naive Bayes approach
Getting ready
How to do it...
How it works...
See also
Classifying using the KNN approach
Getting ready
How to do it...
How it works...
There's more...
Automating the process of running KNN for many k values
Selecting appropriate values of k using caret
Using KNN to compute raw probabilities instead of classifications
Using neural networks for classification
Getting ready
How to do it...
How it works...
There's more...
Exercising greater control over nnet
Generating raw probabilities and plotting the ROC curve
Classifying using linear discriminant function analysis
Getting ready
How to do it...
How it works...
There's more...
Using the formula interface for lda
See also
Classifying using logistic regression
Getting ready
How to do it...
How it works...
Text classification for sentiment analysis
Getting ready
How to do it...
How it works...
Give Me a Number - Regression
Introduction
Computing the root-mean-square error
Getting ready
How to do it...
How it works...
There's more...
Using a convenience function to compute the RMS error
Building KNN models for regression
Getting ready
How to do it...
How it works...
There's more...
Running KNN with cross-validation in place of a validation partition
Using a convenience function to run KNN
Using a convenience function to run KNN for multiple k values
See also
Performing linear regression
Getting ready
How to do it...
How it works...
There's more...
Forcing lm to use a specific factor level as the reference
Using other options in the formula expression for linear models
See also
Performing variable selection in linear regression
Getting ready
How to do it...
How it works...
See also
Building regression trees
Getting ready
How to do it...
How it works...
There's more...
Generating regression trees for data with categorical predictors
Generating regression trees using the ensemble method - Bagging and Boosting
See also
Building random forest models for regression
Getting ready
How to do it...
How it works...
There's more...
Controlling forest generation
See also
Using neural networks for regression
Getting ready
How to do it...
How it works...
See also
Performing k-fold cross-validation
Getting ready
How to do it...
How it works...
See also
Performing leave-one-out cross-validation to limit overfitting
How to do it...
How it works...
See also
Can You Simplify That? Data Reduction Techniques
Introduction
Performing cluster analysis using hierarchical clustering
Getting ready
How to do it...
How it works...
There's more...
Cutting trees into clusters
Getting ready
How to do it...
How it works...
Performing cluster analysis using partitioning clustering
Getting ready
How to do it...
How it works...
There's more...
Image segmentation using mini-batch K-means
Getting ready
How to do it...
Partitioning around medoids
Getting ready
How to do it...
How it works...
Clustering large applications
Getting ready
How to do it...
How it works...
Performing cluster validation
Getting ready
How to do it...
How it works...
Performing advanced clustering
Density-based spatial clustering of applications with noise
Getting ready
How to do it...
How it works...
Model-based clustering with the EM algorithm
Getting ready
How to do it...
How it works...
Reducing dimensionality with principal component analysis
Getting ready
How to do it...
How it works...
Lessons from History - Time Series Analysis
Introduction
Exploring finance datasets
Getting ready
How to do it...
How it works...
There's more...
Creating and examining date objects
Getting ready
How to do it...
How it works...
Operating on date objects
Getting ready
How to do it...
How it works...
See also
Performing preliminary analyses on time series data
Getting ready
How to do it...
How it works...
See also
Using time series objects
Getting ready
How to do it...
How it works...
See also
Decomposing time series
Getting ready
How to do it...
How it works...
See also
Filtering time series data
Getting ready
How to do it...
How it works...
See also
Smoothing and forecasting using the Holt-Winters method
Getting ready
How to do it...
How it works...
See also
Building an automated ARIMA model
Getting ready
How to do it...
How it works...
See also
How does it look? - Advanced data visualization
Introduction
Creating scatter plots
Getting ready
How to do it...
How it works...
There's more...
Graph using qplot
Creating line graphs
Getting ready
How to do it...
How it works...
Creating bar graphs
Getting ready
How to do it...
Creating bar charts with ggplot2
How it works...
Making distribution plots
Getting ready
How to do it...
How it works...
Creating mosaic graphs
Getting ready
How to do it...
How it works...
Making treemaps
Getting ready
How to do it...
How it works...
Plotting a correlations matrix
Getting ready
How to do it...
How it works...
There's more...
Visualizing a correlation matrix with ggplot2
Creating heatmaps
Getting ready
How to do it...
How it works...
There's more...
Plotting a heatmap over geospatial data
See also
Plotting network graphs
Getting ready
How to do it...
How it works...
See also
Labeling and legends
Getting ready
How to do it...
How it works...
Coloring and themes
Getting ready
How to do it...
How it works...
Creating multivariate plots
Getting ready
How to do it...
How it works...
There's more...
Multivariate plots with the GGally package
Creating 3D graphs and animation
Getting ready
How to do it...
How it works...
There's more...
Adding text to an existing 3D plot
Using a 3D histogram
Using a line graph
Selecting a graphics device
Getting ready
How to do it...
How it works...
This may also interest you - Building Recommendations
Introduction
Building collaborative filtering systems
Getting ready
How to do it...
How it works...
There's more...
Using collaborative filtering on binary data
Performing content-based systems
Getting ready
How to do it...
How it works...
Building hybrid systems
Getting ready
How to do it...
How it works...
Performing similarity measures
Getting ready
How to do it...
How it works...
Application of ML algorithms - image recognition system
Getting ready
How to do it...
How it works...
Evaluating models and optimization
Getting ready
How to do it...
How it works...
There's more...
Identifying a suitable model
Optimizing parameters
A practical example - fraud detection system
Getting ready
How to do it...
How it works...
It's All About Your Connections - Social Network Analysis
Introduction
Downloading social network data using public APIs
Getting ready
How to do it...
How it works...
See also
Creating adjacency matrices and edge lists
Getting ready
How to do it...
How it works...
See also
Plotting social network data
Getting ready
How to do it...
How it works...
There's more...
Specifying plotting preferences
Plotting directed graphs
Creating a graph object with weights
Extracting the network as an adjacency matrix from the graph object
Extracting an adjacency matrix with weights
Extracting an edge list from a graph object
Creating a bipartite network graph
Generating projections of a bipartite network
Computing important network metrics
Getting ready
How to do it...
How it works...
There's more...
Getting edge sequences
Getting immediate and distant neighbors
Adding vertices or nodes
Adding edges
Deleting isolates from a graph
Creating subgraphs
Cluster analysis
Getting ready
How to do it...
How it works...
Force layout
Getting ready
How to do it...
How it works...
There's more...
Force Atlas 2
YiFan Hu layout
Getting ready
How to do it...
How it works...
There's more...
Put Your Best Foot Forward - Document and Present Your Analysis
Introduction
Generating reports of your data analysis with R Markdown and knitr
Getting ready
How to do it...
How it works...
There's more...
Using the render function
Adding output options
Creating interactive web applications with shiny
Getting ready
How to do it...
How it works...
There's more...
Adding images
Adding HTML
Adding tab sets
Adding a dynamic UI
Creating a single-file web application
Dynamic integration of Shiny with knitr
Creating PDF presentations of your analysis with R presentation
Getting ready
How to do it...
How it works...
There's more...
Using hyperlinks
Controlling the display
Enhancing the look of the presentation
Generating dynamic reports
Getting ready
How to do it...
How it works...
Work Smarter, Not Harder - Efficient and Elegant R Code
Introduction
Exploiting vectorized operations
Getting ready
How to do it...
How it works...
There's more...
Processing entire rows or columns using the apply function
Getting ready
How to do it...
How it works...
There's more...
Using apply on a three-dimensional array
Applying a function to all elements of a collection with lapply and sapply
Getting ready
How to do it...
How it works...
There's more...
Dynamic output
One caution
Applying functions to subsets of a vector
Getting ready
How to do it...
How it works...
There's more...
Applying a function on groups from a data frame
Using the split-apply-combine strategy with plyr
Getting ready
How to do it...
How it works...
There's more...
Adding a new column using transform or mutate
Using summarize along with the plyr function
Concatenating the list of data frames into a big data frame
Common grouping functions in plyr
Split-apply-combine with dplyr
Slicing, dicing, and combining data with data tables
Getting ready
How to do it...
How it works...
There's more...
Adding multiple aggregated columns
Counting groups
Deleting a column
Joining data tables
Using symbols
Where in the World? Geospatial Analysis
Introduction
Downloading and plotting a Google map of an area
Getting ready
How to do it...
How it works...
There's more...
Saving the downloaded map as an image file
Getting a satellite image
Overlaying data on the downloaded Google map
Getting ready
How to do it...
How it works...
There's more...
Importing ESRI shape files to R
Getting ready
How to do it...
How it works...
Using the sp package to plot geographic data
Getting ready
How to do it...
How it works...
Getting maps from the maps package
Getting ready
How to do it...
How it works...
Creating spatial data frames from regular data frames containing spatial and other data
Getting ready
How to do it...
How it works...
Creating spatial data frames by combining regular data frames with spatial objects
Getting ready
How to do it...
How it works...
Adding variables to an existing spatial data frame
Getting ready
How to do it...
How it works...
Spatial data analysis with R and QGIS
Getting ready
How to do it...
How it works...
Playing Nice - Connecting to Other Systems
Introduction
Using Java objects in R
Getting ready
How to do it...
How it works...
There's more...
Checking JVM properties
Displaying available methods
Using JRI to call R functions from Java
Getting ready
How to do it...
How it works...
There's more...
Using Rserve to call R functions from Java
Getting ready
How to do it...
How it works...
There's more...
Retrieving an array from R
Executing R scripts from Java
Getting ready
How to do it...
How it works...
Using the xlsx package to connect to Excel
Getting ready
How to do it...
How it works...
Reading data from relational databases - MySQL
Getting ready
How to do it...
Using RODBC
Using RMySQL
Using RJDBC
How it works...
Using RODBC
Using RMySQL
Using RJDBC
There's more...
Fetching all rows
When the SQL query is long
Reading data from NoSQL databases - MongoDB
Getting ready
How to do it...
How it works...
There's more...
Find most severe crime zone
Plotting the crimes on the Chicago map
Working with in-memory data processing with Apache Spark
Getting ready
How to do it...
How it works...
There's more...
Classification with SparkR
MovieLens recommendation system with SparkR