Imperial Library

Index
Title Page
Copyright
R Data Mining
Credits
About the Author
About the Reviewers
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Why to Choose R for Your Data Mining and Where to Start
What is R?
A bit of history
R's points of strength
Open source inside
Plugin ready
Data visualization friendly
Installing R and writing R code
Downloading R
R installation for Windows and macOS
R installation for Linux OS
Main components of a base R installation
Possible alternatives to write and run R code
RStudio (all OSs)
The Jupyter Notebook (all OSs)
Visual Studio (Windows users only)
R foundational notions
A preliminary R session
Executing R interactively through the R console
Creating an R script
Executing an R script
Vectors
Lists
Creating lists
Subsetting lists
Data frames
Functions
R's weaknesses and how to overcome them
Learning R effectively and minimizing the effort
The tidyverse
Leveraging the R community to learn R
Where to find the R community
Engaging with the community to learn R
Handling large datasets with R
Further references
Summary
A First Primer on Data Mining Analysing Your Bank Account Data
Acquiring and preparing your banking data
Data model
Summarizing your data with pivot-like tables
A gentle introduction to the pipe operator
An even more gentle introduction to the dplyr package
Installing the necessary packages and loading your data into R
Installing and loading the necessary packages
Importing your data into R
Defining the monthly and daily sum of expenses
Visualizing your data with ggplot2
Basic data visualization principles
Less but better
Not every chart is good for your message
Scatter plot
Line chart
Bar plot
Other advanced charts
Colors have to be chosen carefully
A bit of theory - chromatic circle, hue, and luminosity
Visualizing your data with ggplot
One more gentle introduction – the grammar of graphics
A layered grammar of graphics – ggplot2
Visualizing your banking movements with ggplot2
Visualizing the number of movements per day of the week
Further references
Summary
The Data Mining Process - CRISP-DM Methodology
The CRISP-DM methodology data mining cycle
Business understanding
Data understanding
Data collection
How to perform data collection with R
Data import from TXT and CSV files
Data import from different types of format already structured as tables
Data import from unstructured sources
Data description
How to perform data description with R
Data exploration
What to use in R to perform this task
The summary() function
Box plot
Histograms
Data preparation
Modelling
Defining a data modeling strategy
How similar problems were solved in the past
Emerging techniques
Classification of modeling problems
How to perform data modeling with R
Evaluation
Clustering evaluation
Classification evaluation
Regression evaluation
How to judge the adequacy of a model's performance
What to use in R to perform this task
Deployment
Deployment plan development
Maintenance plan development
Summary
Keeping the House Clean – The Data Mining Architecture
A general overview
Data sources
Types of data sources
Unstructured data sources
Structured data sources
Key issues of data sources
Databases and data warehouses
The third wheel – the data mart
One-level database
Two-level database
Three-level database
Technologies
SQL
MongoDB
Hadoop
The data mining engine
The interpreter
The interface between the engine and the data warehouse
The data mining algorithms
User interface
Clarity
Clarity and mystery
Clarity and simplicity
Efficiency
Consistency
Syntax highlight
Auto-completion
How to build a data mining architecture in R
Data sources
The data warehouse
The data mining engine
The interface between the engine and the data warehouse
The data mining algorithms
The user interface
Further references
Summary
How to Address a Data Mining Problem – Data Cleaning and Validation
On a quiet day
Data cleaning
Tidy data
Analysing the structure of our data
The str function
The describe function
head, tail, and View functions
Evaluating your data tidiness
Every row is a record
Every column shows an attribute
Every table represents an observational unit
Tidying our data
The tidyr package
Long versus wide data
The spread function
The gather function
The separate function
Applying tidyr to our dataset
Validating our data
Fitness for use
Conformance to standards
Data quality controls
Consistency checks
Data type checks
Logical checks
Domain checks
Uniqueness checks
Performing data validation on our data
Data type checks with str()
Domain checks
The final touch – data merging
left_join function
Moving beyond left_join
Further references
Summary
Looking into Your Data Eyes – Exploratory Data Analysis
Introducing summary EDA
Describing the population distribution
Quartiles and Median
Mean
The mean and phenomenon going on within sub populations
The mean being biased by outlier values
Computing the mean of our population
Variance
Standard deviation
Skewness
Measuring the relationship between variables
Correlation
The Pearson correlation coefficient
Distance correlation
Weaknesses of summary EDA - the Anscombe quartet
Graphical EDA
Visualizing a variable distribution
Histogram
Reporting date histogram
Geographical area histogram
Cash flow histogram
Boxplot
Checking for outliers
Visualizing relationships between variables
Scatterplots
Adding title, subtitle, and caption to the plot
Setting axis and legend
Adding explicative text to the plot
Final touches on colors
Further references
Summary
Our First Guess – a Linear Regression
Defining a data modelling strategy
Data modelling notions
Supervised learning
Unsupervised learning
The modeling strategy
Applying linear regression to our data
The intuition behind linear regression
The math behind the linear regression
Ordinary least squares technique
Model requirements – what to look for before applying the model
Residuals' uncorrelation
Residuals' homoscedasticity
How to apply linear regression in R
Fitting the linear regression model
Validating model assumption
Visualizing fitted values
Preparing the data for visualization
Developing the data visualization
Further references
Summary
A Gentle Introduction to Model Performance Evaluation
Defining model performance
Fitting versus interpretability
Making predictions with models
Measuring performance in regression models
Mean squared error
R-squared
R-squared meaning and interpretation
R-squared computation in R
Adjusted R-squared
R-squared misconceptions
The R-squared doesn't measure the goodness of fit
A low R-squared doesn't mean your model is not statistically significant
Measuring the performance in classification problems
The confusion matrix
Confusion matrix in R
Accuracy
How to compute accuracy in R
Sensitivity
How to compute sensitivity in R
Specificity
How to compute specificity in R
How to choose the right performance statistics
A final general warning – training versus test datasets
Further references
Summary
Don't Give up – Power up Your Regression Including Multiple Variables
Moving from simple to multiple linear regression
Notation
Assumptions
Variables' collinearity
Tolerance
Variance inflation factors
Addressing collinearity
Dimensionality reduction
Stepwise regression
Backward stepwise regression
From the full model to the n-1 model
Forward stepwise regression
Double direction stepwise regression
Principal component regression
Fitting a multiple linear model with R
Model fitting
Variable assumptions validation
Residual assumptions validation
Dimensionality reduction
Principal component regression
Stepwise regression
Linear model cheat sheet
Further references
Summary
A Different Outlook to Problems with Classification Models
What is classification and why do we need it?
Linear regression limitations for categorical variables
Common classification algorithms and models
Logistic regression
The intuition behind logistic regression
The logistic function estimates a response variable enclosed within an upper and lower bound
The logistic function estimates the probability of an observation pertaining to one of the two available categories
The math behind logistic regression
Maximum likelihood estimator
Model assumptions
Absence of multicollinearity between variables
Linear relationship between explanatory variables and log odds
Large enough sample size
How to apply logistic regression in R
Fitting the model
Reading the glm() estimation output
The level of statistical significance of the association between the explanatory variable and the response variable
The AIC performance metric
Validating model assumptions
Fitting quadratic and cubic models to test for linearity of log odds
Visualizing and interpreting logistic regression results
Visualizing results
Interpreting results
Logistic regression cheat sheet
Support vector machines
The intuition behind support vector machines
The hyperplane
Maximal margin classifier
Support vector and support vector machines
Model assumptions
Independent and identically distributed random variables
Independent variables
Identically distributed
Applying support vector machines in R
The svm() function
Applying the svm function to our data
Interpreting support vector machine results
Understanding the meaning of hyperplane weights
Support Vector Machine cheat sheet
References
Summary
The Final Clash – Random Forests and Ensemble Learning
Random forest
Random forest building blocks – decision trees introduction
The intuition behind random forests
How to apply random forests in R
Evaluating the results of the model
Performance of the model
OOB estimate error rate
Confusion matrix
Importance of predictors
Mean decrease in accuracy
Gini index
Plotting relative importance of predictors
Random forest cheat sheet
Ensemble learning
Basic ensemble learning techniques
Applying ensemble learning to our data in R
The R caret package
Computing a confusion matrix with the caret package
Interpreting confusion matrix results
Applying a weighted majority vote to our data
Applying estimated models on new data
predict.glm() for prediction from the logistic model
predict.randomForest() for prediction from random forests
predict.svm() for prediction from support vector machines
A more structured approach to predictive analytics
Applying the majority vote ensemble technique on predicted data
Further references
Summary
Looking for the Culprit – Text Data Mining with R
Extracting data from a PDF file in R
Getting a list of documents in a folder
Reading PDF files into R via pdf_text()
Iteratively extracting text from a set of documents with a for loop
Sentiment analysis
Developing wordclouds from text
Looking for context in text – analyzing document n-grams
Performing network analysis on textual data
Obtaining an edge list from a data frame
Visualizing a network with the ggraph package
Tuning the appearance of nodes and edges
Computing the degree parameter in a network to highlight relevant nodes
Further references
Summary
Sharing Your Stories with Your Stakeholders through R Markdown
Principles of a good data mining report
Clearly state the objectives
Clearly state assumptions
Make the data treatments clear
Show consistent data
Provide data lineage
Set up an rmarkdown report
Develop an R markdown report in RStudio
A brief introduction to markdown
Inserting a chunk of code
How to show readable tables in rmarkdown reports
Reproducing R code output within text through inline code
Introduction to Shiny and the reactivity framework
Employing input and output to deal with changes in Shiny app parameters
Adding an interactive data lineage module
Adding an input panel to an R markdown report
Adding a data table to your report
Expanding Shiny beyond the basics
Rendering and sharing an R markdown report
Rendering an R markdown report
Sharing an R Markdown report
Render a static markdown report into different file formats
Render interactive Shiny apps on dedicated servers
Sharing a Shiny app through shinyapps.io
Further references
Summary
Epilogue
Dealing with Dates, Relative Paths and Functions
Dealing with dates in R
Working directories and relative paths in R
Conditional statements
