Index
Title Page Copyright
Statistics for Data Science
Credits About the Author About the Reviewer www.PacktPub.com
Why subscribe?
Customer Feedback Preface
What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support
Downloading the example code Downloading the color images of this book Errata Piracy Questions
Transitioning from Data Developer to Data Scientist
Data developer thinking Objectives of a data developer
Querying or mining
Data quality or data cleansing Data modeling Issue or insights Thought process
Developer versus scientist
New data, new source Quality questions Querying and mining Performance Financial reporting Visualizing Tools of the trade
Advantages of thinking like a data scientist
Developing a better approach to understanding data Using statistical thinking during program or database designing Adding to your personal toolbox Increased marketability Perpetual learning Seeing the future
Transitioning to a data scientist
Let's move ahead
Summary
Declaring the Objectives
Key objectives of data science
Collecting data Processing data Exploring and visualizing data Analyzing the data and/or applying machine learning to the data Deciding (or planning) based upon acquired insight
Thinking like a data scientist Bringing statistics into data science Common terminology
Statistical population Probability False positives Statistical inference Regression Fitting Categorical data Classification Clustering Statistical comparison Coding Distributions Data mining Decision trees Machine learning Munging and wrangling Visualization D3 Regularization Assessment Cross-validation Neural networks Boosting Lift Mode Outlier Predictive modeling Big Data Confidence interval Writing
Summary
A Developer's Approach to Data Cleaning
Understanding basic data cleaning
Common data issues Contextual data issues Cleaning techniques
R and common data issues
Outliers
Step 1 – Profiling the data Step 2 – Addressing the outliers
Domain expertise Validity checking Enhancing data Harmonization Standardization
Transformations Deductive correction Deterministic imputation Summary
Data Mining and the Database Developer
Data mining
Common techniques Visualization
Cluster analysis Correlation analysis Discriminant analysis Factor analysis Regression analysis Logistic analysis Purpose
Mining versus querying
Choosing R for data mining Visualizations
Current smokers
Missing values A cluster analysis
Dimensional reduction
Calculating statistical significance
Frequent patterning
Frequent item-setting
Sequence mining Summary
Statistical Analysis for the Database Developer
Data analysis
Looking closer
Statistical analysis Summarization
Comparing groups
Samples Group comparison conclusions
Summarization modeling
Establishing the nature of data Successful statistical analysis R and statistical analysis Summary
Database Progression to Database Regression
Introducing statistical regression
Techniques and approaches for regression
Choosing your technique Does it fit?
Identifying opportunities for statistical regression
Summarizing data Exploring relationships Testing significance of differences
Project profitability R and statistical regression A working example
Establishing the data profile
The graphical analysis
Predicting with our linear model
Step 1: Chunking the data Step 2: Creating the model on the training data Step 3: Predicting the projected profit on test data Step 4: Reviewing the model Step 5: Accuracy and error
Summary
Regularization for Database Improvement
Statistical regularization
Various statistical regularization methods Ridge Lasso Least angles Opportunities for regularization
Collinearity Sparse solutions High-dimensional data Classification
Using data to understand statistical regularization Improving data or a data model
Simplification Relevance Speed Transformation Variation of coefficients Causal inference Back to regularization Reliability
Using R for statistical regularization
Parameter Setup
Summary
Database Development and Assessment
Assessment and statistical assessment
Objectives Baselines Planning for assessment Evaluation
Development versus assessment
Planning
Data assessment and data quality assurance
Categorizing quality Relevance Cross-validation
Preparing data
R and statistical assessment
Questions to ask Learning curves
Example of a learning curve
Summary
Databases and Neural Networks
Ask any data scientist
Defining neural network
Nodes Layers Training Solution Understanding the concepts
Neural network models and database models
No single or main node Not serial No memory address to store results
R-based neural networks
References Data prep and preprocessing Data splitting Model parameters Cross-validation R packages for ANN development ANN ANN2 NNET Black boxes
A use case
Popular use cases Character recognition Image compression Stock market prediction Fraud detection Neuroscience
Summary
Boosting your Database
Definition and purpose
Bias
Categorizing bias Causes of bias Bias data collection Bias sample selection
Variance
ANOVA
Noise
Noisy data
Weak and strong learners
Weak to strong Model bias Training and prediction time Complexity Which way?
Back to boosting
How it started AdaBoost
What you can learn from boosting (to help) your database
Using R to illustrate boosting methods
Prepping the data Training Ready for boosting
Example results
Summary
Database Classification using Support Vector Machines
Database classification
Data classification in statistics Guidelines for classifying data
Common guidelines
Definitions
Definition and purpose of an SVM
The trick Feature space and cheap computations Drawing the line More than classification Downside Reference resources Predicting credit scores
Using R and an SVM to classify data in a database
Moving on
Summary
Database Structures and Machine Learning
Data structures and data models
Data structures Data models
What's the difference? Relationships
Machine learning
Overview of machine learning concepts Key elements of machine learning
Representation Evaluation Optimization
Types of machine learning
Supervised learning Unsupervised learning Semi-supervised learning Reinforcement learning
Most popular
Applications of machine learning Machine learning in practice
Understanding Preparation Learning Interpretation Deployment Iteration
Using R to apply machine learning techniques to a database
Understanding the data Preparing Data developer Understanding the challenge Cross-tabbing and plotting
Summary