Index
Principles of Data Science
Table of Contents Principles of Data Science Credits About the Author About the Reviewers www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support
Downloading the example code Downloading the color images of this book Errata Piracy Questions
1. How to Sound Like a Data Scientist
What is data science?
Basic terminology Why data science? Example – Sigma Technologies
The data science Venn diagram
The math
Example – spawner-recruit models
Computer programming Why Python?
Python practices Example of basic Python
Example – parsing a single tweet Domain knowledge
Some more terminology Data science case studies
Case study – automating government paper pushing
Fire all humans, right?
Case study – marketing dollars Case study – what's in a job description?
Summary
2. Types of Data
Flavors of data Why look at these distinctions? Structured versus unstructured data
Example of data preprocessing
Word/phrase counts Presence of certain special characters Relative length of text Picking out topics
Quantitative versus qualitative data
Example – coffee shop data Example – world alcohol consumption data Digging deeper
The road thus far… The four levels of data
The nominal level
Mathematical operations allowed Measures of center What data is like at the nominal level
The ordinal level
Examples Mathematical operations allowed Measures of center Quick recap and check
The interval level
Example Mathematical operations allowed Measures of center Measures of variation
Standard deviation
The ratio level
Examples Measures of center Problems with the ratio level
Data is in the eye of the beholder Summary
3. The Five Steps of Data Science
Introduction to data science Overview of the five steps
Ask an interesting question Obtain the data Explore the data Model the data Communicate and visualize the results
Explore the data
Basic questions for data exploration Dataset 1 – Yelp
Dataframes Series Exploration tips for qualitative data
Nominal level columns Filtering in Pandas Ordinal level columns
Dataset 2 – titanic
Summary
4. Basic Mathematics
Mathematics as a discipline Basic symbols and terminology
Vectors and matrices
Quick exercises Answers
Arithmetic symbols
Summation Proportional Dot product
Graphs Logarithms/exponents Set theory
Linear algebra
Matrix multiplication
How to multiply matrices
Summary
5. Impossible or Improbable – A Gentle Introduction to Probability
Basic definitions Probability Bayesian versus Frequentist
Frequentist approach
The law of large numbers
Compound events Conditional probability The rules of probability
The addition rule Mutual exclusivity The multiplication rule Independence Complementary events
A bit deeper Summary
6. Advanced Probability
Collectively exhaustive events Bayesian ideas revisited
Bayes theorem More applications of Bayes theorem
Example – Titanic Example – medical studies
Random variables
Discrete random variables
Types of discrete random variables
Binomial random variables Poisson random variable Continuous random variables
Summary
7. Basic Statistics
What are statistics? How do we obtain and sample data?
Obtaining data
Observational Experimental
Sampling data
Probability sampling Random sampling Unequal probability sampling
How do we measure statistics?
Measures of center Measures of variation
Definition Example – employee salaries
Measures of relative standing
The insightful part – correlations in data
The Empirical rule Summary
8. Advanced Statistics
Point estimates Sampling distributions Confidence intervals Hypothesis tests
Conducting a hypothesis test One sample t-tests
Example of a one sample t-tests Assumptions of the one sample t-tests
Type I and type II errors Hypothesis test for categorical variables
Chi-square goodness of fit test
Assumptions of the chi-square goodness of fit test Example of a chi-square test for goodness of fit
Chi-square test for association/independence Assumptions of the chi-square independence test
Summary
9. Communicating Data
Why does communication matter? Identifying effective and ineffective visualizations
Scatter plots Line graphs Bar charts Histograms Box plots
When graphs and statistics lie
Correlation versus causation Simpson's paradox If correlation doesn't imply causation, then what does?
Verbal communication
It's about telling a story On the more formal side of things
The why/how/what strategy of presenting Summary
10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials
What is machine learning? Machine learning isn't perfect How does machine learning work? Types of machine learning
Supervised learning
It's not only about predictions Types of supervised learning
Regression Classification
Data is in the eyes of the beholder
Unsupervised learning
Reinforcement learning Overview of the types of machine learning
How does statistical modeling fit into all of this? Linear regression
Adding more predictors Regression metrics
Logistic regression Probability, odds, and log odds
The math of logistic regression
Dummy variables Summary
11. Predictions Don't Grow on Trees – or Do They?
Naïve Bayes classification Decision trees
How does a computer build a regression tree? How does a computer fit a classification tree?
Unsupervised learning
When to use unsupervised learning
K-means clustering
Illustrative example – data points Illustrative example – beer!
Choosing an optimal number for K and cluster validation
The Silhouette Coefficient
Feature extraction and principal component analysis Summary
12. Beyond the Essentials
The bias variance tradeoff
Error due to bias Error due to variance Two extreme cases of bias/variance tradeoff
Underfitting Overfitting
How bias/variance play into error functions
K folds cross-validation Grid searching
Visualizing training error versus cross-validation error
Ensembling techniques
Random forests Comparing Random forests with decision trees
Neural networks
Basic structure
Summary
13. Case Studies
Case study 1 – predicting stock prices based on social media
Text sentiment analysis Exploratory data analysis
Regression route Classification route
Going beyond with this example
Case study 2 – why do some people cheat on their spouses? Case study 3 – using tensorflow
Tensorflow and neural networks
Summary
Index