
Index
Data Analysis with R
Credits
About the Author
About the Reviewer
www.PacktPub.com
  • Support files, eBooks, discount offers, and more
  • Why subscribe?
  • Free access for Packt account holders
Preface
  • What this book covers
  • What you need for this book
  • Who this book is for
  • Conventions
  • Reader feedback
  • Customer support
  • Downloading the example code
  • Downloading the color images of this book
  • Errata
  • Piracy
  • Questions
1. RefresheR
  • Navigating the basics
  • Arithmetic and assignment
  • Logicals and characters
  • Flow of control
  • Getting help in R
  • Vectors
  • Subsetting
  • Vectorized functions
  • Advanced subsetting
  • Recycling
  • Functions
  • Matrices
  • Loading data into R
  • Working with packages
  • Exercises
  • Summary
2. The Shape of Data
  • Univariate data
  • Frequency distributions
  • Central tendency
  • Spread
  • Populations, samples, and estimation
  • Probability distributions
  • Visualization methods
  • Exercises
  • Summary
3. Describing Relationships
  • Multivariate data
  • Relationships between a categorical and a continuous variable
  • Relationships between two categorical variables
  • The relationship between two continuous variables
  • Covariance
  • Correlation coefficients
  • Comparing multiple correlations
  • Visualization methods
  • Categorical and continuous variables
  • Two categorical variables
  • Two continuous variables
  • More than two continuous variables
  • Exercises
  • Summary
4. Probability
  • Basic probability
  • A tale of two interpretations
  • Sampling from distributions
  • Parameters
  • The binomial distribution
  • The normal distribution
  • The three-sigma rule and using z-tables
  • Exercises
  • Summary
5. Using Data to Reason About the World
  • Estimating means
  • The sampling distribution
  • Interval estimation
  • How did we get 1.96?
  • Smaller samples
  • Exercises
  • Summary
6. Testing Hypotheses
  • Null Hypothesis Significance Testing
  • One and two-tailed tests
  • When things go wrong
  • A warning about significance
  • A warning about p-values
  • Testing the mean of one sample
  • Assumptions of the one sample t-test
  • Testing two means
  • Don't be fooled!
  • Assumptions of the independent samples t-test
  • Testing more than two means
  • Assumptions of ANOVA
  • Testing independence of proportions
  • What if my assumptions are unfounded?
  • Exercises
  • Summary
7. Bayesian Methods
  • The big idea behind Bayesian analysis
  • Choosing a prior
  • Who cares about coin flips
  • Enter MCMC – stage left
  • Using JAGS and runjags
  • Fitting distributions the Bayesian way
  • The Bayesian independent samples t-test
  • Exercises
  • Summary
8. Predicting Continuous Variables
  • Linear models
  • Simple linear regression
  • Simple linear regression with a binary predictor
  • A word of warning
  • Multiple regression
  • Regression with a non-binary predictor
  • Kitchen sink regression
  • The bias-variance trade-off
  • Cross-validation
  • Striking a balance
  • Linear regression diagnostics
  • Second Anscombe relationship
  • Third Anscombe relationship
  • Fourth Anscombe relationship
  • Advanced topics
  • Exercises
  • Summary
9. Predicting Categorical Variables
  • k-Nearest Neighbors
  • Using k-NN in R
  • Confusion matrices
  • Limitations of k-NN
  • Logistic regression
  • Using logistic regression in R
  • Decision trees
  • Random forests
  • Choosing a classifier
  • The vertical decision boundary
  • The diagonal decision boundary
  • The crescent decision boundary
  • The circular decision boundary
  • Exercises
  • Summary
10. Sources of Data
  • Relational Databases
  • Why didn't we just do that in SQL?
  • Using JSON
  • XML
  • Other data formats
  • Online repositories
  • Exercises
  • Summary
11. Dealing with Messy Data
  • Analysis with missing data
  • Visualizing missing data
  • Types of missing data
  • So which one is it?
  • Unsophisticated methods for dealing with missing data
  • Complete case analysis
  • Pairwise deletion
  • Mean substitution
  • Hot deck imputation
  • Regression imputation
  • Stochastic regression imputation
  • Multiple imputation
  • So how does mice come up with the imputed values?
  • Methods of imputation
  • Multiple imputation in practice
  • Analysis with unsanitized data
  • Checking for out-of-bounds data
  • Checking the data type of a column
  • Checking for unexpected categories
  • Checking for outliers, entry errors, or unlikely data points
  • Chaining assertions
  • Other messiness
  • OpenRefine
  • Regular expressions
  • tidyr
  • Exercises
  • Summary
12. Dealing with Large Data
  • Wait to optimize
  • Using a bigger and faster machine
  • Be smart about your code
  • Allocation of memory
  • Vectorization
  • Using optimized packages
  • Using another R implementation
  • Use parallelization
  • Getting started with parallel R
  • An example of (some) substance
  • Using Rcpp
  • Be smarter about your code
  • Exercises
  • Summary
13. Reproducibility and Best Practices
  • R Scripting
  • RStudio
  • Running R scripts
  • An example script
  • Scripting and reproducibility
  • R projects
  • Version control
  • Communicating results
  • Exercises
  • Summary
Index
