
Index
Head First Statistics Dedication A Note Regarding Supplemental Files Advance Praise for Head First Statistics Praise for other Head First books Author of Head First Statistics How to use this Book: Intro
Who is this book for?
Who should probably back away from this book?
We know what you’re thinking We know what your brain is thinking Metacognition: thinking about thinking Here’s what WE did
Here’s what YOU can do to bend your brain into submission
Read Me The technical review team Acknowledgments Safari® Books Online
1. Visualizing Information: First Impressions
Statistics are everywhere But why learn statistics? A tale of two charts Manic Mango needs some charts The humble pie chart
So when are pie charts useful?
Chart failure Bar charts can allow for more accuracy Vertical bar charts Horizontal bar charts It’s a matter of scale
Using percentage scales
Using frequency scales Dealing with multiple sets of data
The split-category bar chart The segmented bar chart
Your bar charts rock Categories vs. numbers
Categorical or qualitative data Numerical or quantitative data
Dealing with grouped data To make a histogram, start by finding bar widths Manic Mango needs another chart
A histogram’s bar area must be proportional to frequency
Make the area of histogram bars proportional to frequency Step 1: Find the bar widths Step 2: Find the bar heights Step 3: Draw your chart—a histogram Histograms can’t do everything Introducing cumulative frequency
So what are the cumulative frequencies?
Drawing the cumulative frequency graph Choosing the right chart Manic Mango conquered the games market!
2. Measuring Central Tendency: The Middle Way
Welcome to the Health Club A common measure of average is the mean Mean math
Letters and numbers
Dealing with unknowns Back to the mean
The mean has its own symbol
Handling frequencies Back to the Health Club Everybody was Kung Fu fighting Our data has outliers The butler outliers did it Watercooler conversation Finding the median Business is booming The Little Ducklings swimming class Frequency Magnets Frequency Magnets What went wrong with the mean and median? Introducing the mode
It even works with categorical data
Congratulations!
3. Measuring Variability and Spread: Power Ranges
Wanted: one player We need to compare player scores Use the range to differentiate between data sets
Measuring the range
The problem with outliers We need to get away from outliers Quartiles come to the rescue The interquartile range excludes outliers Quartile anatomy
Finding the position of the lower quartile Finding the position of the upper quartile
We’re not just limited to quartiles So what are percentiles?
Percentile uses Finding percentiles
Box and whisker plots let you visualize ranges Variability is more than just spread Calculating average distances We can calculate variation with the variance... ...but standard deviation is a more intuitive measure
Standard deviation know-how
A quicker calculation for variance What if we need a baseline for comparison? Use standard scores to compare values across data sets
Calculating standard scores
Interpreting standard scores
So what does this tell us about the players?
Statsville All Stars win the league!
4. Calculating Probabilities: Taking Chances
Fat Dan’s Grand Slam Roll up for roulette! Your very own roulette board Place your bets now! What are the chances? Find roulette probabilities You can visualize probabilities with a Venn diagram
Complementary events
It’s time to play! And the winning number is... Let’s bet on an even more likely event You can also add probabilities You win! Time for another bet Exclusive events and intersecting events Problems at the intersection Some more notation Another unlucky spin... ...but it’s time for another bet Conditions apply Find conditional probabilities You can visualize conditional probabilities with a probability tree Trees also help you calculate conditional probabilities Bad luck! We can find P(Black | Even) using the probabilities we already have Step 1: Finding P(Black ∩ Even) So where does this get us? Step 2: Finding P(Even) Step 3: Finding P(Black | Even) These results can be generalized to other problems Use the Law of Total Probability to find P(B) Introducing Bayes’ Theorem We have a winner! It’s time for one last bet If events affect each other, they are dependent If events do not affect each other, they are independent More on calculating probability for independent events Winner! Winner!
5. Using Discrete Probability Distributions: Manage Your Expectations
Back at Fat Dan’s Casino We can compose a probability distribution for the slot machine Expectation gives you a prediction of the results... ... and variance tells you about the spread of the results Variances and probability distributions
So how do we calculate E(X – μ)²?
Let’s calculate the slot machine’s variance Fat Dan changed his prices There’s a linear relationship between E(X) and E(Y) Slot machine transformations General formulas for linear transforms Every pull of the lever is an independent observation Observation shortcuts
Expectation Variance
New slot machine on the block Add E(X) and E(Y) to get E(X + Y)... ... and subtract E(X) and E(Y) to get E(X – Y) You can also add and subtract linear transformations
Adding aX and bY Subtracting aX and bY
Jackpot!
6. Permutations and Combinations: Making Arrangements
The Statsville Derby It’s a three-horse race How many ways can they cross the finish line? Calculate the number of arrangements
So what if there are n horses?
Going round in circles It’s time for the novelty race Arranging by individuals is different than arranging by type We need to arrange animals by type Generalize a formula for arranging duplicates It’s time for the twenty-horse race How many ways can we fill the top three positions? Examining permutations What if horse order doesn’t matter Examining combinations It’s the end of the race
7. Geometric, Binomial, and Poisson Distributions: Keeping Things Discrete
Meet Chad, the hapless snowboarder We need to find Chad’s probability distribution There’s a pattern to this probability distribution The probability distribution can be represented algebraically The pattern of expectations for the geometric distribution Expectation is 1/p Finding the variance for our distribution You’ve mastered the geometric distribution Should you play, or walk away? Generalizing the probability for three questions
What’s the missing number?
Let’s generalize the probability further What’s the expectation and variance?
Let’s look at one trial
Binomial expectation and variance The Statsville Cinema has a problem
It’s a different sort of distribution So how do we find probabilities?
Expectation and variance for the Poisson distribution
What does the Poisson distribution look like?
So what’s the probability distribution? Combine Poisson variables The Poisson in disguise Anyone for popcorn?
8. Using the Normal Distribution: Being Normal
Discrete data takes exact values... ... but not all numeric data is discrete What’s the delay? We need a probability distribution for continuous data Probability density functions can be used for continuous data Probability = area To calculate probability, start by finding f(x)... ... then find probability by finding the area We’ve found the probability Searching for a soul sole mate Male modelling The normal distribution is an “ideal” model for continuous data So how do we find normal probabilities? Three steps to calculating normal probabilities Step 1: Determine your distribution Step 2: Standardize to N(0, 1) To standardize, first move the mean... ... then squash the width Now find Z for the specific value you want to find probability for Step 3: Look up the probability in your handy table
So how do you use probability tables?
Julie’s probability is in the table And they all lived happily ever after
But it doesn’t stop there.
9. Using the Normal Distribution ii: Beyond Normal
Love is a roller coaster All aboard the Love Train Normal bride + normal groom It’s still just weight How’s the combined weight distributed? Finding probabilities More people want the Love Train Linear transforms describe underlying changes in values...
So what’s the distribution of a linear transform?
...and independent observations describe how many values you have Expectation and variance for independent observations Should we play, or walk away? Normal distribution to the rescue When to approximate the binomial distribution with the normal
Finding the mean and variance
Revisiting the normal approximation The binomial is discrete, but the normal is continuous Apply a continuity correction before calculating the approximation All aboard the Love Train When to approximate the binomial distribution with the normal
When λ is small... When λ is large... So how large is large enough?
A runaway success!
10. Using Statistical Sampling: Taking Samples
The Mighty Gumball taste test They’re running out of gumballs Test a gumball sample, not the whole gumball population
Gumball populations Gumball samples
How sampling works When sampling goes wrong How to design a sample
Define your target population Define your sampling units
Define your sampling frame Sometimes samples can be biased
Unbiased Samples Biased Samples
Sources of bias How to choose your sample Simple random sampling
Sampling with replacement Sampling without replacement
How to choose a simple random sample
Drawing lots Random number generators
There are other types of sampling We can use stratified sampling... ...or we can use cluster sampling... ...or even systematic sampling Mighty Gumball has a sample
So what’s next?
11. Estimating Populations and Samples: Making Predictions
So how long does flavor really last for? Let’s start by estimating the population mean Point estimators can approximate population parameters Let’s estimate the population variance We need a different point estimator than sample variance
So what is the estimator?
Which formula’s which? Mighty Gumball has done more sampling It’s a question of proportion
Predicting population proportion
Buy your gumballs here!
Introducing new jumbo boxes
So how does this relate to sampling? The sampling distribution of proportions So what’s the expectation of Ps? And what’s the variance of Ps? Find the distribution of Ps Ps follows a normal distribution
Ps—continuity correction required
How many gumballs?
There’s just one more problem...
We need probabilities for the sample mean The sampling distribution of the mean Find the expectation for X̄ What about the variance of X̄? So how is X̄ distributed? If n is large, X̄ can still be approximated by the normal distribution
Introducing the Central Limit Theorem
Using the central limit theorem
The binomial distribution The Poisson distribution Finding probabilities
Sampling saves the day!
You’ve made a lot of progress
12. Constructing Confidence Intervals: Guessing with Confidence
Mighty Gumball is in trouble
They need you to save them
The problem with precision Introducing confidence intervals Four steps for finding confidence intervals Step 1: Choose your population statistic Step 2: Find its sampling distribution Point estimators to the rescue We’ve found the distribution for X̄ Step 3: Decide on the level of confidence How to select an appropriate confidence level Step 4: Find the confidence limits Start by finding Z Rewrite the inequality in terms of μ Finally, find the value of X̄ You’ve found the confidence interval Let’s summarize the steps Handy shortcuts for confidence intervals
What’s the interval in general?
Just one more problem... Step 1: Choose your population statistic Step 2: Find its sampling distribution X̄ follows the t-distribution when the sample is small Find the standard score for the t-distribution Step 3: Decide on the level of confidence Step 4: Find the confidence limits Using t-distribution probability tables The t-distribution vs. the normal distribution You’ve found the confidence intervals!
13. Using Hypothesis Tests: Look At The Evidence
Statsville’s new miracle drug So what’s the problem? Resolving the conflict from 50,000 feet The six steps for hypothesis testing Step 1: Decide on the hypothesis
The drug company’s claim So what’s the null hypothesis for SnoreCull?
So what’s the alternative?
The doctor’s perspective The alternate hypothesis for SnoreCull
Step 2: Choose your test statistic
What’s the test statistic for SnoreCull?
Step 3: Determine the critical region
At what point can we reject the drug company claims?
To find the critical region, first decide on the significance level
So what significance level should we use?
Step 4: Find the p-value
How do we find the p-value?
We’ve found the p-value Step 5: Is the sample result in the critical region? Step 6: Make your decision So what did we just do? What if the sample size is larger? Let’s conduct another hypothesis test Step 1: Decide on the hypotheses
It’s still the same problem
Step 2: Choose the test statistic Use the normal to approximate the binomial in our test statistic Step 3: Find the critical region SnoreCull failed the test Mistakes can happen Let’s start with Type I errors
So what’s the probability of getting a Type I error?
What about Type II errors?
So how do we find β?
Finding errors for SnoreCull
Let’s start with the Type I error So what about the Type II error?
We need to find the range of values Find P(Type II error) Introducing power
So what’s the power of SnoreCull?
The doctor’s happy
But it doesn’t stop there
14. The χ² Distribution: There’s Something Going On...
There may be trouble ahead at Fat Dan’s Casino Let’s start with the slot machines The χ² test assesses difference So what does the test statistic represent? Two main uses of the χ² distribution
When ν is 1 or 2 When ν is greater than 2
ν represents degrees of freedom
So what’s ν?
What’s the significance?
How to use χ² probability tables
Hypothesis testing with χ² You’ve solved the slot machine mystery Fat Dan has another problem The χ² distribution can test for independence You can find the expected frequencies using probability So what are the frequencies?
How do we find the frequencies in general?
We still need to calculate degrees of freedom Generalizing the degrees of freedom And the formula is... You’ve saved the casino
15. Correlation and Regression: What’s My Line?
Never trust the weather Let’s analyze sunshine and attendance Exploring types of data
All about bivariate data
Visualizing bivariate data Scatter diagrams show you patterns Correlation vs. causation
We need to predict the concert attendance
Predict values with a line of best fit Your best guess is still a guess
We need to find the equation of the line
We need to minimize the errors Introducing the sum of squared errors Find the equation for the line of best fit
Let’s start with b
Finding the slope for the line of best fit
We use x̄ and ȳ to help us find b
Finding the slope for the line of best fit, part ii We’ve found b, but what about a? You’ve made the connection Let’s look at some correlations
Accurate linear correlation No linear correlation
The correlation coefficient measures how well the line fits the data There’s a formula for calculating the correlation coefficient, r Find r for the concert data Find r for the concert data, continued You’ve saved the day! Leaving town... It’s been great having you here in Statsville!
A. Leftovers: The Top Ten Things (we didn’t cover)
#1. Other ways of presenting data
Dotplots Stemplots
#2. Distribution anatomy
The empirical rule for normal distributions Chebyshev’s rule for any distribution
#3. Experiments
So what makes for a good experiment?
Designing your experiment
Completely randomized design Randomized block design Matched pairs design
#4. Least square regression alternate notation #5. The coefficient of determination
Calculating r²
#6. Non-linear relationships #7. The confidence interval for the slope of a regression line
The margin of error for b
#8. Sampling distributions – the difference between two means #9. Sampling distributions – the difference between two proportions #10. E(X) and Var(X) for continuous probability distributions Finding E(X) Finding Var(X)
B. Statistics Tables: Looking Things Up
#1. Standard normal probabilities #2. t-distribution critical values #3. χ² critical values
Index About the Author Copyright
