Index
Head First Statistics
Dedication
A Note Regarding Supplemental Files
Advance Praise for Head First Statistics
Praise for other Head First books
Author of Head First Statistics
How to use this Book: Intro
Who is this book for?
Who should probably back away from this book?
We know what you’re thinking
We know what your brain is thinking
Metacognition: thinking about thinking
Here’s what WE did
Here’s what YOU can do to bend your brain into submission
Read Me
The technical review team
Acknowledgments
Safari® Books Online
1. Visualizing Information: First Impressions
Statistics are everywhere
But why learn statistics?
A tale of two charts
Manic Mango needs some charts
The humble pie chart
So when are pie charts useful?
Chart failure
Bar charts can allow for more accuracy
Vertical bar charts
Horizontal bar charts
It’s a matter of scale
Using percentage scales
Using frequency scales
Dealing with multiple sets of data
The split-category bar chart
The segmented bar chart
Your bar charts rock
Categories vs. numbers
Categorical or qualitative data
Numerical or quantitative data
Dealing with grouped data
To make a histogram, start by finding bar widths
Manic Mango needs another chart
A histogram’s bar area must be proportional to frequency
Make the area of histogram bars proportional to frequency
Step 1: Find the bar widths
Step 2: Find the bar heights
Step 3: Draw your chart—a histogram
Histograms can’t do everything
Introducing cumulative frequency
So what are the cumulative frequencies?
Drawing the cumulative frequency graph
Choosing the right chart
Manic Mango conquered the games market!
2. Measuring Central Tendency: The Middle Way
Welcome to the Health Club
A common measure of average is the mean
Mean math
Letters and numbers
Dealing with unknowns
Back to the mean
The mean has its own symbol
Handling frequencies
Back to the Health Club
Everybody was Kung Fu fighting
Our data has outliers
The butler outliers did it
Watercooler conversation
Finding the median
Business is booming
The Little Ducklings swimming class
Frequency Magnets
What went wrong with the mean and median?
Introducing the mode
It even works with categorical data
Congratulations!
3. Measuring Variability and Spread: Power Ranges
Wanted: one player
We need to compare player scores
Use the range to differentiate between data sets
Measuring the range
The problem with outliers
We need to get away from outliers
Quartiles come to the rescue
The interquartile range excludes outliers
Quartile anatomy
Finding the position of the lower quartile
Finding the position of the upper quartile
We’re not just limited to quartiles
So what are percentiles?
Percentile uses
Finding percentiles
Box and whisker plots let you visualize ranges
Variability is more than just spread
Calculating average distances
We can calculate variation with the variance...
...but standard deviation is a more intuitive measure
Standard deviation know-how
A quicker calculation for variance
What if we need a baseline for comparison?
Use standard scores to compare values across data sets
Calculating standard scores
Interpreting standard scores
So what does this tell us about the players?
Statsville All Stars win the league!
4. Calculating Probabilities: Taking Chances
Fat Dan’s Grand Slam
Roll up for roulette!
Your very own roulette board
Place your bets now!
What are the chances?
Find roulette probabilities
You can visualize probabilities with a Venn diagram
Complementary events
It’s time to play!
And the winning number is...
Let’s bet on an even more likely event
You can also add probabilities
You win!
Time for another bet
Exclusive events and intersecting events
Problems at the intersection
Some more notation
Another unlucky spin...
...but it’s time for another bet
Conditions apply
Find conditional probabilities
You can visualize conditional probabilities with a probability tree
Trees also help you calculate conditional probabilities
Bad luck!
We can find P(Black | Even) using the probabilities we already have
Step 1: Finding P(Black ∩ Even)
So where does this get us?
Step 2: Finding P(Even)
Step 3: Finding P(Black | Even)
These results can be generalized to other problems
Use the Law of Total Probability to find P(B)
Introducing Bayes’ Theorem
We have a winner!
It’s time for one last bet
If events affect each other, they are dependent
If events do not affect each other, they are independent
More on calculating probability for independent events
Winner! Winner!
5. Using Discrete Probability Distributions: Manage Your Expectations
Back at Fat Dan’s Casino
We can compose a probability distribution for the slot machine
Expectation gives you a prediction of the results...
... and variance tells you about the spread of the results
Variances and probability distributions
So how do we calculate E(X – μ)²?
Let’s calculate the slot machine’s variance
Fat Dan changed his prices
There’s a linear relationship between E(X) and E(Y)
Slot machine transformations
General formulas for linear transforms
Every pull of the lever is an independent observation
Observation shortcuts
Expectation
Variance
New slot machine on the block
Add E(X) and E(Y) to get E(X + Y)...
... and subtract E(X) and E(Y) to get E(X – Y)
You can also add and subtract linear transformations
Adding aX and bY
Subtracting aX and bY
Jackpot!
6. Permutations and Combinations: Making Arrangements
The Statsville Derby
It’s a three-horse race
How many ways can they cross the finish line?
Calculate the number of arrangements
So what if there are n horses?
Going round in circles
It’s time for the novelty race
Arranging by individuals is different than arranging by type
We need to arrange animals by type
Generalize a formula for arranging duplicates
It’s time for the twenty-horse race
How many ways can we fill the top three positions?
Examining permutations
What if horse order doesn’t matter
Examining combinations
It’s the end of the race
7. Geometric, Binomial, and Poisson Distributions: Keeping Things Discrete
Meet Chad, the hapless snowboarder
We need to find Chad’s probability distribution
There’s a pattern to this probability distribution
The probability distribution can be represented algebraically
The pattern of expectations for the geometric distribution
Expectation is 1/p
Finding the variance for our distribution
You’ve mastered the geometric distribution
Should you play, or walk away?
Generalizing the probability for three questions
What’s the missing number?
Let’s generalize the probability further
What’s the expectation and variance?
Let’s look at one trial
Binomial expectation and variance
The Statsville Cinema has a problem
It’s a different sort of distribution
So how do we find probabilities?
Expectation and variance for the Poisson distribution
What does the Poisson distribution look like?
So what’s the probability distribution?
Combine Poisson variables
The Poisson in disguise
Anyone for popcorn?
8. Using the Normal Distribution: Being Normal
Discrete data takes exact values...
... but not all numeric data is discrete
What’s the delay?
We need a probability distribution for continuous data
Probability density functions can be used for continuous data
Probability = area
To calculate probability, start by finding f(x)...
... then find probability by finding the area
We’ve found the probability
Searching for a soul sole mate
Male modelling
The normal distribution is an “ideal” model for continuous data
So how do we find normal probabilities?
Three steps to calculating normal probabilities
Step 1: Determine your distribution
Step 2: Standardize to N(0, 1)
To standardize, first move the mean...
... then squash the width
Now find Z for the specific value you want to find probability for
Step 3: Look up the probability in your handy table
So how do you use probability tables?
Julie’s probability is in the table
And they all lived happily ever after
But it doesn’t stop there.
9. Using the Normal Distribution ii: Beyond Normal
Love is a roller coaster
All aboard the Love Train
Normal bride + normal groom
It’s still just weight
How’s the combined weight distributed?
Finding probabilities
More people want the Love Train
Linear transforms describe underlying changes in values...
So what’s the distribution of a linear transform?
...and independent observations describe how many values you have
Expectation and variance for independent observations
Should we play, or walk away?
Normal distribution to the rescue
When to approximate the binomial distribution with the normal
Finding the mean and variance
Revisiting the normal approximation
The binomial is discrete, but the normal is continuous
Apply a continuity correction before calculating the approximation
All aboard the Love Train
When to approximate the Poisson distribution with the normal
When λ is small...
When λ is large...
So how large is large enough?
A runaway success!
10. Using Statistical Sampling: Taking Samples
The Mighty Gumball taste test
They’re running out of gumballs
Test a gumball sample, not the whole gumball population
Gumball populations
Gumball samples
How sampling works
When sampling goes wrong
How to design a sample
Define your target population
Define your sampling units
Define your sampling frame
Sometimes samples can be biased
Unbiased Samples
Biased Samples
Sources of bias
How to choose your sample
Simple random sampling
Sampling with replacement
Sampling without replacement
How to choose a simple random sample
Drawing lots
Random number generators
There are other types of sampling
We can use stratified sampling...
...or we can use cluster sampling...
...or even systematic sampling
Mighty Gumball has a sample
So what’s next?
11. Estimating Populations and Samples: Making Predictions
So how long does flavor really last for?
Let’s start by estimating the population mean
Point estimators can approximate population parameters
Let’s estimate the population variance
We need a different point estimator than sample variance
So what is the estimator?
Which formula’s which?
Mighty Gumball has done more sampling
It’s a question of proportion
Predicting population proportion
Buy your gumballs here!
Introducing new jumbo boxes
So how does this relate to sampling?
The sampling distribution of proportions
So what’s the expectation of Pₛ?
And what’s the variance of Pₛ?
Find the distribution of Pₛ
Pₛ follows a normal distribution
Pₛ—continuity correction required
How many gumballs?
There’s just one more problem...
We need probabilities for the sample mean
The sampling distribution of the mean
Find the expectation for X̄
What about the variance of X̄?
So how is X̄ distributed?
If n is large, X̄ can still be approximated by the normal distribution
Introducing the Central Limit Theorem
Using the central limit theorem
The binomial distribution
The Poisson distribution
Finding probabilities
Sampling saves the day!
You’ve made a lot of progress
12. Constructing Confidence Intervals: Guessing with Confidence
Mighty Gumball is in trouble
They need you to save them
The problem with precision
Introducing confidence intervals
Four steps for finding confidence intervals
Step 1: Choose your population statistic
Step 2: Find its sampling distribution
Point estimators to the rescue
We’ve found the distribution for X̄
Step 3: Decide on the level of confidence
How to select an appropriate confidence level
Step 4: Find the confidence limits
Start by finding Z
Rewrite the inequality in terms of μ
Finally, find the value of X̄
You’ve found the confidence interval
Let’s summarize the steps
Handy shortcuts for confidence intervals
What’s the interval in general?
Just one more problem...
Step 1: Choose your population statistic
Step 2: Find its sampling distribution
X̄ follows the t-distribution when the sample is small
Find the standard score for the t-distribution
Step 3: Decide on the level of confidence
Step 4: Find the confidence limits
Using t-distribution probability tables
The t-distribution vs. the normal distribution
You’ve found the confidence intervals!
13. Using Hypothesis Tests: Look At The Evidence
Statsville’s new miracle drug
So what’s the problem?
Resolving the conflict from 50,000 feet
The six steps for hypothesis testing
Step 1: Decide on the hypothesis
The drug company’s claim
So what’s the null hypothesis for SnoreCull?
So what’s the alternative?
The doctor’s perspective
The alternate hypothesis for SnoreCull
Step 2: Choose your test statistic
What’s the test statistic for SnoreCull?
Step 3: Determine the critical region
At what point can we reject the drug company claims?
To find the critical region, first decide on the significance level
So what significance level should we use?
Step 4: Find the p-value
How do we find the p-value?
We’ve found the p-value
Step 5: Is the sample result in the critical region?
Step 6: Make your decision
So what did we just do?
What if the sample size is larger?
Let’s conduct another hypothesis test
Step 1: Decide on the hypotheses
It’s still the same problem
Step 2: Choose the test statistic
Use the normal to approximate the binomial in our test statistic
Step 3: Find the critical region
SnoreCull failed the test
Mistakes can happen
Let’s start with Type I errors
So what’s the probability of getting a Type I error?
What about Type II errors?
So how do we find β?
Finding errors for SnoreCull
Let’s start with the Type I error
So what about the Type II error?
We need to find the range of values
Find P(Type II error)
Introducing power
So what’s the power of SnoreCull?
The doctor’s happy
But it doesn’t stop there
14. The χ² Distribution: There’s Something Going On...
There may be trouble ahead at Fat Dan’s Casino
Let’s start with the slot machines
The χ² test assesses difference
So what does the test statistic represent?
Two main uses of the χ² distribution
When ν is 1 or 2
When ν is greater than 2
ν represents degrees of freedom
So what’s ν?
What’s the significance?
How to use χ² probability tables
Hypothesis testing with χ²
You’ve solved the slot machine mystery
Fat Dan has another problem
The χ² distribution can test for independence
You can find the expected frequencies using probability
So what are the frequencies?
How do we find the frequencies in general?
We still need to calculate degrees of freedom
Generalizing the degrees of freedom
And the formula is...
You’ve saved the casino
15. Correlation and Regression: What’s My Line?
Never trust the weather
Let’s analyze sunshine and attendance
Exploring types of data
All about bivariate data
Visualizing bivariate data
Scatter diagrams show you patterns
Correlation vs. causation
We need to predict the concert attendance
Predict values with a line of best fit
Your best guess is still a guess
We need to find the equation of the line
We need to minimize the errors
Introducing the sum of squared errors
Find the equation for the line of best fit
Let’s start with b
Finding the slope for the line of best fit
We use x̄ and ȳ to help us find b
Finding the slope for the line of best fit, part ii
We’ve found b, but what about a?
You’ve made the connection
Let’s look at some correlations
Accurate linear correlation
No linear correlation
The correlation coefficient measures how well the line fits the data
There’s a formula for calculating the correlation coefficient, r
Find r for the concert data
Find r for the concert data, continued
You’ve saved the day!
Leaving town...
It’s been great having you here in Statsville!
A. Leftovers: The Top Ten Things (we didn’t cover)
#1. Other ways of presenting data
Dotplots
Stemplots
#2. Distribution anatomy
The empirical rule for normal distributions
Chebyshev’s rule for any distribution
#3. Experiments
So what makes for a good experiment?
Designing your experiment
Completely randomized design
Randomized block design
Matched pairs design
#4. Least square regression alternate notation
#5. The coefficient of determination
Calculating r²
#6. Non-linear relationships
#7. The confidence interval for the slope of a regression line
The margin of error for b
#8. Sampling distributions – the difference between two means
#9. Sampling distributions – the difference between two proportions
#10. E(X) and Var(X) for continuous probability distributions
Finding E(X)
Finding Var(X)
B. Statistics Tables: Looking Things Up
#1. Standard normal probabilities
#2. t-distribution critical values
#3. χ² critical values
Index
About the Author
Copyright