[Head First Series 01] • Head First Statistics by Griffiths, Dawn -- Imperial Library of Trantor

Index

Head First Statistics
Dedication
A Note Regarding Supplemental Files
Advance Praise for Head First Statistics
Praise for other Head First books
Author of Head First Statistics
How to use this Book: Intro

Who is this book for?

Who should probably back away from this book?

We know what you’re thinking
We know what your brain is thinking
Metacognition: thinking about thinking
Here’s what WE did

Here’s what YOU can do to bend your brain into submission

Read Me
The technical review team
Acknowledgments
Safari® Books Online

1. Visualizing Information: First Impressions

Statistics are everywhere
But why learn statistics?
A tale of two charts
Manic Mango needs some charts
The humble pie chart

So when are pie charts useful?

Chart failure
Bar charts can allow for more accuracy
Vertical bar charts
Horizontal bar charts
It’s a matter of scale

Using percentage scales

Using frequency scales
Dealing with multiple sets of data

The split-category bar chart
The segmented bar chart

Your bar charts rock
Categories vs. numbers

Categorical or qualitative data
Numerical or quantitative data

Dealing with grouped data
To make a histogram, start by finding bar widths
Manic Mango needs another chart

A histogram’s bar area must be proportional to frequency

Make the area of histogram bars proportional to frequency
Step 1: Find the bar widths
Step 2: Find the bar heights
Step 3: Draw your chart—a histogram
Histograms can’t do everything
Introducing cumulative frequency

So what are the cumulative frequencies?

Drawing the cumulative frequency graph
Choosing the right chart
Manic Mango conquered the games market!

2. Measuring Central Tendency: The Middle Way

Welcome to the Health Club
A common measure of average is the mean
Mean math

Letters and numbers

Dealing with unknowns
Back to the mean

The mean has its own symbol

Handling frequencies
Back to the Health Club
Everybody was Kung Fu fighting
Our data has outliers
The butler outliers did it
Watercooler conversation
Finding the median
Business is booming
The Little Ducklings swimming class
Frequency Magnets
Frequency Magnets
What went wrong with the mean and median?
Introducing the mode

It even works with categorical data

Congratulations!

3. Measuring Variability and Spread: Power Ranges

Wanted: one player
We need to compare player scores
Use the range to differentiate between data sets

Measuring the range

The problem with outliers
We need to get away from outliers
Quartiles come to the rescue
The interquartile range excludes outliers
Quartile anatomy

Finding the position of the lower quartile
Finding the position of the upper quartile

We’re not just limited to quartiles
So what are percentiles?

Percentile uses
Finding percentiles

Box and whisker plots let you visualize ranges
Variability is more than just spread
Calculating average distances
We can calculate variation with the variance...
...but standard deviation is a more intuitive measure

Standard deviation know-how

A quicker calculation for variance
What if we need a baseline for comparison?
Use standard scores to compare values across data sets

Calculating standard scores

Interpreting standard scores

So what does this tell us about the players?

Statsville All Stars win the league!

4. Calculating Probabilities: Taking Chances

Fat Dan’s Grand Slam
Roll up for roulette!
Your very own roulette board
Place your bets now!
What are the chances?
Find roulette probabilities
You can visualize probabilities with a Venn diagram

Complementary events

It’s time to play!
And the winning number is...
Let’s bet on an even more likely event
You can also add probabilities
You win!
Time for another bet
Exclusive events and intersecting events
Problems at the intersection
Some more notation
Another unlucky spin...
...but it’s time for another bet
Conditions apply
Find conditional probabilities
You can visualize conditional probabilities with a probability tree
Trees also help you calculate conditional probabilities
Bad luck!
We can find P(Black | Even) using the probabilities we already have
Step 1: Finding P(Black ∩ Even)
So where does this get us?
Step 2: Finding P(Even)
Step 3: Finding P(Black | Even)
These results can be generalized to other problems
Use the Law of Total Probability to find P(B)
Introducing Bayes’ Theorem
We have a winner!
It’s time for one last bet
If events affect each other, they are dependent
If events do not affect each other, they are independent
More on calculating probability for independent events
Winner! Winner!

5. Using Discrete Probability Distributions: Manage Your Expectations

Back at Fat Dan’s Casino
We can compose a probability distribution for the slot machine
Expectation gives you a prediction of the results...
...and variance tells you about the spread of the results
Variances and probability distributions

So how do we calculate E(X − μ)²?

Let’s calculate the slot machine’s variance
Fat Dan changed his prices
There’s a linear relationship between E(X) and E(Y)
Slot machine transformations
General formulas for linear transforms
Every pull of the lever is an independent observation
Observation shortcuts

Expectation
Variance

New slot machine on the block
Add E(X) and E(Y) to get E(X + Y)...
...and subtract E(X) and E(Y) to get E(X − Y)
You can also add and subtract linear transformations

Adding aX and bY
Subtracting aX and bY

Jackpot!

6. Permutations and Combinations: Making Arrangements

The Statsville Derby
It’s a three-horse race
How many ways can they cross the finish line?
Calculate the number of arrangements

So what if there are n horses?

Going round in circles
It’s time for the novelty race
Arranging by individuals is different than arranging by type
We need to arrange animals by type
Generalize a formula for arranging duplicates
It’s time for the twenty-horse race
How many ways can we fill the top three positions?
Examining permutations
What if horse order doesn’t matter?
Examining combinations
It’s the end of the race

7. Geometric, Binomial, and Poisson Distributions: Keeping Things Discrete

Meet Chad, the hapless snowboarder
We need to find Chad’s probability distribution
There’s a pattern to this probability distribution
The probability distribution can be represented algebraically
The pattern of expectations for the geometric distribution
Expectation is 1/p
Finding the variance for our distribution
You’ve mastered the geometric distribution
Should you play, or walk away?
Generalizing the probability for three questions

What’s the missing number?

Let’s generalize the probability further
What’s the expectation and variance?

Let’s look at one trial

Binomial expectation and variance
The Statsville Cinema has a problem

It’s a different sort of distribution
So how do we find probabilities?

Expectation and variance for the Poisson distribution

What does the Poisson distribution look like?

So what’s the probability distribution?
Combine Poisson variables
The Poisson in disguise
Anyone for popcorn?

8. Using the Normal Distribution: Being Normal

Discrete data takes exact values...
...but not all numeric data is discrete
What’s the delay?
We need a probability distribution for continuous data
Probability density functions can be used for continuous data
Probability = area
To calculate probability, start by finding f(x)...
...then find probability by finding the area
We’ve found the probability
Searching for a soul sole mate
Male modelling
The normal distribution is an “ideal” model for continuous data
So how do we find normal probabilities?
Three steps to calculating normal probabilities
Step 1: Determine your distribution
Step 2: Standardize to N(0, 1)
To standardize, first move the mean...
...then squash the width
Now find Z for the specific value you want to find probability for
Step 3: Look up the probability in your handy table

So how do you use probability tables?

Julie’s probability is in the table
And they all lived happily ever after

But it doesn’t stop there.

9. Using the Normal Distribution ii: Beyond Normal

Love is a roller coaster
All aboard the Love Train
Normal bride + normal groom
It’s still just weight
How’s the combined weight distributed?
Finding probabilities
More people want the Love Train
Linear transforms describe underlying changes in values...

So what’s the distribution of a linear transform?

...and independent observations describe how many values you have
Expectation and variance for independent observations
Should we play, or walk away?
Normal distribution to the rescue
When to approximate the binomial distribution with the normal

Finding the mean and variance

Revisiting the normal approximation
The binomial is discrete, but the normal is continuous
Apply a continuity correction before calculating the approximation
All aboard the Love Train
When to approximate the binomial distribution with the normal

When λ is small...
When λ is large...
So how large is large enough?

A runaway success!

10. Using Statistical Sampling: Taking Samples

The Mighty Gumball taste test
They’re running out of gumballs
Test a gumball sample, not the whole gumball population

Gumball populations
Gumball samples

How sampling works
When sampling goes wrong
How to design a sample

Define your target population
Define your sampling units

Define your sampling frame
Sometimes samples can be biased

Unbiased Samples
Biased Samples

Sources of bias
How to choose your sample
Simple random sampling

Sampling with replacement
Sampling without replacement

How to choose a simple random sample

Drawing lots
Random number generators

There are other types of sampling
We can use stratified sampling...
...or we can use cluster sampling...
...or even systematic sampling
Mighty Gumball has a sample

So what’s next?

11. Estimating Populations and Samples: Making Predictions

So how long does flavor really last for?
Let’s start by estimating the population mean
Point estimators can approximate population parameters
Let’s estimate the population variance
We need a different point estimator than sample variance

So what is the estimator?

Which formula’s which?
Mighty Gumball has done more sampling
It’s a question of proportion

Predicting population proportion

Buy your gumballs here!

Introducing new jumbo boxes

So how does this relate to sampling?
The sampling distribution of proportions
So what’s the expectation of Ps?
And what’s the variance of Ps?
Find the distribution of Ps
Ps follows a normal distribution

Ps—continuity correction required

How many gumballs?

There’s just one more problem...

We need probabilities for the sample mean
The sampling distribution of the mean
Find the expectation for X̄
What about the variance of X̄?
So how is X̄ distributed?
If n is large, X̄ can still be approximated by the normal distribution

Introducing the Central Limit Theorem

Using the central limit theorem

The binomial distribution
The Poisson distribution
Finding probabilities

Sampling saves the day!

You’ve made a lot of progress

12. Constructing Confidence Intervals: Guessing with Confidence

Mighty Gumball is in trouble

They need you to save them

The problem with precision
Introducing confidence intervals
Four steps for finding confidence intervals
Step 1: Choose your population statistic
Step 2: Find its sampling distribution
Point estimators to the rescue
We’ve found the distribution for X̄
Step 3: Decide on the level of confidence
How to select an appropriate confidence level
Step 4: Find the confidence limits
Start by finding Z
Rewrite the inequality in terms of μ
Finally, find the value of X̄
You’ve found the confidence interval
Let’s summarize the steps
Handy shortcuts for confidence intervals

What’s the interval in general?

Just one more problem...
Step 1: Choose your population statistic
Step 2: Find its sampling distribution
X̄ follows the t-distribution when the sample is small
Find the standard score for the t-distribution
Step 3: Decide on the level of confidence
Step 4: Find the confidence limits
Using t-distribution probability tables
The t-distribution vs. the normal distribution
You’ve found the confidence intervals!

13. Using Hypothesis Tests: Look At The Evidence

Statsville’s new miracle drug
So what’s the problem?
Resolving the conflict from 50,000 feet
The six steps for hypothesis testing
Step 1: Decide on the hypothesis

The drug company’s claim
So what’s the null hypothesis for SnoreCull?

So what’s the alternative?

The doctor’s perspective
The alternate hypothesis for SnoreCull

Step 2: Choose your test statistic

What’s the test statistic for SnoreCull?

Step 3: Determine the critical region

At what point can we reject the drug company claims?

To find the critical region, first decide on the significance level

So what significance level should we use?

Step 4: Find the p-value

How do we find the p-value?

We’ve found the p-value
Step 5: Is the sample result in the critical region?
Step 6: Make your decision
So what did we just do?
What if the sample size is larger?
Let’s conduct another hypothesis test
Step 1: Decide on the hypotheses

It’s still the same problem

Step 2: Choose the test statistic
Use the normal to approximate the binomial in our test statistic
Step 3: Find the critical region
SnoreCull failed the test
Mistakes can happen
Let’s start with Type I errors

So what’s the probability of getting a Type I error?

What about Type II errors?

So how do we find β?

Finding errors for SnoreCull

Let’s start with the Type I error
So what about the Type II error?

We need to find the range of values
Find P(Type II error)
Introducing power

So what’s the power of SnoreCull?

The doctor’s happy

But it doesn’t stop there

14. The χ² Distribution: There’s Something Going On...

There may be trouble ahead at Fat Dan’s Casino
Let’s start with the slot machines
The χ² test assesses difference
So what does the test statistic represent?
Two main uses of the χ² distribution

When ν is 1 or 2
When ν is greater than 2

ν represents degrees of freedom

So what’s ν?

What’s the significance?

How to use χ² probability tables

Hypothesis testing with χ²
You’ve solved the slot machine mystery
Fat Dan has another problem
The χ² distribution can test for independence
You can find the expected frequencies using probability
So what are the frequencies?

How do we find the frequencies in general?

We still need to calculate degrees of freedom
Generalizing the degrees of freedom
And the formula is...
You’ve saved the casino

15. Correlation and Regression: What’s My Line?

Never trust the weather
Let’s analyze sunshine and attendance
Exploring types of data

All about bivariate data

Visualizing bivariate data
Scatter diagrams show you patterns
Correlation vs. causation

We need to predict the concert attendance

Predict values with a line of best fit
Your best guess is still a guess

We need to find the equation of the line

We need to minimize the errors
Introducing the sum of squared errors
Find the equation for the line of best fit

Let’s start with b

Finding the slope for the line of best fit

We use x̄ and ȳ to help us find b

Finding the slope for the line of best fit, part ii
We’ve found b, but what about a?
You’ve made the connection
Let’s look at some correlations

Accurate linear correlation
No linear correlation

The correlation coefficient measures how well the line fits the data
There’s a formula for calculating the correlation coefficient, r
Find r for the concert data
Find r for the concert data, continued
You’ve saved the day!
Leaving town...
It’s been great having you here in Statsville!

A. Leftovers: The Top Ten Things (we didn’t cover)

#1. Other ways of presenting data

Dotplots
Stemplots

#2. Distribution anatomy

The empirical rule for normal distributions
Chebyshev’s rule for any distribution

#3. Experiments

So what makes for a good experiment?

Designing your experiment

Completely randomized design
Randomized block design
Matched pairs design

#4. Least square regression alternate notation
#5. The coefficient of determination

Calculating r²

#6. Non-linear relationships
#7. The confidence interval for the slope of a regression line

The margin of error for b

#8. Sampling distributions – the difference between two means
#9. Sampling distributions – the difference between two proportions
#10. E(X) and Var(X) for continuous probability distributions
Finding E(X)
Finding Var(X)

B. Statistics Tables: Looking Things Up

#1. Standard normal probabilities
#2. t-distribution critical values
#3. χ² critical values

Index
About the Author
Copyright
