Statsville’s new miracle drug

Statsville’s leading drug company has produced a new remedy for curing snoring. Frustrated snorers are flocking to their doctors in hopes of finding nightly relief.

The drug company claims that their miracle drug cures 90% of people within two weeks, which is great news for the people with snoring difficulties. The trouble is, not everyone’s convinced.

The doctor at the Statsville Surgery has been prescribing SnoreCull to her patients, but she’s disappointed by the results. She decides to conduct her own trial of the drug.

She takes a random sample of 15 snorers and puts them on a course of SnoreCull for two weeks. After two weeks, she calls them back in to see whether their snoring has stopped.

Here are the results:

Cured?	Yes	No
Frequency	11	4

Note

All the doctor records is whether or not each patient’s snoring has been cured.

So what’s the problem?

Here’s the probability distribution for how many people the drug company says should have been cured by the snoring remedy.

The number of people cured by SnoreCull in the doctor’s sample is actually much lower than you’d expect it to be. Given the claims made by the drug company, you’d expect around 14 people to be cured (15 × 0.9 = 13.5), but instead, only 11 people have been.
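To make the comparison concrete, here is a minimal Python sketch of the distribution the drug company’s claim implies, assuming the number cured follows a binomial distribution with n = 15 and p = 0.9. The names `binomial_pmf`, `expected_cured`, and `pmf` are illustrative only.

```python
from math import comb

n = 15   # patients in the doctor's sample
p = 0.9  # cure rate claimed by the drug company

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The mean of a binomial distribution is n * p.
expected_cured = n * p

# Probability of each possible number of cures, 0 through 15.
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

print(f"Expected number cured: {expected_cured}")
for k in range(9, n + 1):
    print(f"P(X = {k:2d}) = {pmf[k]:.4f}")
```

The expected value works out at 13.5, and most of the probability sits on 13, 14, and 15 cures, which is why a result of 11 looks low.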

So why the discrepancy?

The drug company might not be deliberately telling lies, but their claims might be misleading.

It’s possible that the drug company’s tests were flawed, and this might have resulted in misleading claims being made about SnoreCull. They may have inadvertently conducted flawed or biased tests on SnoreCull, which resulted in them making inaccurate predictions about the population.

If the success rate of SnoreCull is actually lower than 90%, this would explain why only 11 people in the sample were cured.

The drug company’s claims might actually be accurate.

Rather than the drug company being at fault, it’s always possible that the patients in the doctor’s sample may not have been representative of the snoring population as a whole. It’s always possible that the snoring remedy does cure 90% of snorers, but the doctor just happens to have a higher proportion of people in her sample whom it doesn’t cure. In other words, her sample might be biased in some way, or it could just come down to there being a small number of patients in the sample.

Brain Power

How do you think we can resolve this? How can we determine whether to trust the claims of the drug company, or accept the doctor’s doubts instead?

Resolving the conflict from 50,000 feet

So how do we resolve the conflict between the doctor and the drug company? Let’s take a very high level view of what we need to do.

We can resolve the conflict between the drug company and the doctor by putting the claims of the drug company on trial. In other words, we’ll accept the word of the drug company by default, but if there’s strong evidence against it, we’ll side with the doctor instead.

Here’s what we’ll do:

Examine the claim

Note

Take the claim of the drug company.

Examine the evidence

Note

See how much evidence we need to reject the drug company’s claim, and check this against the evidence we have. We do this by looking at how rare the doctor’s results would be if the drug company is correct.

Make a decision

Note

Depending on the evidence, accept or reject the claims of the drug company.

In general, this process is called hypothesis testing, as you take a hypothesis or claim and then test it against the evidence. Let’s look at the general process for this.

The six steps for hypothesis testing

Here are the broad steps that are involved in hypothesis testing. We’ll go through each one in detail in the following pages.

1. Decide on the hypothesis you’re going to test
   Note: This is the claim that we’re putting on trial.
2. Choose your test statistic
   Note: We need to pick the statistic that best tests the claim.
3. Determine the critical region for your decision
   Note: We need a certain level of certainty.
4. Find the p-value of the test statistic
   Note: We need to see how rare our results are, assuming the claims are true.
5. See whether the sample result is within the critical region
   Note: We then see if it’s within our bounds of certainty.
6. Make your decision

We need to make sure we properly test the drug claim before we reject it.

That way we’ll know we’re making an impartial decision either way, and we’ll be giving the claim a fair trial. What we don’t want to do is reject the claim if there’s insufficient evidence against it, and this means that we need some way of deciding what constitutes sufficient evidence.

Step 1: Decide on the hypothesis

Let’s start with step one of the hypothesis test, and look at the key claim we want to test. This claim is called a hypothesis.

The drug company’s claim

According to the drug company, SnoreCull cures 90% of patients within 2 weeks. We need to accept this position unless there is sufficiently strong evidence to the contrary.

The claim that we’re testing is called the null hypothesis. It’s represented by H₀, and it’s the claim that we’ll accept unless there is strong evidence against it.

So what’s the null hypothesis for SnoreCull?

The null hypothesis for SnoreCull is the claim of the drug company: that it cures 90% of patients. This is the claim that we’re going to go along with, unless we find strong evidence against it.

We need to test whether at least 90% of patients are cured by the drug, so this means that the null hypothesis is that p = 90%.

H₀: p = 0.9

Note

This is the null hypothesis for the SnoreCull trial.

So what’s the alternative?

We’ve looked at what the claim is we’re going to test, the null hypothesis, but what if it’s not true? What’s the alternative?

The doctor’s perspective

The doctor’s view is that the claims of the drug company are too good to be true. She doesn’t think that as many as 90% of patients are cured. She thinks it’s far more likely that the cure rate is actually less than 90%.

The counterclaim to the null hypothesis is called the alternate hypothesis. It’s represented by H₁, and it’s the claim that we’ll accept if there’s strong enough evidence to reject H₀.

The alternate hypothesis for SnoreCull

The alternate hypothesis for SnoreCull is the claim you’ll accept if the drug company’s claim turns out to be false. If there’s sufficiently strong evidence against the drug company, then it’s likely that the doctor is right.

The doctor believes that SnoreCull cures less than 90% of people, so this means that the alternate hypothesis is that p < 90%.

H₁: p < 0.9

Note

This is the alternate hypothesis for the SnoreCull trial.

Now that we have the null and alternate hypotheses for the SnoreCull hypothesis test, we can move on to step 2.

When hypothesis testing, you assume the null hypothesis is true. If there’s sufficient evidence against it, you reject it and accept the alternate hypothesis.

Step 2: Choose your test statistic

Now that you’ve determined exactly what it is you’re going to test, you need some means of testing it. You can do this with a test statistic.

The test statistic is the statistic that you use to test your hypothesis. It’s the statistic that’s most relevant to the test.

What’s the test statistic for SnoreCull?

In our hypothesis test, we want to test whether SnoreCull cures 90% of people or more. To test this, we can look at the probability distribution according to the drug company, and see whether the number of successes in the sample is significant.

If we use X to represent the number of people cured in the sample, this means that we can use X as our test statistic. There are 15 people in the sample, and the probability of success according to the drug company is 0.9. As X follows a binomial distribution, this means that the test statistic is actually:

X ~ B(15, 0.9)

We choose the test statistic according to H₀, the null hypothesis.

We need to test whether there is sufficient evidence against the null hypothesis, and we do this by first assuming that H₀ is true. We then look for evidence that contradicts H₀. For the SnoreCull hypothesis test, we assume that the probability of success is 0.9 unless there is strong evidence against this being true.

To do this, we look at how likely it is for us to get the results we did, assuming the probability of success is 0.9. In other words, we take the results of the sample and examine the probability of getting that result. We do this by finding a critical region.

Step 3: Determine the critical region

The critical region of a hypothesis test is the set of values that present the most extreme evidence against the null hypothesis.

Let’s see how this works by taking another look at the doctor’s sample. If 90% or more people had been cured, this would have been in line with the claims made by the drug company. As the number of people cured decreases, the more unlikely it becomes that the claims of the drug company are true.

Here’s the probability distribution:

At what point can we reject the drug company claims?

The fewer people there are in the sample who are successfully cured by SnoreCull, the stronger the evidence there is against the claims of the drug company. The question is, at what point does the evidence become so strong that we confidently reject the null hypothesis? At what point can we reject the claim that SnoreCull cures 90% of snorers?

What we need is some way of indicating at what point we can reasonably reject the null hypothesis, and we can do this by specifying a critical region. If the number of snorers cured falls within the critical region, then we’ll say there is sufficient evidence to reject the null hypothesis. If the number of snorers cured falls outside the critical region, then we’ll accept that there isn’t sufficient evidence to reject the null hypothesis, and we’ll accept the claims of the drug company. We’ll call the cut off point for the critical region c, the critical value.

So how do we choose the critical region?

To find the critical region, first decide on the significance level

Before we can find the critical region of the hypothesis test, we first need to decide on the significance level. The significance level of a test is a measure of how unlikely you want the results of the sample to be before you reject the null hypothesis H₀. Just like the confidence level for a confidence interval, the significance level is given as a percentage.

As an example, suppose we want to test the claims of the drug company at a 5% level of significance. This means that we choose the critical region so that the probability of fewer than c snorers being cured is less than 0.05. It’s the lowest 5% of the probability distribution.

The significance level is normally represented by the Greek letter α. The lower α is, the more unlikely the results in your sample need to be before we reject H₀.

So what significance level should we use?

Let’s use a significance level of 5% in our hypothesis test. This means that if the number of snorers cured in the sample is in the lowest 5% of the probability distribution, then we will reject the claims of the drug company. If the number of snorers cured lies in the top 95% of the probability distribution, then we’ll decide there isn’t enough evidence to reject the null hypothesis, and accept the claims of the drug company.

If we use X to represent the number of snorers cured, then we define the critical region as being values such that

P(X < c) < α

where

α = 5%

Vital Statistics: Significance level

The significance level is represented by α. It’s a way of saying how unlikely you want your results to be before you’ll reject H₀.
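As a rough illustration of how the critical value could be found for the 15-person sample, the sketch below searches for the largest c with P(X < c) < α, assuming X ~ B(15, 0.9). The helper name `binomial_cdf_below` is made up for this example.

```python
from math import comb

def binomial_cdf_below(c, n, p):
    """P(X < c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c))

n, p, alpha = 15, 0.9, 0.05

# Largest c for which the lower tail P(X < c) stays below alpha.
critical_value = max(
    c for c in range(n + 2) if binomial_cdf_below(c, n, p) < alpha
)

print(f"c = {critical_value}")
print(f"P(X < {critical_value}) = {binomial_cdf_below(critical_value, n, p):.4f}")
```

With these inputs the search gives c = 11, so the critical region is X < 11, that is, 10 or fewer patients cured.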

Step 4: Find the p-value

Now that we’ve looked at critical regions, we can move on to step 4, finding the p-value.

A p-value is the probability of getting a value up to and including the one in your sample in the direction of your critical region. It’s a way of taking your sample and working out whether the result falls within the critical region for your hypothesis test. In other words, we use the p-value to say whether or not we can reject the null hypothesis.

How do we find the p-value?

How we find the p-value depends on our critical region and our test statistic. For the SnoreCull test, 11 people were cured, and our critical region is the lower tail of the distribution. This means that our p-value is P(X ≤ 11), where X is the distribution for the number of people cured in the sample.

As the significance level of our test is 5%, this means that if P(X ≤ 11) is less than 0.05, then the value 11 falls within the critical region, and we can reject the null hypothesis.

We’ve found the p-value

To find the p-value of our hypothesis test, we had to find P(X ≤ 11). This means that the p-value is 0.0555.

A p-value is the probability of getting the results in the sample, or something more extreme, in the direction of the critical region.

In our hypothesis test for SnoreCull, the critical region is the lower tail of the probability distribution. In order to see whether 11 people being cured of snoring is in the critical region, we calculated P(X ≤ 11), as this is the probability of getting a result at least as extreme as the results of our sample in the direction of the lower tail.

Had our critical region been the upper tail of the probability distribution instead, we would have needed to find P(X ≥ 11). We would have counted more extreme results as being greater than 11, as these would have been closer to the critical region.
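The lower-tail p-value described above can be reproduced with a few lines of Python. This is a minimal sketch assuming X ~ B(15, 0.9), and `binomial_cdf` is an illustrative helper name.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Lower-tail p-value for 11 cures out of 15 under H0: p = 0.9.
p_value = binomial_cdf(11, 15, 0.9)

print(f"P(X <= 11) = {p_value:.4f}")
```

Computed without rounding the intermediate terms, the result is about 0.0556. The hand calculation gives 0.0555 because each term is rounded to four decimal places before subtracting. Either way, the value is just above 0.05.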

Step 5: Is the sample result in the critical region?

Now that we’ve found the p-value, we can use it to see whether the result from our sample falls within the critical region. If it does, then we’ll have sufficient evidence to reject the claims of the drug company.

Our critical region is the lower tail of the probability distribution, and we’re using a significance level of 5%. This means that we can reject the null hypothesis if our p-value is less than 0.05. As our p-value is 0.0555, this means that the number of people cured by SnoreCull in the sample doesn’t fall within the critical region.

Step 6: Make your decision

We’ve now reached the final step of the hypothesis test. We can decide whether to accept the null hypothesis, or reject it in favor of the alternative.

The p-value of the hypothesis test falls just outside the critical region of the test. This means that there isn’t sufficient evidence to reject the null hypothesis. In other words:

We accept the claims of the drug company
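The decision rule in steps 5 and 6 boils down to a single comparison. Here is a small sketch of it in Python, where the function name `decide` and its return strings are hypothetical.

```python
def decide(p_value, alpha):
    """Compare a p-value with the significance level chosen in advance."""
    if p_value < alpha:
        return "reject H0"
    return "insufficient evidence to reject H0"

# SnoreCull, first trial: p-value 0.0555 tested at the 5% level.
outcome = decide(0.0555, 0.05)
print(outcome)
```

Because 0.0555 is not less than 0.05, the function reports that there is insufficient evidence to reject H₀.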

So what did we just do?

Let’s summarize what we just did.

First of all, we took the claims of the drug company, which the doctor had misgivings about. We used these claims as the basis of a hypothesis test. We formed a null hypothesis that the probability of curing a patient is 0.9, and then we applied this to the number of people in the doctor’s sample.

We then decided to conduct a test at the 5% level, using the success rate in the doctor’s sample. We looked at the probability of 11 people or fewer being cured, and checked to see whether the probability of this was less than 5%, or 0.05. In other words, we looked at the probability of getting a result this extreme, or even more so.

Finally, we found that at the 5% level, there wasn’t strong enough evidence to reject the claims of the drug company.

Once you’ve fixed the significance level of the test, you can’t change it.

The test needs to be completely impartial. This means that you decide what level you need the test to be at, based on what level of evidence you require, before you look at what evidence you actually have.

If you were to look at the amount of evidence you have before deciding on the level of the test, this could influence any decisions you made. You might be tempted to decide on a specific level of test just to get the result you want. This would make the outcome of the test biased, and you might make the wrong decision.

What if the sample size is larger?

So far the doctor has conducted her trial using a sample of just 15 people, and on the basis of this, there was insufficient evidence to reject the claims of the drug company.

It’s possible that the size of the sample wasn’t large enough to get an accurate result. The doctor might get more reliable results by using a larger sample.

Here are the results from the doctor’s new trial:

Cured?	Yes	No
Frequency	80	20

We want to determine whether the new data will make a difference in the outcome of the test.

Let’s run through another hypothesis test, this time with the larger sample.

Brain Power

What’s the null hypothesis of this new problem? What’s the alternate hypothesis?

Let’s conduct another hypothesis test

The doctor still has misgivings about the claims made by the drug company. Let’s conduct a hypothesis test based on the new data.

Step 1: Decide on the hypotheses

We need to start off by finding the null hypothesis and alternate hypothesis of the SnoreCull trial. As a reminder, the null hypothesis is the claim that we’re testing, and the alternate hypothesis is what we’ll accept if there’s sufficient evidence against the null hypothesis.

So what are the null and alternate hypotheses?

It’s still the same problem

For the last test, we took the claims made by the drug company and used these as the basis for the null hypothesis. We’re testing the same claims, so the null hypothesis is still the same. We have

H₀: p = 0.9

The alternate hypothesis is the same too. If there is strong evidence against the claims made by the drug company, then we’ll accept that the drug cures fewer than 90% of the patients. This gives us an alternate hypothesis of:

H₁: p < 0.9

Step 2: Choose the test statistic

As before, the next step is to choose the test statistic. In other words, we need some statistic that we can use to test the hypothesis.

For the previous hypothesis test, we conducted the test by looking at the number of successes in the sample and seeing how significant the result was. We used the binomial distribution to find the probability of getting a result at least as extreme as the value we got in the sample. In other words, we used a test statistic of X ~ B(15, 0.9) to test whether P(X ≤ 11) was less than 0.05, the level of significance.

This time the number of people in the sample is 100, and we’re testing the same claim, that the probability of successfully curing someone is 0.9. This means that our new test statistic is X ~ B(100, 0.9).

We can use another probability distribution instead of the binomial.

Using the binomial distribution for this sort of problem would be time consuming, as we’d have to calculate lots of probabilities.

Fortunately, there’s another way. Rather than use the binomial distribution, we can use some other distribution instead.

Brain Power

What probability distribution could you use to approximate X ~ B(100, 0.9)?

Use the normal to approximate the binomial in our test statistic

We still need to find a test statistic we can use in our hypothesis test, and as the number in the sample is large, this means that using the binomial distribution will be time consuming and complicated.

There are 100 people in the sample, and the proportion of successes according to the drug company is 0.9. In other words, the number of successes follows a binomial distribution, where n = 100 and p = 0.9.

As n is large, and both np and nq are greater than 5, we can use X ~ N(np, npq) as our test statistic, where X is the number of patients successfully cured. In other words, we can use

X ~ N(90, 9)

to approximate any probabilities that we may need.

Note

We can use this because n is large, np > 5, and nq > 5.
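A quick way to check the conditions and the parameters of the approximating normal distribution is sketched below. It simply applies the np > 5 and nq > 5 rule of thumb quoted in the text, and the variable names are illustrative.

```python
n, p = 100, 0.9
q = 1 - p

# Rule of thumb from the text: np and nq should both exceed 5.
approximation_ok = n * p > 5 and n * q > 5

# Parameters of the approximating normal distribution N(np, npq).
mean = n * p
variance = n * p * q

print(f"np = {n * p:.1f}, nq = {n * q:.1f}, OK to approximate: {approximation_ok}")
print(f"X ~ N({mean:.0f}, {variance:.0f})")
```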

If we standardize this, we get

Z = (X – 90)/√9

This means that for our test statistic we can use

Z = (X – 90)/3

You use the test statistic to work out probabilities you can use as evidence.

This means that we use Z as our test statistic, as we can easily use it to look up probabilities and see how unlikely the results of our sample are given the claims of the drug company. We substitute our value of 80 in place of X, so we can use it to find the probability of 80 or fewer being cured.
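As a sketch of this substitution, the snippet below standardizes the observed value of 80 and looks up the lower-tail probability with Python’s `statistics.NormalDist`. Like the text, it applies no continuity correction. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.9
mean = n * p                # 90
sd = sqrt(n * p * (1 - p))  # square root of the variance, 9

cured = 80
z = (cured - mean) / sd

# P(Z <= z) under the standard normal distribution.
lower_tail = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(Z <= z) = {lower_tail:.5f}")
```

The standard score comes out at roughly –3.33, and the lower-tail probability is well below 0.001.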

Step 3: Find the critical region

Now that we have a test statistic for our test, we need to come up with a critical region. As our alternate hypothesis is p < 0.9, this means that our critical region lies in the lower tail just as before.

The critical region also depends on the significance level of the test. Let’s choose the same significance level as before, so let’s test at the 5% level.

As our test statistic follows a standard normal distribution, we can use probability tables to find the critical value, c. The critical value is the boundary between whether we have strong enough evidence to reject the null hypothesis or not.

As our significance level is 5%, this means that our critical value c is the value where P(Z < c) = 0.05. If we look up the probability 0.05 in the probability tables, this gives us a value for c of –1.64. In other words,

P(Z < –1.64) = 0.05

This means that if our test statistic is less than –1.64, we have strong enough evidence to reject the null hypothesis.
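The table lookup and the comparison against the test statistic can be sketched in Python as follows. This assumes the standardized value z = (80 – 90)/3 for the doctor’s sample of 100, and uses `NormalDist.inv_cdf` in place of probability tables.

```python
from statistics import NormalDist

alpha = 0.05

# Critical value c such that P(Z < c) = alpha.
critical_value = NormalDist().inv_cdf(alpha)

# Standardized test statistic for 80 cures out of 100 under H0.
z = (80 - 90) / 3

reject_h0 = z < critical_value

print(f"c = {critical_value:.3f}")
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

The exact critical value is about –1.645, which the probability tables give as –1.64. Since –3.33 is far below this, the result falls within the critical region.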

SnoreCull failed the test

This time when we performed a hypothesis test on SnoreCull, there was sufficient evidence to reject the null hypothesis. In other words, we can reject the claims made by the drug company.

Hypothesis tests require evidence.

With a hypothesis test, you accept a claim and then put it on trial. You only reject it if there’s enough evidence against it. This means that the tests are impartial, as you only make a decision based on whether or not there’s sufficient evidence.

If we had just accepted the doctor’s opinion in the first place, we wouldn’t have properly considered the evidence. We would have made a decision without considering whether the results could have been explained away by mere coincidence. As it is, we have enough evidence to show that the results of the sample are extreme enough to justify rejecting the null hypothesis. The results are statistically significant, as they’re unlikely to have happened by chance.

So does this guarantee that the claims of the drug company are wrong?

Mistakes can happen

So far we’ve looked at how we can use the results of a sample as evidence in a hypothesis test. If the evidence is sufficiently strong, then we can use it to justify rejecting the null hypothesis.

We’ve found that there is strong evidence that the claims of the drug company are wrong, but is this guaranteed?

Even though the evidence is strong, we can’t absolutely guarantee that the drug company claims are wrong.

Even though it’s unlikely, we could still have made the wrong decision. We can examine evidence with a hypothesis test, and we can specify how certain we want to be before rejecting the null hypothesis, but it doesn’t prove with absolute certainty that our decision is right.

The question is, how do we know?

Conducting a hypothesis test is a bit like putting a prisoner on trial in front of a jury. The jury assumes that the prisoner is innocent unless there is strong evidence against him, but even considering the evidence, it’s still possible for the jury to make wrong decisions. Have a go at the exercise on the next page, and you’ll see how.

The errors we can make when conducting a hypothesis test are the same sort of errors we could make when putting a prisoner on trial.

Hypothesis tests are basically tests where you take a claim and put it on trial by assessing the evidence against it. If there’s sufficient evidence against it, you reject it, but if there’s insufficient evidence against it, you accept it.

You may correctly accept or reject the null hypothesis, but even considering the evidence, it’s also possible to make an error. You may reject a valid null hypothesis, or you might accept it when it’s actually false.

Statisticians have special names for these types of errors. A Type I error is when you wrongly reject a true null hypothesis, and a Type II error is when you wrongly accept a false null hypothesis.

The power of a hypothesis test is the probability that you will correctly reject a false null hypothesis.

Brain Power

How do you think we can find the probability of making a Type I error? How do you think we can find the probability of making a Type II error?

Let’s start with Type I errors

A Type I error is what you get when you reject the null hypothesis when the null hypothesis is actually correct. It’s like putting a prisoner on trial and finding him guilty when he’s actually innocent.

So what’s the probability of getting a Type I error?

If you get a Type I error, then this means that the null hypothesis must have been rejected. In order for the null hypothesis to have been rejected, the results of your sample must be in the critical region.

The probability of getting a Type I error is the probability of your results being in the critical region when the null hypothesis is true. As the critical region is defined by the significance level of the test, this means that if the significance level of your test is α, the probability of getting a Type I error must also be α.

In other words,

P(Type I error) = α

where α is the significance level of the test.

What about Type II errors?

A Type II error is what you get when you accept the null hypothesis, and the null hypothesis is actually wrong. It’s like putting a prisoner on trial and finding him innocent when he’s actually guilty.

The probability of getting a Type II error is normally represented by the Greek letter β.

P(Type II error) = β

So how do we find β?

Finding the probability of a Type II error is more difficult than finding the probability of getting a Type I error. Here are the steps that are involved, and we’ll show you how to go through them on the next page.

1. Check that you have a specific value for H₁.
   Without this, you can’t calculate the probability of getting a Type II error.
2. Find the range of values outside the critical region of your test.
   If your test statistic has been standardized, the range of values must be de-standardized.
3. Find the probability of getting this range of values, assuming H₁ is true.
   In other words, we find the probability of getting the range of values outside the critical region, but this time, using the test statistic described by H₁ rather than H₀.

Finding errors for SnoreCull

Let’s see if we can find the probability of getting Type I and Type II errors for the SnoreCull hypothesis test. As a reminder, our standardized test statistic is

Z = (X – 90)/3

where X is the number of people cured in the sample. The significance level of the test is 5%.

Let’s start with the Type I error

A Type I error is what you get when you reject the null hypothesis when actually it’s true. The probability of getting this sort of error is the same as the significance level of the test, so this means that

P(Type I error) = 0.05

Note

This gives you the probability of rejecting the null hypothesis that 90% of people are cured when it’s true.

So what about the Type II error?

A Type II error is what you get when you accept the null hypothesis when the alternate hypothesis is true. We can only calculate this if H₁ specifies a single specific value, so let’s use an alternate hypothesis of p = 0.8, as this is the proportion of successes in the doctor’s sample. This means that our hypotheses become

H₀: p = 0.9

H₁: p = 0.8

Note

This time we’ll use H₁: p = 0.8 instead of H₁: p < 0.9. We can only calculate the probability of getting a Type II error if we have a single specific value for the alternate hypothesis.

To look up probabilities using the alternate hypothesis probability distribution, we need an exact value for p.

The reason why H₁ must specify an exact value for p is so that we can calculate probabilities using it. If we used an alternate hypothesis of p < 0.9, we wouldn’t be able to use it to calculate the probability of getting a Type II error.

Relax

If you need to calculate the probability of getting a Type II error in an exam, you’ll be given H₁.

This means that you won’t have to decide on the alternate hypothesis yourself. If you need to calculate this sort of error, it will be given to you.

We need to find the range of values

Now that the alternate hypothesis H₁ gives a specific value for p, we can move on to the next step. We need to find the values of X that lie outside the critical region of the hypothesis test.

We saw back in Step 3: Find the critical region that the critical region for the test is given by Z < –1.64—in other words, P(Z < –1.64) = 0.05. This means that values that fall outside the critical region are given by Z ≥ –1.64.

If we de-standardize this, we get

(X – 90)/3	≥ –1.64
X	≥ 90 – 1.64 × 3
X	≥ 85.08

In other words, we would have accepted the null hypothesis if 85.08 people or more had been cured by SnoreCull.
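De-standardizing is just the standard score formula rearranged for X. A minimal sketch, using the table value of –1.64 and the N(90, 9) distribution under H₀ (the variable names are illustrative):

```python
mean, sd = 90, 3     # X ~ N(90, 9) under H0, so the standard deviation is 3
z_critical = -1.64   # critical value taken from the probability tables

# Rearranging z = (x - mean) / sd gives x = mean + z * sd.
x_boundary = mean + z_critical * sd

print(f"Accept H0 when X >= {x_boundary:.2f}")
```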

The final thing we need to do is work out P(X ≥ 85.08), assuming that H₁ is true. That way, we’ll be able to work out the probability of accepting the null hypothesis when actually H₁ is true instead. As we’re using the normal distribution to approximate X, we need to use a probability distribution X ~ N(np, npq), where n = 100 and p = 0.8. This gives us

X ~ N(80, 16)

This means that if we can calculate P(X ≥ 85.08) where X ~ N(80, 16), we’ll have found the probability of getting a Type II error.

We calculate this in the same way we calculate other normal distribution probabilities, by finding the standard score and then looking up the value in standard normal probability tables.

Find P(Type II error)

We can find the probability of getting a Type II error by calculating P(X ≥ 85.08) where X ~ N(80, 16). Let’s start off by finding the standard score of 85.08.

z	= (85.08 – 80)/√16
	= 5.08/4
	= 1.27

This means that in order to find P(X ≥ 85.08), we need to use standard probability tables to find P(Z ≥ 1.27).

P(Z ≥ 1.27)	= 1 – P(Z < 1.27)
	= 1 – 0.8980
	= 0.102

In other words,

P(Type II error) = 0.102

Note

This gives you the probability of accepting the null hypothesis that 90% of people are cured when actually 80% of people are.
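The same Type II error probability can be checked numerically. This sketch assumes the boundary of 85.08 found earlier and the specific alternative p = 0.8, and uses `statistics.NormalDist` in place of probability tables.

```python
from math import sqrt
from statistics import NormalDist

n = 100
p_alt = 0.8          # specific value assumed for H1
x_boundary = 85.08   # H0 is accepted when X >= 85.08

# Distribution of X if H1 is true: N(np, npq) = N(80, 16).
alt_dist = NormalDist(mu=n * p_alt, sigma=sqrt(n * p_alt * (1 - p_alt)))

# Probability of landing outside the critical region when H1 is true.
beta = 1 - alt_dist.cdf(x_boundary)

print(f"P(Type II error) = {beta:.3f}")
```

This agrees with the table-based value of 0.102 to three decimal places.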

Introducing power

So far we’ve looked at the probability of getting different types of error in our hypothesis test. One thing that we haven’t looked at is power.

The power of a hypothesis test is the probability that we will reject H₀ when H₀ is false. In other words, it’s the probability that we will make the correct decision to reject H₀.

Once you’ve found P(Type II error), calculating the power of a hypothesis test is easy.

Rejecting H₀ when H₀ is false is actually the opposite of making a Type II error. This means that

Power = 1 – β

where β is the probability of making a Type II error.

So what’s the power of SnoreCull?

We’ve found the probability of getting a Type II error is 0.102. This means that we can find the power of the SnoreCull hypothesis test by calculating

Power	= 1 – P(Type II error)
	= 1 – 0.102
	= 0.898

In other words, the power of the SnoreCull hypothesis test is 0.898. This means that the probability that we will make the correct decision to reject the null hypothesis is 0.898.
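Power depends on which specific value H₁ takes. As an illustrative sketch, the hypothetical function `power` below repeats the calculation for any alternative cure rate, assuming the same sample size of 100 and the same boundary of 85.08.

```python
from math import sqrt
from statistics import NormalDist

def power(p_alt, n=100, x_boundary=85.08):
    """P(X < x_boundary) when X ~ N(n*p_alt, n*p_alt*(1 - p_alt))."""
    dist = NormalDist(mu=n * p_alt, sigma=sqrt(n * p_alt * (1 - p_alt)))
    return dist.cdf(x_boundary)

power_at_08 = power(0.8)

for p_alt in (0.70, 0.80, 0.85):
    print(f"true p = {p_alt:.2f}: power = {power(p_alt):.3f}")
```

For p = 0.8 this gives about 0.898, matching 1 – β. The further the true cure rate is from 0.9, the more likely the test is to reject H₀.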

The doctor’s happy

In this chapter, you’ve run through two hypothesis tests, and with the larger sample you’ve shown that there’s sufficient evidence to reject the claims made by the drug company. Based on the doctor’s sample of 100 patients, there’s sufficient evidence that SnoreCull doesn’t cure 90% of snorers, as the drug company claims.

But it doesn’t stop there

Keep reading, and we’ll show you what other sorts of hypothesis tests you can use. We’ll see you over at Fat Dan’s Casino...

Q:	Why are we assuming the null hypothesis is true and then looking for evidence that it’s false?
A:	When you conduct a hypothesis test, you, in effect, put the claims of the null hypothesis on trial. You give the null hypothesis the benefit of the doubt, but then you reject it if there is sufficient evidence against it. It’s a bit like putting a prisoner on trial in front of a jury. You only sentence the prisoner if there is strong enough evidence against him.
Q:	Do the null hypothesis and alternate hypothesis have to be exhaustive? Should they cover all possible outcomes?
A:	No, they don’t. As an example, our null hypothesis is that p = 0.9, and our alternate hypothesis is that p < 0.9. Neither hypothesis allows for p being greater than 0.9.
Q:	Isn’t the sample size too small to do this hypothesis test?
A:	Even though the sample size is small, we can still perform hypothesis tests. It all comes down to what test statistic you use, and we’ll come to that on the next page.
Q:	So are hypothesis tests used to prove whether or not claims are true?
A:	Hypothesis tests don’t give absolute proof. They allow you to see how rare your observed results actually are, under the assumption that your null hypothesis is true. If your results are extremely unlikely to have happened, then that counts as evidence that the null hypothesis is false.

P(X ≤ 11)	= 1 – P(X ≥ 12)
	= 1 – (¹⁵C₁₂×0.1³×0.9¹² + ¹⁵C₁₃×0.1²×0.9¹³ + ¹⁵C₁₄×0.1×0.9¹⁴ + 0.9¹⁵)
	= 1 – (0.1285 + 0.2669 + 0.3432 + 0.2059)
	= 1 – 0.9445
	= 0.0555

Q:	What significance level should I normally test at?
A:	It all depends how strong you want the evidence to be before you reject the null hypothesis. The stronger you want the evidence to be, the lower your significance level needs to be. The most common significance level is 5%, although you sometimes see tests at the 1% level. Testing at the 1% level means that you require stronger evidence than if you test at the 5% level.
Q:	Does the significance level have anything in common with the level of confidence for confidence intervals?
A:	Yes, they have a lot in common. When you construct a confidence interval for a population parameter, you want to have a certain degree of confidence that the population parameter lies between two limits. As an example, if you have a 95% level of confidence, this means that the probability that the population parameter lies between the two limits is 0.95. The level of significance reflects the probability that values will lie outside a certain limit. As an example, a significance level of 5% means that your critical region must have a probability of 0.05.

Q:	How can we make the wrong decision if we’re conducting a hypothesis test? Don’t we do a hypothesis test to make sure we don’t?
A:	When you conduct a hypothesis test, you can only make a decision based on the evidence that you have. Your evidence is based on sample data, so if the sample is biased, you may make the wrong decision based on biased data.
Q:	I’ve heard of something called significance tests. What are they?
A:	Some people call hypothesis tests significance tests. This is because you test at a certain level of significance.

Q:	Why is it so much harder to find P(Type II error) than P(Type I error)?
A:	It’s because of the way they’re defined. A Type I error is what you get when you wrongly reject the null hypothesis. The probability of getting this sort of error is the same as α, the significance level of the test. A Type II error is the error you get when you accept the null hypothesis when actually the alternate hypothesis is true. To find the probability of getting this sort of error, you need to start by finding the range of values in your sample that would mean you accept the null hypothesis. Once you’ve found these values, you then have to calculate the probability of getting them assuming that H₁ is true.
Q:	Do I need to use the normal distribution every time I want to find the probability of getting a Type II error?
A:	The probability distribution you use all depends on your test statistic. In this case, our test statistic followed a normal distribution, so that’s the distribution we used to find P(Type II error). If our test statistic had followed, say, a Poisson distribution, we would have used a Poisson distribution instead.

z	= (80 – 90)/3
	= –10/3
	= –3.33

Z =	(X̄ – 355)/0.5
	= (356.5 – 355)/0.5
	= 1.5/0.5
	= 3