
What the simulation does

Open file 2-Statistics.xlsx on the sheet called “Sampling.” The center part is the same as in Simulation 10, but the frequency tables on both sides differ from each other.

The left one calculates frequencies for one specific sample—in this case, the sample of 10 cases in row 6. The right one calculates frequencies for the means of all 20 samples.

Notice how the graph on the right displays a curve that usually comes close to a normal distribution, whereas the one on the left does not at all. The left one shows a sample distribution; the right one shows a sampling distribution.

What you need to know

In science, we usually deal with a single sample of a very limited size, whereas statistical reasoning rests on situations where the samples are much larger or are repeated numerous times.

In order to say something statistically significant about a sample distribution (to the left), we need a way to assess what the sampling distribution (to the right) would be like.

Two important measures for a sample distribution are the average (or mean) and the standard deviation (or SD). All we have on hand are the mean and standard deviation as found in the sample (to the left). So we have to obtain an estimate of the SD of the means based on the SD of the observations.

We do this by calculating the standard error (SE) of the mean. The formula for the SE is as follows: SE = SD/√n, where n is the sample size. As you can gather from this formula, SE decreases when the sample size increases and, of course, when SD decreases.
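To make the effect of sample size concrete, here is a minimal Python sketch of the SE calculation. The function name standard_error and the data values are made up for illustration; they are not taken from the workbook.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample SD (n - 1 denominator), like Excel's STDEV
    return sd / math.sqrt(n)


# Hypothetical data: 10 cases, then the same values repeated to give 40 cases
small = [4, 6, 5, 7, 3, 5, 6, 4, 5, 5]
large = small * 4

se_small = standard_error(small)
se_large = standard_error(large)

print(f"SE with n = {len(small)}: {se_small:.3f}")
print(f"SE with n = {len(large)}: {se_large:.3f}")
```

With a similar spread but four times as many cases, the SE comes out at roughly half its earlier value, because it shrinks with the square root of n.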

What you need to do

The center part is the same as in Simulation 10.

Range B2:B12: =FREQUENCY(D6:M6,A2:A12).

Range Q2:Q12: =FREQUENCY(N2:N21,P2:P12).
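For readers who want to see what FREQUENCY does outside Excel, the following is a rough Python sketch. It assumes the bin values are sorted in ascending order, treats each bin value as an inclusive upper bound, and returns one extra count for values above the last bin. The function name frequency and the sample values are hypothetical.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin in the manner of Excel's FREQUENCY.

    Each bin value is an inclusive upper bound; the final extra count
    holds values above the last bin. Assumes bins is sorted ascending.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left gives the index of the first bin whose bound is >= value
        counts[bisect_left(bins, value)] += 1
    return counts


# Made-up example: three bins with upper bounds 10, 20, and 30
bins = [10, 20, 30]
sample = [5, 10, 11, 20, 25, 31]
counts = frequency(sample, bins)
print(counts)
```

In the sheet, the output range has as many cells as there are bins, so the extra overflow count is simply not displayed.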

Cell B14: =STDEV(D6:M6). There is also a function STDEVP, which is appropriate when dealing with the entire population rather than one of its samples.
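The difference between the two functions can be illustrated with Python's statistics module, where stdev divides by n - 1 (as STDEV does) and pstdev divides by n (as STDEVP does). The data values below are made up.

```python
import statistics

sample = [4, 6, 5, 7, 3, 5, 6, 4, 5, 5]  # hypothetical 10 cases

sample_sd = statistics.stdev(sample)       # n - 1 denominator, like STDEV
population_sd = statistics.pstdev(sample)  # n denominator, like STDEVP

print(f"Sample SD:     {sample_sd:.4f}")
print(f"Population SD: {population_sd:.4f}")
```

For the same data, the sample version is always slightly larger, and the gap narrows as the number of cases grows.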

Cell Q14: =B14/SQRT(10). This estimates the SE as the sample SD divided by the square root of the sample size (10 cases).

Press Shift+F9 to run new simulations. The left graph may vary wildly, but the right one remains more or less stable, staying close to a normal distribution (see below).
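The same experiment can be sketched in Python. This is only an approximation of the sheet: it assumes the cases are drawn from a normal population with a mean of 50 and an SD of 10, which is a guess and not taken from the workbook. It compares the SE estimated from a single sample with the observed spread of the 20 sample means.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed for a repeatable run; remove it to mimic Shift+F9

N_SAMPLES, SAMPLE_SIZE = 20, 10  # as in the sheet: 20 samples of 10 cases
POP_MEAN, POP_SD = 50, 10        # assumed population, not from the workbook

samples = [
    [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    for _ in range(N_SAMPLES)
]
means = [statistics.mean(s) for s in samples]

# SE estimated from one sample only, as cell Q14 does
se_from_one_sample = statistics.stdev(samples[0]) / math.sqrt(SAMPLE_SIZE)

# Observed SD of the 20 sample means
sd_of_means = statistics.stdev(means)

print(f"SE estimated from one sample: {se_from_one_sample:.2f}")
print(f"SD of the 20 sample means:    {sd_of_means:.2f}")
```

Over repeated runs without the fixed seed, the two numbers should tend to land in the same neighborhood, although the estimate from a single sample of 10 cases fluctuates noticeably.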