Usually when you do a statistical analysis you draw a sample in order to make inferences about a broader population. You estimate statistics of interest on the sample data so that you can generalize them to that population. For instance, you may make simple estimates of single variables (e.g. means), or relational estimates between variables or groups (e.g. correlation, regression, ANOVA, time series).
When doing sample-based
statistics, the best-case scenario is for you to draw not
one sample but multiple samples of similar size and characteristics.
This is usually unrealistic for cost, time and other reasons, but
theoretically possible and the benchmark for accuracy. You might even
draw hundreds, or thousands, of samples. If you could achieve this
theoretical best-case scenario then you would estimate
your statistic of interest (e.g. a correlation) not on just one sample
but on each of the multiple samples. For example, perhaps you could draw hundreds of large samples of business customers, where each sample is similar in size and characteristics (i.e. essentially the same number and type of customer firms in each sample). If you were interested in average monthly spend, you would measure this average in each of the samples.
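To make the best-case scenario concrete, here is a minimal sketch in Python. The population, its log-normal shape, and the sample sizes are all invented assumptions; the point is only the mechanism of drawing many comparable samples and computing the statistic of interest on each one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: monthly spend (in dollars) for 100,000
# business customers. The log-normal shape and its parameters are
# invented purely for illustration.
population = rng.lognormal(mean=7.3, sigma=0.4, size=100_000)

n_samples = 500    # how many separate samples we draw
sample_size = 200  # customers per sample

# Draw each sample and record its average monthly spend: one
# estimate of the statistic of interest per sample.
sample_means = [
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_samples)
]

print(sample_means[:3])  # the first few estimates
```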
Having done this, you would find that the statistic of interest differs from sample to sample. Perhaps in the first sample you would discover that average spending is $1,500, but in the second sample it is $1,710 and in the third it is $1,382. You can already see that the average varies between samples, and that there would be some inaccuracy if you settled on $1,500 as the representative average. You would collect and arrange each of these different estimates of the sample statistic; with enough samples, you would build up a representative range of possibilities. For example, if you had ten samples you would have ten somewhat different estimates of average spending.
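Here is a minimal simulation sketch of what ten such estimates might look like. The centre (around $1,500) and the spread are invented assumptions for illustration; the numbers are randomly generated, not real customer data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Ten simulated sample averages of monthly spend. The centre ($1,500)
# and spread are invented for illustration only.
ten_estimates = rng.normal(loc=1500, scale=120, size=10)

for estimate in sorted(ten_estimates):
    print(f"${estimate:,.0f}")

print(f"Spread: ${ten_estimates.min():,.0f} to ${ten_estimates.max():,.0f}")
```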
With multiple estimates of the statistic of interest in hand, you can compare them and see how close together they lie. If the estimates fall within a narrow range, you would have confidence in the accuracy of the statistic; if the range is very broad, you would have less confidence in the overall accuracy. We would not usually take the whole range, because outliers can happen; we might instead take the middle 95% or 99% of the values generated in the multiple samples. This, incidentally, would give us 95% and 99% confidence intervals. This is the best way of assessing the realistic range of values and therefore the relative accuracy.
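In code, taking the middle 95% or 99% of the collected estimates is a straightforward percentile calculation. Here is a minimal sketch, again using simulated sample averages under invented assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated sample averages standing in for the estimates collected
# across many samples (centre and spread invented for illustration).
sample_means = rng.normal(loc=1500, scale=120, size=500)

# Middle 95%: trim 2.5% from each tail. Middle 99%: trim 0.5% each.
ci95 = np.percentile(sample_means, [2.5, 97.5])
ci99 = np.percentile(sample_means, [0.5, 99.5])

print(f"95% confidence interval: ${ci95[0]:,.0f} to ${ci95[1]:,.0f}")
print(f"99% confidence interval: ${ci99[0]:,.0f} to ${ci99[1]:,.0f}")
```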
Obviously very few studies
have the ability to draw multiple different samples like this; most
of the time you only have one sample. Even if you can draw multiple samples, you usually will not have many, and the few you have may not be comparable. (In the best-case scenario, it is important that the multiple samples all represent the underlying population and are therefore comparable; otherwise you cannot compare the statistical estimates.)
You have to compromise
away from the best-case scenario. How do we compromise? Two major
approaches are the traditional parametric approach, which I explain
next, and the bootstrapping approach, which I explain afterwards.