Generating and inspecting frequency tables or crosstabs such as those above is
not necessarily the last step. You may also wish to ask questions such as:
-
Are the observations distributed evenly between categories or cells (that is, are there
roughly equal numbers in each cell), or do some cells have noticeably higher
or lower frequencies than others? For instance, in
Figure 15.2 (Crosstab example of relating two categorical variables) above, you can see that the numbers in the six cells are not even. Clearly, the 56
observations in the Small/Freeware cell are more numerous than the 22 in Small/Premium.
However, in other cases these differences may not be as obvious, and you may wish
to seek further statistical evidence of differences.
-
Is the distribution of observations between categories or cells different from a
benchmark distribution of your choosing? For instance, in your industry a customer base of 50% big companies, 30% medium companies
and 20% small companies may be considered usual. In
Figure 15.1 (Example of frequency analysis of categorical data in SAS) we see that your actual distribution is 43%, 29% and 28% respectively. You may wish
to test whether your customer base distribution differs statistically
significantly from the industry benchmark.
-
If there are significant differences, where specifically do they lie? In the example in the previous bullet point, we may find that there are deviations
from the industry benchmark, but that these deviations lie specifically in the small
and big customer categories (i.e., the 29% in the medium category is not statistically
significantly different from the industry benchmark of 30%).
-
Is there evidence that one of the categorical variables may depend on another? This is a special case, where you believe the allocation of observations within a
certain categorical variable is partly affected by the observation’s membership in
another categorical variable. For instance, in the main book example you may believe
that a company’s choice to be a freeware versus premium customer may partly depend
on its size (perhaps because larger companies can more easily afford the
premium version).
These and many other questions can be answered using a variety of statistical tests.
The following sections discuss just an introductory sample of such tests.
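
As a preview, the benchmark question above can be sketched in PROC FREQ. This is only a minimal illustration under stated assumptions, not the method developed in the following sections: the data set name `work.customer_size`, the variable names `size` and `count`, and above all the counts are hypothetical. The text gives only percentages, so the sketch assumes 100 customers purely to turn 43%, 29% and 28% into frequencies.

```sas
/* Hypothetical data: assumes 100 customers so that the 43%, 29% and 28%
   shares quoted in the text become counts. All names are illustrative. */
data work.customer_size;
  input size $ count;
  datalines;
Big 43
Medium 29
Small 28
;
run;

/* One-way chi-square test of the observed shares against the assumed
   50/30/20 industry benchmark. ORDER=DATA keeps the levels in input
   order (Big, Medium, Small) so the TESTP= percentages line up with them. */
proc freq data=work.customer_size order=data;
  weight count;                          /* each row stands for COUNT customers */
  tables size / chisq testp=(50 30 20);  /* benchmark percentages, summing to 100 */
run;
```

The size of the test statistic depends on the real number of customers, so the result from this sketch should not be read as a finding. Omitting `testp=` would instead test for equal proportions across the categories (the first question above), and requesting a two-way table on a data set that holds both variables, for example `tables size*license / chisq;` with `license` as a hypothetical variable name, would ask for a chi-square test of independence (the last question above).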