Chapter 2. Measuring Central Tendency: The Middle Way

Sometimes you just need to get to the heart of the matter.

It can be difficult to see patterns and trends in a big pile of figures, and finding the average is often the first step towards seeing the bigger picture. With averages at your disposal, you’ll be able to quickly find the most representative values in your data and draw important conclusions. In this chapter, we’ll look at how to calculate three of the most important statistics in town (the mean, median, and mode), and you’ll start to see how to summarize data as concisely and usefully as possible.

The Statsville Health Club prides itself on its ability to find the perfect class for everyone. Whether you want to learn how to swim, practice martial arts, or get your body into shape, they have just the right class for you.

The staff at the health club have noticed that their customers seem happiest when they’re in a class with people their own age, and happy customers always come back for more. It seems that the key to success for the health club is to work out what a typical age is for each of their classes, and one way of doing this is to calculate the average. The average gives a representative age for each class, which the health club can use to help their customers pick the right class.

Here are the ages of the current attendees of the Power Workout class: 19, 20, 20, 20, and 21.

It’s likely that you’ve been asked to work out averages before. One way to find the average of a bunch of numbers is to add all the numbers together, and then divide by how many numbers there are.

In statistics, this is called the mean.

So why give it a special name instead of just calling it the average?

Because there’s more than one sort of average.

You have to know what to call each average, so you can easily communicate which one you’re referring to. It’s a bit like going to your local grocery store and asking for a loaf of bread. The chances are you’ll be asked what sort of bread you’re after: white, whole-grain, etc. So if you’re writing up your sociology research findings, for example, you’ll be expected to specify exactly what kinds of average calculations you did.

Likewise, if someone tells you what the average of a set of data is, knowing what sort of average it is gives you a better understanding of what’s really going on with the data. It can give you vital clues about what information is being conveyed—or, in some cases, concealed.

We’ll be looking at other types of averages, besides the mean, later in this chapter.

If you want to really excel with statistics, you’ll need to become comfortable with some common stats notation. It may look a little strange at first, but you’ll soon get used to it.

Statisticians use letters to represent unknown numbers. But what if we don’t know how many numbers we might have to add together? Not a problem—we’ll just call the number of values n. If we didn’t know how many people were in the Power Workout class, we’d just say that there were n of them, and write the sum of all the ages as:

x1 + x2 + x3 + x4 + ... + xn

In this case, xn represents the age of the nth person in the class. If there were 18 people in the class, this would be x18, the age of the 18th person.

We can take another shortcut.

Writing x1 + x2 + x3 + x4 + ... + xn is a bit like saying “add age 1 to age 2, then add age 3, then add age 4, and keep on adding ages up to age n.” In day-to-day conversation it’s unlikely we’d phrase it like this. We’re far more likely to say “add together all of the ages.” It’s quicker, simpler, and to the point.

We can do something similar in math notation by using the summation symbol Σ, which is the Greek letter Sigma. We can use Σx (pronounced “sigma x”) as a quick way of saying “add together the values of all the x’s.”

x1 + x2 + x3 + x4 + ... + xn = Σx

Do you see how much quicker and simpler this is? It’s just a mathematical way of saying “add your values together” without having to explicitly say what each value is.

Now that we know some handy math shortcuts, let’s see how we can apply this to the mean.

We can use math notation to represent the mean.

To find the mean of a group of numbers, we add them all together, and then divide by how many there are. We’ve already seen how to write summations, and we’ve also seen how statisticians refer to the total count of a set of numbers as n.

If we put these together, we can write the mean as:

Σx / n

In other words, this is just a math shorthand way of saying “add together all of the numbers, and then divide by how many numbers there are.”

The mean is one of the most commonly used statistics around, and statisticians use it so frequently that they’ve given it a symbol all of its own: μ. This is the Greek letter mu (pronounced “mew”). Remember, it’s just a quick way of representing the mean.

μ = Σx / n
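As a minimal sketch, the "add everything up, then divide by how many there are" recipe can be written in Python. The function name `mean` and the variable names are just illustrative choices, and the ages are the Power Workout ages expanded from the class's frequency table.

```python
def mean(values):
    """Return the sum of the values divided by how many values there are."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / n  # sum(values) plays the role of Σx


# Power Workout ages: one 19-year-old, three 20-year-olds, one 21-year-old.
power_workout_ages = [19, 20, 20, 20, 21]

mu = mean(power_workout_ages)
print(mu)  # 20.0
```

Python's built-in `sum` does the job of Σ here: it adds the values together without us having to name each one.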

When you calculate the mean of a set of numbers, you’ll often find that some of the numbers are repeated. If you look at the ages of the Power Workout class, you’ll see we actually have 3 people of age 20.

It’s really important to make sure that you include the frequency of each number when you’re working out the mean. To make sure we don’t overlook it, we can include it in our formula.

If we use the letter f to represent frequency, we can rewrite the mean as:

μ = Σfx / Σf

This is just another way of writing the mean, but this time explicitly referring to the frequency. Using this for the Power Workout class gives us:

μ = (1 × 19 + 3 × 20 + 1 × 21) / (1 + 3 + 1) = 100 / 5 = 20

It’s the same calculation written slightly differently.
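The frequency version of the mean can be sketched the same way. This example assumes the frequency table is stored as a Python dictionary that maps each age to how often it occurs; the name `mean_from_frequencies` is hypothetical.

```python
def mean_from_frequencies(freq):
    """Mean of a frequency table given as {value: frequency}."""
    total_count = sum(freq.values())  # Σf: how many values there are
    if total_count == 0:
        raise ValueError("the frequency table is empty")
    weighted_total = sum(x * f for x, f in freq.items())  # Σfx
    return weighted_total / total_count


# Power Workout class: age -> frequency.
power_workout = {19: 1, 20: 3, 21: 1}

print(mean_from_frequencies(power_workout))  # 20.0
```

Multiplying each age by its frequency before adding is what stops the repeated 20s from being counted only once.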

Here’s another hopeful customer looking for the perfect class. Can you help him find one?

This sounds easy enough to sort out. According to the brochure, the Health Club has places available in three of its Tuesday evening classes. The first class has a mean age of 17, the second has a mean of 25, and the mean age of the third one is 38. Clive needs to find the class with an average student age that’s closest to his own.

Clive went along to the class with the mean age of 38. He was expecting a gentle class where he could get some nonstrenuous exercise and meet other people his own age. Unfortunately...

What could have gone wrong?

The last thing Clive expected (or wanted) was a class that was primarily made up of teenagers. Why do you think this happened?

We need to examine the data to find out. Let’s see if sketching the data helps us see what the problem is.

Sketch the histograms for the Kung Fu and Power Workout classes. (If you need a refresher on histograms, flip back to Chapter 1.) How do the shapes of the distributions compare? Why was Clive sent to the wrong class?

Power Workout Classmate Ages

Age         19   20   21
Frequency    1    3    1

Kung Fu Classmate Ages

Age         19   20   21   145   147
Frequency    3    6    3     1     1

Did you see the difference in the shape of the charts for the Power Workout and Kung Fu classes? The ages of the Power Workout class form a smooth, symmetrical shape. It’s easy to see what a typical age is for people in the class.

The shape of the chart for the Kung Fu class isn’t as straightforward. Most of the ages are around 20, but there are two masters whose ages are much greater than this. Extreme values such as these are called outliers.

If you look at the data and chart of the Kung Fu class, it’s easy to see that most of the people in the class are around 20 years old. In fact, this would be the mean if the ancient masters weren’t in the class.

We can’t just ignore the ancient masters, though; they’re still part of the class. Unfortunately, the presence of people who are way above the “typical” age of the class distorts the mean, pulling it upwards.

Can you see how the outliers have pulled the mean higher? When outliers drag the mean away from where most of the data sits, we say the data is skewed.

The Kung Fu class data is skewed to the right because if you line the data up in ascending order, the outliers are on the right.
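To see how strongly the two masters pull the mean, here is a small sketch that computes the Kung Fu mean with and without them. The cutoff of 100 is an arbitrary value chosen only to separate the masters from everyone else in this data; it is not a general rule for spotting outliers.

```python
def expand(freq):
    """Turn a {value: frequency} table into a sorted list of individual values."""
    return sorted(x for x, f in freq.items() for _ in range(f))


# Kung Fu class: age -> frequency, taken from the frequency table.
kung_fu = {19: 3, 20: 6, 21: 3, 145: 1, 147: 1}
ages = expand(kung_fu)

# Hypothetical cutoff, used only to drop the two masters for comparison.
OUTLIER_CUTOFF = 100
without_masters = [age for age in ages if age < OUTLIER_CUTOFF]

mean_all = sum(ages) / len(ages)
mean_without = sum(without_masters) / len(without_masters)

print(mean_all)      # 38.0
print(mean_without)  # 20.0
```

Two people out of fourteen are enough to move the mean from 20 to 38, which is why the mean on its own gave Clive the wrong impression.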

Let’s take a closer look at this.

Clive: They told me the average age for the class is about 38, so I thought I’d fit in alright. I had to sit down after 5 minutes before my legs gave out.

Bendy Girl: But I didn’t see anyone that age in the class, so there must have been some sort of mistake in their calculations. Why would they tell you that?

Clive: I don’t think their calculations were wrong; they just didn’t tell me what I really needed to know. I asked them what a typical sort of age is for the class, and they gave me the mean, 38.

Bendy Girl: And that’s not really typical, is it? I mean, just looking at the people in the class, I would’ve thought that a younger age would be a bit more representative.

Clive: If only they’d left the Ancient Masters out of their calculations, I would’ve known not to go to the class. That’s what did it; I’m sure of it. They distorted their whole calculation.

Bendy Girl: Well, if the Ancient Masters are such a big problem, why can’t they just ignore them? Maybe that way they could come up with a more typical age for the class...

If the mean becomes misleading because of skewed data and outliers, then we need some other way of saying what a typical value is. We can do this by, quite literally, taking the middle value. This is a different sort of average, and it’s called the median.

To find the median of the Kung Fu class, line up all the ages in ascending order, and then pick the middle value, like this:

image with no caption

If you line all the ages up in ascending order, the value 20 is exactly halfway along. Therefore, the median of the Kung Fu class is 20.

What if there had been an even number of people in the class?

image with no caption

If you have an even number of values, just take the mean of the two middle numbers (add them together, and divide by 2), and that’s your median. In this case, the median is 20.5.
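Both cases, an odd and an even number of values, can be sketched in one function. The two short lists below are made-up examples rather than data from the Health Club; the last list is the Kung Fu class expanded from its frequency table.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[middle]
    # Even count: take the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([21, 19, 20]))      # 20
print(median([19, 20, 21, 22]))  # 20.5

# Kung Fu ages from the frequency table: 14 values, both middle values are 20.
kung_fu_ages = [19] * 3 + [20] * 6 + [21] * 3 + [145, 147]
print(median(kung_fu_ages))      # 20.0
```

Notice that the two masters have no effect on the result: the median only cares about which value sits in the middle, not how extreme the values at the ends are.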

Your work on averages is really paying off. More and more people are turning up for classes at the Health Club, and the staff is finding it much easier to find classes to suit the customers.

This teenager is after a swimming class where he can make new friends his own age.

The swimming class has a mean age of 17, and coincidentally, that’s the median too. It sounds like this class will be perfect for him.

Let’s see what happens...

The Little Ducklings class meets at the swimming pool twice a week. In this class, parents teach their very young children how to swim, and they all have lots of fun splashing about in the water.

Look who turned up for lessons...

Here are the ages of people who go to the Little Ducklings class, but some of the frequencies have fallen off. Your task is to put them in the right slot in the frequency table. Nine children and their parents go to the class, and the mean and median are both 17.

Age        1   2   3   31   32   33
Frequency  3   ?   2    2    ?    ?

Here is the completed frequency table. The children’s frequencies have to add up to 9, so age 2 gets 4; a mean and median of 17 then leave 4 for age 32 and 3 for age 33.

Age        1   2   3   31   32   33
Frequency  3   4   2    2    4    3
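One way to check the missing Little Ducklings frequencies is a brute-force search. The sketch below assumes each missing frequency is a whole number from 1 to 9, takes ages 1 to 3 to be the children, and keeps only the combinations where there are nine children and the mean and median are both 17.

```python
from itertools import product
from statistics import median

ages = [1, 2, 3, 31, 32, 33]
known = {1: 3, 3: 2, 31: 2}  # frequencies still attached to the table
missing = [2, 32, 33]        # ages whose frequencies have fallen off

solutions = []
# Assumption: every missing frequency is between 1 and 9.
for guess in product(range(1, 10), repeat=len(missing)):
    freq = dict(known)
    freq.update(zip(missing, guess))

    # There are nine children (ages 1, 2, and 3).
    if freq[1] + freq[2] + freq[3] != 9:
        continue

    people = [age for age in ages for _ in range(freq[age])]
    mean_is_17 = sum(people) == 17 * len(people)  # avoids rounding issues
    if mean_is_17 and median(people) == 17:
        solutions.append(guess)

print(solutions)  # [(4, 4, 3)]
```

Only one combination survives: a frequency of 4 for age 2, 4 for age 32, and 3 for age 33.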

Let’s take a closer look at what’s going on.

Here are the ages of the people who go to the Little Ducklings class, lined up in ascending order:

1  1  1  2  2  2  2  3  3  31  31  32  32  32  32  33  33  33

The mean and median for the class are both 17, even though there are no 17-year-olds in the class!

But what if there had been an odd number of people in the class? Both the mean and median would still have been misleading. Take a look:

image with no caption

If another two-year-old were to join the class, like we see above, the median would be 3. This reflects the age of the children, but doesn’t take the adults into account.

image with no caption

If another 33-year-old were added to the class instead, the median would be 31. But that fails to reflect all the kids in the class.

Whichever value we choose for the average age, it seems misleading.
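The jumpiness of the median here is easy to check with Python's `statistics.median`. This sketch assumes the class frequencies are 3, 4, 2, 2, 4, and 3 for ages 1, 2, 3, 31, 32, and 33, which is the only assignment consistent with nine children and a mean and median of 17.

```python
from statistics import median

# Assumed Little Ducklings ages: nine children followed by nine parents.
little_ducklings = [1] * 3 + [2] * 4 + [3] * 2 + [31] * 2 + [32] * 4 + [33] * 3

print(median(little_ducklings))         # 17.0
print(median(little_ducklings + [2]))   # 3
print(median(little_ducklings + [33]))  # 31
```

A single extra person flips the median from one group to the other, which suggests that no single middle value can describe a class made up of two very different age groups.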

In addition to the mean and median, there’s a third type of average called the mode. The mode of a set of data is the most popular value, the value with the highest frequency. Unlike the mean and median, the mode absolutely has to be a value in the data set.

Sometimes data can have more than one mode. If there is more than one value with the highest frequency, then each one of these values is a mode. If the data looks as though it’s representing more than one trend or set of data, then we can give a mode for each set. If a set of data has two modes, then we call the data bimodal.

This is exactly the situation we have with the Little Ducklings class. There are really two sets of ages we’re looking at, one for parents and one for children, so there isn’t a single age that’s totally representative of the entire class. Instead, we can say what the mode is for each set of ages. In the Little Ducklings class, ages 2 and 32 have the highest frequency, so these ages are both modes. On a chart, the modes are the values with the tallest bars.
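Finding the mode, or modes, comes down to counting. The sketch below uses `collections.Counter` and returns every value that shares the highest frequency; the function name `modes` is a hypothetical choice, and the Little Ducklings frequencies (3, 4, 2, 2, 4, 3) are assumed from the totals given for the class.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


# Assumed Little Ducklings ages (children, then parents).
little_ducklings = [1] * 3 + [2] * 4 + [3] * 2 + [31] * 2 + [32] * 4 + [33] * 3
print(modes(little_ducklings))       # [2, 32]

# Power Workout ages have a single mode.
print(modes([19, 20, 20, 20, 21]))   # [20]
```

Returning a list rather than a single number keeps bimodal data such as the Little Ducklings class from being squeezed into one misleading value.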

Your efforts at the Health Club are proving to be a huge success, and demand for classes is high.
