Chapter 14
Learning Curve: Setting Up a Testing Plan
In This Chapter
Applying the scientific method to marketing campaigns
Designing a marketing experiment
Taking random samples
Finding significant results
Using control groups
In many ways, marketing is more art than science. Experience and intuition play a large role in the development of marketing and advertising strategies. It’s an age-old lament among marketing executives that they know only half of their marketing budget is effective. The problem is they don’t know which half.
At any given time, your company is running a variety of marketing and advertising campaigns. It’s difficult, and to some extent impossible, to figure out whether a customer purchased your product because they saw a TV commercial or because they saw an ad in the newspaper.
Marketers can measure a lot of things. You have a general sense of how many people see your TV commercials. Television viewership is well tracked. You know how many people subscribe to the newspapers you advertise in. But it’s difficult to connect this information to actual purchases.
As a database marketer, you’re in a unique position with respect to understanding the success of your campaigns. You know exactly whom you are communicating with. You also have at your disposal a number of ways of tracking exactly who responds to your campaigns.
In Chapter 15, I talk in detail about response tracking and measuring campaign results after the fact. But first I want to discuss the steps you need to take before you execute your campaigns so that you can measure them effectively.
Using the Scientific Approach
We’ve been taught since grade school that scientific discovery proceeds through disciplined experimentation. You observe something that sparks your curiosity. You ask a specific question. Then, in the critical step, you formulate an answer to that question. That answer is called your hypothesis. You then proceed to test this hypothesis by doing an experiment.
You can apply a version of this scientific approach to your database marketing campaigns. In database marketing, the steps you follow look like this:
1. Identify a marketing goal: This may involve generating sales of a particular product. It may involve addressing customer-retention issues. Whatever it is, this goal represents a problem you are trying to address. The question you are trying to answer is: How can I meet this marketing goal?
2. Come up with a strategy: You dig around in your data for a way to achieve your goal. This may involve customer profiling or other more advanced analytic techniques. This is the heart of database marketing. Your strategy will include the definition of your target audience as well as the messaging approach you will take.
3. Formulate a hypothesis: Your hypothesis is essentially an educated guess as to what your campaign will achieve. This campaign will increase purchases by 5 percent among the target audience, for example.
4. Design a test: It’s critical to set yourself up to properly test your hypothesis. Chapter 6 introduces the notion of a control group. Your test design typically involves holding out a random portion of your target audience. It also involves understanding how responses will be tracked. I talk about both these things in more detail later in this chapter and in Chapter 15.
5. Execute: Run your campaign.
6. Analyze the data: The last step is to analyze your response data. This is where you evaluate your hypothesis. Were you really able to increase purchases by 5 percent?
This chapter is about steps 3 and 4. Forming your hypothesis and designing an effective test both require some careful thought. In what follows, I introduce you to some considerations that will help you to learn as much as possible from your database marketing campaigns.
Lesson Plans: Deciding Beforehand What You Want to Learn
Every database marketing campaign is an opportunity to learn something. You certainly want to be able to tell whether your campaign works. But you also want to be able to learn why it’s working, if it is.
You can’t test everything
As with many things in life, simple is usually better. Trying to test everything all at once is a recipe for learning nothing at all. As I discuss later in this section, you need to have a sufficient number of data points to get a meaningful read on what is and isn’t working.
In its traditional form, the scientific method admonishes you to test one thing at a time. In marketing experiments, the strategy is to break up your target audience into two groups. This is often referred to as an A/B split. You then send one communication to group A and another to group B. The difference in the communications is what you want to test.
The thinking behind the A/B split design is pretty straightforward. Suppose you send out one communication with a discounted offer to young families. You also send out a non-discounted offer to retirees. Now suppose the first offer dramatically outperforms the second. What have you learned? Have you learned not to market that product to retirees? Or does this suggest that you need to offer them a discount? The truth is, you haven’t really learned anything from this experiment.
There are two ways you can improve this experiment. One is to test only one thing at a time. Test a discounted offer to both audiences, for example. Or test both a discounted offer and a non-discounted offer to one audience.
The other way is to test all four possible combinations. You could test both offers against both audiences. This essentially means doing two different experiments. One experiment would be an attempt to learn about the offer’s effectiveness with young families. The other would be the same experiment repeated for retirees.
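To make this concrete, here’s a minimal sketch in Python of how you might randomly assign customers to the four cells. The record layout and the segment field are hypothetical, and the 50/50 offer split is just one reasonable choice:

```python
import random

def assign_test_cells(customers, seed=42):
    """Assign each customer a random offer within their own audience
    segment, producing the four combinations described above."""
    rng = random.Random(seed)
    cells = {("young_family", "discount"): [], ("young_family", "no_discount"): [],
             ("retiree", "discount"): [], ("retiree", "no_discount"): []}
    for customer in customers:
        offer = "discount" if rng.random() < 0.5 else "no_discount"
        cells[(customer["segment"], offer)].append(customer)
    return cells

# Two experiments in one: an A/B offer split within each audience
customers = [{"id": 1, "segment": "young_family"},
             {"id": 2, "segment": "retiree"},
             {"id": 3, "segment": "young_family"},
             {"id": 4, "segment": "retiree"}]
for cell, members in assign_test_cells(customers).items():
    print(cell, [c["id"] for c in members])
```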
Tracking responses
All your database marketing campaigns are designed to evoke some sort of response. You may be trying to drive purchases. You may simply be trying to drive web traffic or registrations. I make the point repeatedly throughout this book that a key component of your marketing message is a clear call to action.
That call to action is central to almost all of the marketing experiments that you design. This means that in analyzing the results of your experiments, you will typically be looking at the response rate. Loosely speaking, a response is an answer to your call to action. To learn anything from your campaign, you need to be able to recognize when a customer has responded.
Identifying responders
In many cases, figuring out which customers have responded to your campaign is not all that difficult. If your purchase process requires customers to identify themselves and provide their address, then it’s fairly easy to connect their purchase back to a direct-mail campaign. Airlines, hotels, banks, car dealers, and a host of other industries require a good deal of information from their customers at the time of purchase.
Tracking is also relatively straightforward for online transactions that are generated from e-mail campaigns. As long as the customer enters the correct e-mail address to receive a purchase confirmation, you can track that purchase back to the e-mail address used in your campaign. If your online purchase process requires the customer to register and log in, then you’re golden.
But not every business collects this kind of information. When the purchase process doesn’t identify the customer, a common tracking technique is to include an offer code in your communication that the customer presents at the time of purchase. One problem with this approach is that these offer codes can grow legs. People sometimes share the codes with their friends. In some cases they find their way onto the Internet. At some level this is a good thing, because it does generate business. But at the same time, it complicates your experiment.
A simple refinement of the offer code approach solves at least part of this problem. You can actually generate individualized offer codes that can only be redeemed once. I sometimes get plastic discount cards in the mail that actually have a magnetic strip on the back that can be swiped at the checkout counter. This ties my transaction more directly to me. I can give the card away, but it still generates a unique purchase.
This technique doesn’t require the use of physical cards. It’s just as easy to create individualized offer codes that you can serve up to the customer in an e-mail. In fact, the plastic cards that I receive can actually be used this way as well. They have a code printed on the back, much like a credit-card number.
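Here’s a minimal sketch of how single-use codes might be generated and redeemed, assuming a simple in-memory dictionary stands in for your marketing database:

```python
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # skips look-alikes like 0/O and 1/I

def generate_offer_codes(customer_ids, length=10):
    """Create a unique, hard-to-guess code for each customer and store
    the mapping so a redemption traces back to exactly one customer."""
    codes = {}
    for cid in customer_ids:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        while code in codes:  # regenerate on the vanishingly rare collision
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        codes[code] = {"customer_id": cid, "redeemed": False}
    return codes

def redeem(codes, code):
    """Honor a code only once; return the customer it belongs to, or None."""
    entry = codes.get(code)
    if entry is None or entry["redeemed"]:
        return None
    entry["redeemed"] = True
    return entry["customer_id"]

codes = generate_offer_codes([1001, 1002, 1003])
```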
Defining your response window
Another aspect of your marketing experiment involves what you’re willing to treat as a response. More specifically, you need to decide when a response can be legitimately associated with a particular campaign. Your response window is the period of time over which you can reasonably assume customer behavior is really caused by your communication.
Many campaigns involve time-sensitive offers. In these cases, your decision is pretty obvious. The response window closes when the offer expires. But not all situations are this simple.
In other cases, you might need to put a little thought into how long you want the response window to stay open. Later in this chapter, when I talk about control groups, I describe a simple technique for getting some idea of when your campaign is no longer working.
But the basic idea is that, even if it doesn’t come with an explicit expiration date, your offer or message has a limited shelf life. You don’t want to be counting purchases that happen a year later, for example.
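As a sketch, enforcing a response window can be as simple as filtering purchases by date. The 30-day window below is an arbitrary value for illustration, not a recommendation:

```python
from datetime import date, timedelta

def responses_in_window(purchases, campaign_date, window_days=30):
    """Keep only purchases that land inside the response window; anything
    later is assumed to have a cause other than the campaign."""
    window_end = campaign_date + timedelta(days=window_days)
    return [p for p in purchases if campaign_date <= p["date"] <= window_end]

purchases = [{"customer_id": 7, "date": date(2024, 3, 12)},
             {"customer_id": 9, "date": date(2024, 8, 1)}]
print(responses_in_window(purchases, campaign_date=date(2024, 3, 1)))
```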
Taking a Random Sample
In Chapter 6, I talk about the importance of random sampling. This process is essential when it comes to setting up A/B splits. Both A and B groups need to have the same characteristics if you want your experiment to be meaningful.
Customer files are almost always sorted in some way. This means that if you want to split your target audience into two equal-size groups, you can’t just cut the file down the middle. If you do, then you’ve generated an A/B split that is made on the basis of how the file was sorted. The two groups will be inherently different.
When you split a sorted file, comparing the two groups becomes problematic. For example, many databases are naturally sorted according to when customer records were added. This means that when you pull a file, it may well contain older, more loyal customers at the top. Suppose you’re testing the effect of two different discounts. You decide to mail the smaller discount to the top half of the file, namely your best customers. The smaller discount may well outperform the larger one, simply because it went to your more loyal customers rather than your newer, less loyal ones.
This situation can be avoided by ensuring that your A/B splits are chosen randomly. In the next couple of sections, I explain two common ways that random samples are generated.
Selecting every nth record
Nth selection means taking your file and simply selecting every nth record: every other record for a 50/50 split, every tenth record for a 10 percent sample, and so on. One advantage to nth selection is that it’s simple to implement. In the early days of direct marketing, when computer resources were at more of a premium, this approach was appealing because it’s also fast. You just zip through the file once, and there’s no need to make any calculations.
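Here’s what nth selection looks like as a minimal Python sketch, with a plain list standing in for your sorted mail file:

```python
def nth_selection(records, n=2):
    """One pass through the file: every nth record goes to group A, the
    rest to group B. Fast and simple, but it inherits any pattern that
    happens to line up with the file's sort order."""
    group_a, group_b = [], []
    for i, record in enumerate(records):
        (group_a if i % n == 0 else group_b).append(record)
    return group_a, group_b

# Splitting an address-sorted file in half puts even street numbers in
# one group and odd street numbers in the other
addresses = ["2 Lake Rd", "3 Lake Rd", "4 Lake Rd", "5 Lake Rd"]
print(nth_selection(addresses, n=2))
```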
The drawback is that files are rarely sorted randomly. If the sort order contains a pattern that happens to line up with your selection interval, your sample won’t be random at all. Think about street numbers, for example. Suppose you’re trying to randomly split a file that’s been sorted by address. Addresses alternate, even numbers on one side of the street and odd numbers on the other. If you’re splitting a file in half, you run the risk of creating your A/B split based on which side of the road people live on.
I live in a neighborhood on a lake. If you split this neighborhood down the middle of the street, you’d find that everyone in one group owned lakefront property and no one in the other group did. You wouldn’t consider splitting a file based on home values for an experiment! But in this example, that’s exactly what has happened.
Nth selection will do in a pinch. But given its drawbacks and the availability of computer power and advanced random-number generators, I recommend a different approach, which I outline below.
Flipping a coin
Okay, not literally flipping a coin. But the idea is the same. You use some kind of random-number generator to simulate flipping a coin for each member of the target audience to determine which group that customer will belong to.
All statistical-analysis software as well as database-management software contains some sort of random-number generator. Even spreadsheets have them. Each time you invoke a random-number function, it returns a value between 0 and 1. By invoking this function, you can assign each customer record its own personal random number. These random values can be used in a variety of ways to generate a random split.
If the random number generator is invoked repeatedly, the values tend to spread out evenly over the interval from 0 to 1. Just as a fair coin will come up heads about half the time, the random number generator will produce values between 0 and 1/2 about half the time.
If you simply want to split your mail file in half, you can simulate a coin flip by generating a random value for each member of your target audience. If the value is less than 1/2, then you put the record in group A. If the value is greater than 1/2, then you put the record in group B. (Technically you need to account for the possibility that the value is equal to 1/2, but you could do database marketing for a thousand years and never see this happen.)
This approach can be easily modified to generate any sample size you might want. If you want a 5 percent sample, for example, you simply adjust the ranges to include in group A only random values between 0 and .05.
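Here’s a minimal sketch of the whole coin-flip approach. Python’s built-in random module stands in for whatever random-number generator your database platform provides:

```python
import random

def random_split(records, fraction_a=0.5, seed=None):
    """Simulate a coin flip for each record: a fresh random number in
    [0, 1) decides the group, independent of the file's sort order."""
    rng = random.Random(seed)
    group_a, group_b = [], []
    for record in records:
        (group_a if rng.random() < fraction_a else group_b).append(record)
    return group_a, group_b

customers = list(range(10_000))
half_a, half_b = random_split(customers)                 # roughly a 50/50 split
sample, rest = random_split(customers, fraction_a=0.05)  # roughly a 5 percent sample
print(len(half_a), len(sample))
```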
Some customer databases come with a random number already stored on each record, so you don’t have to generate one on the fly. But these pre-populated random numbers are only updated occasionally. As I discuss in Chapter 16, there will be times when you want to be able to take multiple samples from your database. If you do this based on a pre-populated random number, you get the same set of records every time you use a given range.
For example, suppose you want two distinct 10 percent samples of household records. Selecting households whose random number is in the range 0 to .10 will produce exactly the same results every time you make that selection based on pre-populated random numbers. To get a different sample, you need to choose a different range.
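In code, that repeatable behavior looks something like this sketch; the rand field name is a hypothetical stand-in for wherever your database stores the pre-populated number:

```python
import random

# A random number pre-populated once on each record
records = [{"id": i, "rand": random.random()} for i in range(1_000)]

def sample_by_range(records, low, high):
    """Select records whose stored random number falls in [low, high).
    The same range returns the same records every time."""
    return [r for r in records if low <= r["rand"] < high]

first_10pct = sample_by_range(records, 0.00, 0.10)   # repeatable 10 percent sample
second_10pct = sample_by_range(records, 0.10, 0.20)  # a distinct second sample
```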
Getting Significant Results: Sample Size Matters to Confidence Level
The goal of all this design, preparation, sampling, and so forth is that you can report the results of your marketing experiment with some degree of confidence. In fact, confidence level is actually a statistical term. It’s a measure of how likely it is that the results of an experiment did not happen purely by chance.
In the vast majority of cases, your marketing experiments will be designed to test a hypothesis involving response rates. Families with children responded at a higher rate than retired couples, for example. Or a 10 percent discount generates more responses than $50 off your next purchase.
More about flipping coins
These sorts of hypotheses have something fundamentally in common with a simple experiment. Suppose you wanted to demonstrate that a coin was biased. Flipping the coin once isn’t going to tell you. Nor will flipping it twice. Even if the coin is fair, you expect to see either heads or tails come up twice in a row about half the time.
But what if you flipped the coin 10 times and it came up heads only 4 times? Do you have reason to believe the coin is biased? This question can be answered in a very precise way. You can actually quantify the likelihood of this outcome. This in turn gives a way of measuring the confidence you would have in declaring the coin biased.
In the case of 10 coin flips, there is actually a 20 percent chance that you will observe exactly 4 heads even if the coin is fair. More importantly, there is an almost 40 percent chance that a fair coin will come up heads fewer than 5 times in 10 tosses. If your hypothesis is that the coin is fair, this experiment doesn’t give very solid evidence to the contrary.
Your confidence level in the result of an experiment is the probability that it didn’t happen by chance. The probability that it did happen by chance is what statisticians call the experiment’s p-value, so your confidence level is 1 minus the p-value. In the preceding experiment, there is an almost 40 percent chance that the result happened by chance. Conversely, there is roughly a 60 percent chance that it didn’t. In this case you would say that you are about 60 percent confident that you have a biased coin. Not terribly convincing.
If you had flipped the coin 50 times instead of 10 and gotten the same 40 percent result (20 heads), the situation changes dramatically. In this case, your confidence level soars to almost 90 percent that the coin is biased. If you flip it a hundred times and again get 40 percent heads, your confidence level passes 97 percent. In general, the more flips, the more confident you become in the result.
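If you want to verify these numbers yourself, the calculation is a straightforward binomial probability. This sketch computes the one-sided p-value for each of the three coin experiments exactly:

```python
from math import comb

def prob_at_most(heads, flips):
    """Chance that a fair coin shows at most this many heads."""
    return sum(comb(flips, k) for k in range(heads + 1)) / 2 ** flips

for flips, heads in [(10, 4), (50, 20), (100, 40)]:
    p_value = prob_at_most(heads, flips)   # one-sided p-value
    print(f"{heads} heads in {flips} flips: "
          f"p-value {p_value:.3f}, confidence {1 - p_value:.0%}")
# 4 heads in 10 flips:   p-value 0.377, confidence 62%
# 20 heads in 50 flips:  p-value 0.101, confidence 90%
# 40 heads in 100 flips: p-value 0.028, confidence 97%
```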
Intuitively, this makes sense. The more times you observe something, the more likely it is that you are observing a persistent pattern.
Sample size and confidence levels
So what does all this coin flipping have to do with marketing, you ask? A lot, as it turns out. In your marketing experiments, you’re essentially trying to determine whether response rates are different between two groups of customers. You can use a similar statistical approach to assigning confidence levels to the results of these experiments.
Statistical techniques allow you to quantify the likelihood that two response rates really are different. There is always at least a small possibility that response rates differ purely by chance. You want to set up your experiment to make that possibility as small as possible.
The main tool at your disposal is that you can control, to some extent, the size of your A/B splits. As I explain in the preceding section, your confidence in your coin flip results increases as you flip the coin more times. Similarly, the larger the groups whose response rates you’re measuring, the more likely it is that you will see meaningful results.
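One standard technique for quantifying this is the two-proportion z-test. Here’s a minimal sketch; the response counts in the example are invented purely for illustration:

```python
from math import sqrt, erf

def confidence_two_rates(resp_a, n_a, resp_b, n_b):
    """Two-proportion z-test: the confidence that two observed response
    rates differ for real rather than by chance."""
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided
    return 1 - p_value

# A 3 percent rate versus a 2.5 percent rate, 10,000 records per group
print(f"{confidence_two_rates(300, 10_000, 250, 10_000):.1%}")  # about 97 percent
```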
Other factors that influence confidence levels
When trying to determine the appropriate sample size for an experiment, you need to take two factors into account:
The response rate you expect to get
How small a difference in response rates you want to be able to detect
Expected response rate
Because of the subtleties of the mathematics involved, the overall response rate to your campaign directly affects your confidence levels. The reason for this connection is beyond the scope of this book. But there is a fairly simple rule of thumb that you may find useful.
The rule of thumb is this: the closer a response rate is to 50 percent, the larger the sample you need. In other words, differences between response rates are harder to detect at statistically significant levels when the response rate is close to 50 percent. One place you may see evidence of this fact is in polling data around election time. When a race comes down to two candidates, especially a close race, poll results hover somewhere in the neighborhood of 50 percent for each candidate. The margin of error reported in these polls is typically quite large, sometimes several percentage points wide.
In your database marketing campaigns, it’s the other end of this spectrum that typically comes into play. If you’re running database marketing campaigns that are generating 50 percent response rates, then you’re my hero. Typically response rates are much lower than that. And it affects sample sizes dramatically.
Suppose you want to be able to say with statistical confidence that a 25 percent response rate is better than a 23 percent response rate. Then you need a sample size just north of 3,500 in each group to reach 95 percent confidence. However, the same two-point difference between a 1 percent and a 3 percent response rate requires a sample size of only a few hundred to reach the 95 percent level.
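Here’s a back-of-the-envelope sketch that reproduces both of those numbers. It uses the standard pooled-variance approximation and, to keep things simple, considers only significance, ignoring statistical power (a real test design would typically account for power, too):

```python
def sample_size_per_group(p1, p2, z=1.96):
    """Rough sample size per group needed to call the difference between
    two response rates significant at about 95 percent confidence
    (z = 1.96)."""
    p_bar = (p1 + p2) / 2
    return z ** 2 * 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2

print(round(sample_size_per_group(0.25, 0.23)))  # about 3,500 per group
print(round(sample_size_per_group(0.01, 0.03)))  # a few hundred per group
```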
How big a difference do you want to detect?
In my experience, the precision issue is the biggest driver of large sample sizes. Many marketing campaigns are targeted at large audiences. This means that even a small improvement in response rates can be extremely valuable. What’s more, because these large programs cost a lot to execute, their sponsors want to be able to say with confidence that they’re worth it.
I’ve worked on mailings in the credit-card industry where overall response rates were expected to be less than 1 percent. In a case like that, even improving those rates by 0.2 percent would be well worth the effort. In order to be able to detect this level of difference, our A/B split sample sizes had to exceed 30,000 consumers.
Mission Control: Using Control Groups
As I mention throughout this book, one of the great advantages you have as a database marketer is your ability to measure the success of your campaigns. Without a doubt, the most frequent hypothesis you will be asked to test comes down to “Did this campaign work?” In Chapter 15, I talk about how to measure the success of database marketing campaigns and put that measurement into financial terms. All of that analysis depends on a particular kind of A/B split known as a control group.
Control groups and measurement
As Chapter 6 explains, a control group is basically a random sample of your target audience that you don’t communicate with. It’s like the use of placebos in pharmaceutical testing. The idea is that you want to get a bead on what would happen if you didn’t do anything. That do-nothing baseline is precisely what you need to control for. Your ultimate goal here is to compare the performance of the mail group to that of the control group. It’s the difference between the two groups that represents your success.
You have other marketing communications and advertisements running virtually all the time. There is a chance that some responses to your direct-marketing campaigns would have come anyway based on these other initiatives.
By holding out a portion of your target audience, you can get an idea about how many responses fall into this category. If you design your experiment correctly, you will be in a position to clearly and (statistically) confidently show exactly how much business you were able to produce.
You may want to test any of a number of different factors, as I’ve mentioned before. Comparing the success of different offers, different message strategies, or different targeting strategies are all potentially informative experiments. But you’ll also be asked to include control groups so that you can evaluate the overall success of your campaigns.
In many cases, your control group can do double or triple duty. If you’re testing three different offers to the same target audience, there is absolutely no need to hold out a separate control group for each offer. Because the audiences are the same, one control group will give you a baseline against which you can compare all three offers.
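As a sketch, measuring three offers against one shared control is just three comparisons to the same baseline. The counts below are invented for illustration, and in practice you’d run each comparison through a significance test like the two-proportion sketch earlier in this chapter:

```python
# Hypothetical counts: three offers measured against one shared control
results = {"offer_1": (450, 15_000),
           "offer_2": (510, 15_000),
           "offer_3": (390, 15_000),
           "control": (300, 15_000)}

control_rate = results["control"][0] / results["control"][1]
for offer in ("offer_1", "offer_2", "offer_3"):
    resp, n = results[offer]
    rate = resp / n
    print(f"{offer}: response {rate:.2%}, "
          f"lift over control {rate - control_rate:+.2%}")
```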
Being careful: A couple of warnings
Don’t get caught up in percentages
Control groups are often specified as a fixed percentage of the target audience: hold out 10 percent, say. Resist this habit. The number of customers that you put in your control group depends on three things and three things alone, and none of them is the size of your target audience. Notice that in all three cases, the answer pushes you toward a larger control group:
The response rate you expect to see: The higher the response rate, the larger your control group needs to be.
The precision with which you want to detect differences in response rates: The smaller the difference, the larger your control group needs to be.
The confidence level with which you want to be able to report results: This is typically set at either 90 percent or 95 percent. The higher the confidence level, the larger your control group needs to be.
Testing different target audiences
As I’ve said, your control group can potentially perform multiple duties. When you’re testing different offers, messages, or even channels, a single control group can serve as a basis of measuring all of your variations.
When testing different targeting criteria, you need to create a different control group for each audience split. These multiple control groups need to be sized on their own. In other words, they each individually need to be large enough to give you the confidence level you want. If you try to cobble together a control group containing members from more than one audience, you completely negate your ability to get a clean read on the success of your campaign.
Out of Control: Reasons to Skip the Control
There are times when you’ll be forced to abandon the idea of a measurement strategy for your database marketing campaigns. Despite your best intentions, there are situations in which it just doesn’t make sense to hold out a control group. Sometimes this is caused by a statistical issue. Other times it’s dictated by larger corporate priorities.
Small target audiences
Some marketing campaigns are just too small to allow for a useful control group design. When I say small, I mean that the target audience isn’t big enough to give you the kind of precision you would need to do a meaningful analysis of your campaign. If the sample sizes dictated by your expected response rate and the difference you want to detect add up to more customers than you actually have, a control group won’t tell you anything useful.
Lost opportunities
I use the example of pharmaceutical companies testing drugs several times in this book to explain some of the basic ideas around control groups. Here again, that industry provides an insightful example.
Some drug trials are wildly successful in demonstrating the effectiveness of a treatment. In some cases — especially ones involving particularly nasty or even fatal conditions — this success creates a moral conflict between the science of medicine and the practice of medicine. In these cases, it is generally considered wrong to continue the drug trial and withhold an effective treatment from the control group. The trial is then suspended, and patients who had been receiving the placebo are given the actual drug.
Holding out a marketing control group involves a similar, if far less dramatic, trade-off: the customers you hold out represent sales you’re deliberately choosing not to pursue. Most marketing executives buy into the idea that the information that comes from well-designed tests is worth the cost in lost revenue. But cases will arise when this is not true.
One situation where it really is a little pointless to keep holding out control groups is for well-established marketing campaigns. If you’ve been running the same campaign to the same audience year after year, you may already have a pretty good idea of how much that campaign is driving to the bottom line. There really isn’t a compelling reason to keep verifying that contribution.
Control groups often go out the window in times of trouble or heightened concern about company performance. If your CEO gets concerned about the company meeting its quarterly or annual sales goal, your marketing executives will feel the pressure. And they’ll do everything in their power to squeeze as many sales as they can from their marketing budgets. Control groups are an easy target (pardon the pun).