Chapter 12
The marketing metrics X-ray and testing
The marketing metrics X-ray
Our purpose in this chapter is to give some examples of how marketing metrics can augment and complement traditional financial metrics when used to assess firm and brand performance. In particular, marketing metrics can serve as leading indicators of problems, opportunities, and future financial performance. Just as X-rays (now MRIs) are designed to provide deeper views of our bodies, marketing metrics can show problems (and opportunities) that would otherwise be missed.
Put your money where your metrics are
Table 12.1
shows common summary financial information for two hypothetical companies, Boom and Cruise. Income statement data from five years provide the basis for comparing the companies on several dimensions.
On which firm would you bet your grandparents’ savings?
We have used this example with MBA students and executives many times—usually, we ask them “Assume that your grandparent wants
to buy a partnership in one of these firms, using limited retirement savings. If these financial statements were the only
data you had available or could obtain, which firm would you recommend?” These data are the metrics traditionally used to evaluate firm performance.
The table shows that gross margins and profits are the same for both firms. Although Boom’s sales and marketing spending are growing faster, its return on sales (ROS) and return on investment (ROI) are declining. If this decline continues, Boom will be in trouble. In addition, Boom’s marketing/sales ratio is increasing faster than Cruise’s. Is this a sign of inefficient marketing?
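Table 12.1's underlying figures are not reproduced here, but the ratios discussed above are straightforward to compute. A minimal sketch with placeholder income-statement numbers (ours, not the table's):

```python
# Traditional financial ratios of the kind reported in Table 12.1.
# The revenue/profit figures below are placeholders, not the book's data.

def financial_ratios(revenue, net_profit, marketing_spend, investment):
    """Return the summary ratios discussed in the text."""
    return {
        "return_on_sales": net_profit / revenue,          # ROS
        "return_on_investment": net_profit / investment,  # ROI
        "marketing_to_sales": marketing_spend / revenue,  # efficiency flag
    }

# Hypothetical year-5 figures for the two firms (thousands of dollars).
boom = financial_ratios(revenue=2_500, net_profit=100,
                        marketing_spend=500, investment=2_000)
cruise = financial_ratios(revenue=1_200, net_profit=60,
                          marketing_spend=150, investment=1_200)

for name, ratios in (("Boom", boom), ("Cruise", cruise)):
    print(name, {k: f"{v:.1%}" for k, v in ratios.items()})
```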
Table 12.1 Financial statements
On the basis of the information in
Table 12.1
, most people choose Cruise. Cruise is doing more with less. It's more efficient. Its trend in ROS looks much better, and Cruise has maintained a fairly consistent ROI of about 5%. About the only thing Boom has going for it is the size and growth of its "top line" (sales revenue). Let's take a deeper look using the marketing metrics X-ray.
Using the marketing metrics X-ray
Table 12.2
presents the results of our marketing metrics X-ray of Boom and Cruise. It shows the number of customers each firm is serving and separates these into “old” (existing customers) and “new” customers.
This table allows us to see not only the rate at which each firm acquired new customers but also its retention (loyalty) rate. Boom's marketing spending now looks a lot better, because we know the spending was used to generate new customers and keep old ones. In addition, Boom acquires new customers at a lower cost than Cruise. And although Cruise's customers spend more, Boom's stay around longer. Perhaps we should order another set of X-rays to examine customer profitability and lifetime value?
Table 12.3
uses the information in the previous table to calculate some additional customer metrics. Under an assumption of constant margins and retention rates and a 15% discount rate, we can calculate the customer lifetime value (CLV) for each firm's customers and compare this CLV with what the firms are spending to acquire those customers. The CLV represents the discounted margins a firm will earn from its customers over their lifetime of buying from the firm. Refer to
Chapter 5
for details about the estimation of CLV and the process for using the number to value the customer base as an asset. The asset value is simply the number of ending customers times their remaining lifetime value (CLV minus the just-received margin). For these examples, we have assumed that all marketing is used to acquire new customers, so the customer acquisition cost is obtained by dividing marketing spending by the number of new customers acquired in the period.
Boom’s aggressive marketing spending looks even better in this light. The difference between the CLV and acquisition cost is only $3.71 for Cruise but is $48.21 for Boom. From the viewpoint of the customer asset value at the end of year five, Boom is worth almost five times as much as Cruise.
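For readers who want to trace the arithmetic behind Table 12.3, here is a minimal sketch using the constant-margin, constant-retention CLV form implied by this chapter (margin received now, then margins retained with probability r each year, discounted at rate d). The margin and retention inputs below are placeholders, since Table 12.2's figures are not reproduced here:

```python
# Customer economics behind Table 12.3. CLV form implied by the text:
# margin received now plus future margins, retained with probability r
# each year and discounted at rate d:
#   CLV = margin * (1 + d) / (1 + d - r)
# Inputs below are illustrative placeholders, not Table 12.2's figures.

def clv(margin, retention, discount=0.15):
    """Discounted margins over the customer's lifetime with the firm."""
    return margin * (1 + discount) / (1 + discount - retention)

def acquisition_cost(marketing_spend, new_customers):
    """Assumes all marketing spending goes to acquiring new customers."""
    return marketing_spend / new_customers

def customer_asset_value(ending_customers, lifetime_value, margin):
    """Ending customers times remaining value (CLV less just-received margin)."""
    return ending_customers * (lifetime_value - margin)

# Hypothetical inputs: annual margin per customer and retention rate.
value = clv(margin=37.0, retention=0.80)
print(f"CLV ~ ${value:.2f}")
print(f"Acquisition cost ~ ${acquisition_cost(600_000, 8_000):.2f}")
print(f"Asset value ~ ${customer_asset_value(15_670, value, 37.0):,.0f}")
```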
Table 12.4 gives us even more information on customers. Customer satisfaction is much higher for Boom, and Boom’s customers are more willing to recommend the firm to others. As a consequence, we might expect Boom’s acquisition costs to decline in the future. In fact, with such a stable and satisfied customer base, we could expect that brand equity (refer to
Chapter 4
) measures would be higher too.
Table 12.2 Marketing metrics
Table 12.3 Customer profitability
| Customer value metric | Boom | Cruise |
| --- | --- | --- |
| Customer CLV | $123.21 | $96.71 |
| Customer Acquisition Cost | $75.00 | $93.00 |
| Customer Count (Thousands) | 15.67 | 4.88 |
| Customer Asset Value (Thousands) | $1,344 | $222 |
Table 12.4 Customer attitudes and awareness
Hiding problems in the marketing baggage?
The income statement for another example firm, Prestige Luggage, is depicted in
Table 12.5
. The company seems to be doing quite well. Unit and dollar sales are growing rapidly. Margins before marketing are stable and quite robust. Marketing spending and marketing to sales ratios are growing, but so is the bottom line. So what is not to like?
Table 12.5 Prestige Luggage income
Using the marketing metrics X-ray
Let’s take a deeper look at what’s going on with Prestige Luggage by examining their retail customers. When we do, we’ll get a better view of the marketing mechanics that underlie the seemingly pleasant financials in
Table 12.5
.
Table 12.6
(refer to
Chapter 6
for distribution measures) shows that Prestige Luggage’s sales growth comes from two sources: an expanding number of outlets stocking the brand and an increase (more than four-fold) in price promotions. Still, there are plenty of outlets that do not stock the brand. So there may be room to grow.
Table 12.7
reveals that although the overall sales are increasing, they are not keeping pace with the number of stores stocking the brand. (Sales per retail store are already declining.) Also, the promotional pricing by the manufacturer seems to be encouraging individual stores’ inventories to grow. Soon, retailers may become irritated that the GMROII (gross margin return on inventory investment) has declined considerably.
Future sales may slow further and put pressure on retail margins. If retailer dissatisfaction causes some retailers to drop the brand from their assortment, manufacturer sales will decline precipitously.
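GMROII is the retailer health metric to watch here (see Chapter 6). A minimal sketch of the calculation, with illustrative numbers of our own since Table 12.7's figures are not reproduced here:

```python
# GMROII (gross margin return on inventory investment): gross margin
# dollars earned per dollar invested in inventory. Numbers below are
# hypothetical, not Table 12.7's actual figures.

def gmroii(annual_gross_margin, avg_inventory_at_cost):
    """Gross margin dollars per dollar tied up in inventory."""
    return annual_gross_margin / avg_inventory_at_cost

# As promotions push stock into stores, margin dollars grow slowly
# while inventory balloons, so GMROII falls:
year_1 = gmroii(annual_gross_margin=120_000, avg_inventory_at_cost=60_000)
year_3 = gmroii(annual_gross_margin=140_000, avg_inventory_at_cost=115_000)
print(f"GMROII year 1: {year_1:.2f}, year 3: {year_3:.2f}")  # 2.00 -> 1.22
```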
Table 12.6 Prestige Luggage marketing and channel metrics
Table 12.7 Luggage manufacturer retail profitability metrics
In addition, the broadening of distribution and the increase in sales
on deal suggest a possible change in how potential consumers view the previously exclusive image of the Prestige Luggage brand. The firm might want to order another set of X-rays to see if and how consumer attitudes about the brand have changed. Again, if these changes are by design, then maybe Prestige Luggage is okay. If not, then Prestige Luggage should be worried that its established strategy is falling apart. Add that to the possibility that some retailers are using deep discounts to unload inventory after they’ve dropped the brand, and suddenly Prestige Luggage faces a vicious cycle from which it may never recover.
Some things you can't make up, and this example is one of them. The actual company was "pumped up" through a series of price promotions, distribution was expanded, and sales grew rapidly. Shortly after the company was bought by another firm looking to add to its portfolio of luxury brands, the strategy unravelled. Many stores dropped the line, and it took years to rebuild the brand and sales.
These two examples illustrate the importance of digging behind the financial statements using tools such as the marketing X-ray. More numbers, in and of themselves, are only part of the answer. The ability to see patterns and meaning behind the numbers is even more important.
Smoking more but enjoying it less?
Table 12.8
displays marketing metrics reported by a major consumer-products company, aimed at analysing the competitive trend toward lower-priced discount brands. A declining market size, a stagnant company market share, and a growing share of firm sales accounted for by discount brands together painted a baleful picture of the future. The firm was replacing premium sales with discount brand sales. To top it off, the advertising and promotion budget had almost doubled. In the words of Darden Professor Erv Shames, it would be easy to conclude that the marketers had "run out of ideas" and were resorting to the bluntest of instruments: price.
Table 12.8 Market trends for discount brands and spending: Big Tobacco Company

| Year | 1987 | 1992 |
| --- | --- | --- |
| Market Size (Units) | 4,000 | 3,850 |
| Company Unit Share | 25% | 24% |
| Unit Sales | 1,000 | 924 |
| Premium Brand Units | 925 | 774 |
| Discount Brand Units | 75 | 150 |
| Advertising & Promotion Spend | $600 | $1,225 |
The picture looks much brighter, however, after examining the metrics in
Table 12.9
. It turns out that in the same five years during which discount brands had become more prominent, sales revenue and operating income had both grown by over 50%. The reason is clear: prices had almost doubled, even though a large portion of these price increases had been "discounted back" through promotions. Overall, the net impact on the firm's bottom line was positive.
Table 12.9 Additional metrics
| Year | 1987 | 1992 |
| --- | --- | --- |
| Revenue (Thousands) | $1,455 | $2,237 |
| Average Unit Price | $1.46 | $2.42 |
| Average Premium Price | $1.50 | $2.60 |
| Average Discount Price | $0.90 | $1.50 |
| Operating Profit (Thousands) | $355 | $550 |
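The revenue and average-price rows in Table 12.9 follow directly from Table 12.8's unit volumes and the premium/discount prices, which is easy to verify:

```python
# Consistency check: Table 12.9's revenue and average unit price follow
# from Table 12.8's unit volumes and the premium/discount prices.
for year, prem_units, disc_units, prem_price, disc_price in [
    (1987, 925, 75, 1.50, 0.90),
    (1992, 774, 150, 2.60, 1.50),
]:
    revenue = prem_units * prem_price + disc_units * disc_price
    units = prem_units + disc_units
    print(f"{year}: revenue ${revenue:,.0f}K, avg price ${revenue / units:.3f}")
# 1987: revenue $1,455K, avg price $1.455 (Table 12.9 shows $1.46)
# 1992: revenue $2,237K, avg price $2.421 (Table 12.9 shows $2.42)
```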
Now you might be thinking that the messages in
Table 12.9
are so obvious that no one would ever find the metrics in
Table 12.8
to be as troubling as we made them out to be. In fact, our experience in teaching a case that contains all these metrics is that experienced marketers from all over the world tend to focus on the metrics in
Table 12.8
and pay little or no attention to the additional metrics—even when the two tables are given equal prominence.
The situation described by the two tables is a close approximation of the actual market conditions just before the now-famous "Marlboro Friday." Top management took action out of concern that the series of price increases behind the attractive 1992 financials was not sustainable: the higher premium prices gave competing discount brands more latitude to cut prices. On what later became known as "Marlboro Friday," 2 April 1993, Philip Morris cut Marlboro prices by $0.40 a pack, reducing operating earnings by almost 40%. The stock price tumbled by 25%.
Note the contrast between this example and the preceding one. Prestige Luggage was increasing promotion expenditures to expand distribution: prices were falling while promotion, or sales on deal, was increasing—an ominous sign. With Marlboro, the firm repeatedly raised prices and then discounted back—a very different strategy.
Marketing dashboards
The presentation of metrics in the form of management “dashboards” has received a substantial amount of attention in the last several years. The basic notion seems to be that the manner of presenting complex data can influence management’s ability to recognise key patterns and trends. Would a dashboard, a graphical depiction of the same information, make it easier for managers to pick up the ominous trends?
The metaphor of an automobile dashboard is apt because numerous metrics could be used to measure a car's operation. The dashboard's job is to provide a reduced set of vital measures in a form that is easy for the driver to interpret and use. Unfortunately, although all automobiles share the same key metrics, no such universal set exists for businesses: the set of appropriate and critical measures differs from business to business.
Figure 12.1
presents a dashboard of five critical measures over time.
It reveals strong sales growth with maintained margins, even as the mix shifts toward less expensive items. Disturbingly, however, the returns for retailers (GMROII) have fallen precipitously while store inventories have grown. Sales per store have similarly dropped. The price premium that Prestige Luggage can command has fallen, and more of the company's sales are on deal. This paints a foreboding picture for the company and should raise concerns about its ability to maintain distribution.
Figure 12.1
Prestige Luggage: marketing management dashboard
Summary: marketing metrics + financial metrics = deeper insight
Dashboards, scorecards, and what we have termed "X-rays" are collections of marketing and financial metrics that management believes are important indicators of business health. Dashboards are designed to provide a deeper marketing understanding of the business. Many specific metrics may be considered important, or even critical, in any given marketing context. We do not believe it is generally possible to provide unambiguous advice on which metrics are most important or on which management decisions are contingent on the values and trends of certain metrics. Such recommendations would have to take an "if, then" form, such as "If relative share is greater than 1.0 and market growth is higher than the change in GDP, then invest more in advertising." Although such advice might be valuable under many circumstances, our aims were more modest—simply to provide a resource for marketers to achieve a deeper understanding of the diversity of metrics that exist.
Our examples, Boom versus Cruise, Prestige Luggage, and Big Tobacco, showed how selected marketing metrics could give deeper insights into the financial future of companies. In situations such as these, it is important that a full array of marketing and financial metrics inform the decision. Examining a complete set of X-rays does not necessarily make the decisions any easier (the Big Tobacco example is debated by knowledgeable industry observers to this day!), but it does help ensure a more comprehensive diagnosis.
The value of information
How much should you spend on gaining information, e.g., on market research? Imagine a firm has three potential marketing strategies: (1) Bold, (2) Moderate, and (3) Cautious. Collectively, the target consumers are in one of three possible moods: excited (40% chance), happy (40% chance), or cynical (20% chance). The firm has
to decide how much to spend to learn the mood of the target consumers.
The cautious strategy will earn $2m in profit whatever mood the consumers are in. The bold strategy will resonate with excited consumers (earning $10m), perform decently with happy consumers (earning $2m), but alienate cynical consumers (losing $8m). The moderate strategy does well with excited ($5m) and happy ($3m) consumers, and loses only $1m with cynical consumers.
Given this, what is the value of perfect information about the mood of the target consumers?
First, calculate how much we can expect to gain with no further information. This is the maximum of the expected values of the three strategies, where expected value is the probability-weighted average of the outcome values. The expected values are: Bold, 0.4 × $10m + 0.4 × $2m + 0.2 × (-$8m) = $3.2m; Moderate, 0.4 × $5m + 0.4 × $3m + 0.2 × (-$1m) = $3.0m; and Cautious, $2.0m. Thus, without any additional information on the consumers' mood we'd choose the Bold strategy, as its expected value ($3.2 million) is the highest.
If market research could give us perfect information, this would allow us to pick the best strategy to pair with the consumers' mood, i.e., Bold with Excited (gaining $10m), Moderate with Happy (gaining $3m), and Cautious with Cynical (gaining $2m). With perfect information we would expect a profit of: 0.4 × $10m + 0.4 × $3m + 0.2 × $2m = $5.6 million.
Because the expected value with perfect information is $5.6 million and the expected value without additional information is $3.2 million, the expected value of the perfect information is the
difference between the two or $2.4 million.
This quantity is an upper bound on the value of any actual information the firm can collect: in the real world information is imperfect, so its value must be less than the value of perfect information.
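The arithmetic above can be captured in a few lines. A minimal sketch in Python of the expected values and the EVPI (the payoff figures are the chapter's):

```python
# Expected values and EVPI for the three-strategy, three-mood example.
moods = {"excited": 0.40, "happy": 0.40, "cynical": 0.20}

payoffs = {  # profit in $ millions for each (strategy, mood) pair
    "bold":     {"excited": 10, "happy": 2, "cynical": -8},
    "moderate": {"excited": 5,  "happy": 3, "cynical": -1},
    "cautious": {"excited": 2,  "happy": 2, "cynical": 2},
}

# Without further information: take the strategy with the highest EV.
ev = {s: round(sum(moods[m] * p for m, p in pay.items()), 2)
      for s, pay in payoffs.items()}
print(ev)  # {'bold': 3.2, 'moderate': 3.0, 'cautious': 2.0}

# With perfect information: pick the best strategy for each mood,
# weighted by how likely that mood is.
ev_perfect = round(sum(prob * max(pay[mood] for pay in payoffs.values())
                       for mood, prob in moods.items()), 2)
evpi = round(ev_perfect - max(ev.values()), 2)
print(ev_perfect, evpi)  # 5.6 and 2.4, matching the text
```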
These calculations assume marketers care only about expected value when, in fact, risk is also a concern. Firms prefer a certain $10 million to a 50% chance of gaining $20 million and a 50% chance of gaining nothing, even though the expected values are the same. This is known as risk aversion. If you are risk averse, you may wish to pay for information that reduces the range of outcomes you face even if this doesn't change the expected value—a consideration not taken into account in the calculation of the expected value of perfect information.
Individual decision makers are also often loss averse. When loss averse, decision makers are willing to reduce the expected value of a decision in order to limit potential losses. Unlike risk aversion, loss aversion is often viewed by economists as poor decision making. A marketer who knows they will be sacked if they lose money in the above scenario might select the cautious strategy, which, although it has the lowest expected value, never loses money. The actual value of perfect information to risk-averse and loss-averse decision makers is usually higher than the expected value of perfect information, because the information not only improves expected value but also reduces risk and potential losses.
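For readers who want the risk-aversion point in numbers: with a concave utility function (square root below, purely an illustrative assumption of ours), the certain $10 million beats the 50/50 gamble even though the expected dollar values match:

```python
# Risk aversion in two lines: with a concave utility function (square
# root here, an illustrative choice), a certain $10m is preferred to a
# 50/50 gamble on $20m or nothing despite equal expected values.
from math import sqrt

u_certain = sqrt(10)                       # utility of a sure $10m, ~3.16
u_gamble = 0.5 * sqrt(20) + 0.5 * sqrt(0)  # expected utility of gamble, ~2.24
print(f"u(certain) = {u_certain:.2f}, E[u(gamble)] = {u_gamble:.2f}")
```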
In summary, the value of information, and so the usefulness of market research and testing, varies with the precise situation at hand. Since reality is decidedly more complicated than our illustrative example, allocating data-collection and analytical resources is an important management decision. Estimating the value of information requires specific quantitative inputs, which are often assumptions. Unfortunately, managers may be sufficiently unsure of these inputs that they are unable or unwilling to quantify these estimates. Even in these instances, however, it may be
worthwhile to develop an intuitive appreciation of when additional information is likely to be most valuable.
Table 12.10
should be useful in making qualitative comparisons of situations in which managers are uncertain about the value of collecting further data to refine their choices.
Table 12.10 Quick guide to the value of information
| Criteria | Information most valuable when | Information least valuable when |
| --- | --- | --- |
| Potential financial consequences of decision | Large difference between the consequences of the best and worst alternatives | Small difference between the consequences of the best and worst alternatives |
| Uncertainty of future outcomes | High degree of uncertainty | Low degree of uncertainty |
| Ability of information to change decision | Information is likely to change the decision (powerful information combined with a close initial decision) | Information is unlikely to change the decision (poor information quality combined with an obvious initial decision) |
| Validity of metrics | Metrics are valid indicators of market outcomes | Metrics are biased indicators of market outcomes |
| Reliability of data | Sample size is large and measurement error is small | Sample size is small and measurement error is large |
Testing
Testing usually underpins successful marketing. When you are not sure which advertising copy to use, which marketing mix elements to emphasise, or even which product variants to offer, testing can help. When testing, you should consider the precision (known as reliability) of the test. Testing an advertisement on one person will be unreliable, as each person has idiosyncrasies. Increasing the number of respondents in the test increases your confidence that the responses are typical of the group being tested. You must also consider the validity of your test: are you testing the right group? Are you asking the right question? If you test an advertisement on bank managers, the results may not be valid for estimating how your target market, college athletes, will react.
We have already mentioned side-by-side A/B testing, where two versions of an advertisement are created and tested against each other. Online, it is easy to serve randomly selected visitors different versions of the advertisement. Random assignment ensures that there is no systematic difference between the groups seeing each version, so any observed difference in response can be attributed to the advertisement itself. Online, it is also usually relatively cheap to create and test new versions of advertisements. That said, even online testing is not free, and in general the cost of testing can be high.
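To make the reliability point concrete, here is a minimal sketch (ours, not the chapter's) of how an online A/B result might be checked against sampling noise, using a standard two-proportion z-test and hypothetical conversion counts:

```python
# A minimal reliability check for an A/B test: is the observed difference
# in conversion rates bigger than sampling noise? Uses a standard
# two-proportion z-test (one common choice, not the only one).
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return z statistic and two-sided p-value for a rate difference."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - CDF)
    return z, p_value

# Hypothetical data: version B looks better, but is the lift reliable?
z, p = two_proportion_z(conv_a=200, n_a=10_000, conv_b=240, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # larger samples shrink p for the same lift
```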
The more versions of an advertisement we create and test, the greater the chance of finding excellent copy. Unfortunately, creating and testing versions reduces the money available to spend on deploying whichever version wins the test. The Gross model is designed to help managers make this tradeoff.
The Gross model
The Gross model, named after Irwin Gross, advises how much of the budget should be spent on creating (and testing) advertising copy. The number of alternative copies you should be willing to develop depends upon the variability of the effectiveness of the advertisements. If some advertisements are highly effective, but most ineffective, you will want to spend relatively heavily on developing and testing copy. The potential upside is high and you want to develop many versions in order to get a great version. If, however, all advertisements perform relatively similarly, the difference between what you already have, and what you can gain with further development, is quite limited. In that case it is better to
spend less on developing new copies and spend more of the budget showing the currently best advertisement.
In the Gross model, the effectiveness of an advertising campaign (Z) is the amount spent on buying media (D) multiplied by the effectiveness of the best advertisement created (E). So:

Z = E × D
If one assumes that advertising effectiveness is linear, i.e., each piece of spending is equally effective, this is a simple model. Unfortunately advertising often takes multiple views to gain any traction and eventually loses effectiveness: the S curve described in
Chapter 9
. This means E varies with the amount spent on media (D), making the model more difficult to use.
The amount spent on media (D) is the total budget (B) less the fixed costs of copy testing (CF) and the total cost of the new advertisements: the number of advertisements developed (N) multiplied by the sum of the average cost to develop an advertisement (C) and the marginal (extra) cost to test each advertisement (CS):

D = B − CF − N × (C + CS)
O'Connor and her colleagues (1996) applied this formula using the historical distribution of advertising effectiveness and concluded that 20–30% of the budget (B) should be spent on developing and testing copy, with the rest (D) spent on deployment. Of course, this rule of thumb may vary significantly with context.
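The tradeoff is easy to see by simulation. A sketch under assumptions of our own (copy effectiveness drawn from a normal distribution, hypothetical costs; O'Connor and colleagues estimated the real distribution empirically):

```python
# The Gross model tradeoff by simulation: creating more versions (N)
# raises the expected effectiveness of the best one, E[max of N draws],
# but shrinks the media budget D. Effectiveness is assumed normal here,
# purely for illustration.
import random

random.seed(1)
B, CF, C, CS = 1_000_000, 50_000, 40_000, 5_000  # hypothetical budget and costs
MU, SIGMA = 1.0, 0.3                             # assumed effectiveness distribution

def expected_best(n, trials=20_000):
    """Monte Carlo estimate of E[max of n effectiveness draws]."""
    return sum(max(random.gauss(MU, SIGMA) for _ in range(n))
               for _ in range(trials)) / trials

for n in range(1, 9):
    media = B - CF - n * (C + CS)  # D = B - CF - N * (C + CS)
    if media <= 0:
        break
    z = expected_best(n) * media   # Z = E * D
    print(f"N={n}: media budget ${media:,}, expected campaign value {z:,.0f}")
```

With these illustrative inputs the expected campaign value peaks at a small N and then declines, as testing costs eat into the media budget faster than the best advertisement improves.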
Should you test another advertisement?
The original Gross model was designed to help managers decide how many advertisements to create and test. The general conclusion was that firms tended to spend too little on creating and testing alternative versions (perhaps because this budget item is often called "non-working" media expense).
We will ask a slightly different question. Should the firm develop an additional advertising copy execution or use all of the remaining budget to air the current best copy?
Let B be the total budget to create, test, and deploy the best-testing advertisement. Let C be the cost to develop and test a new version of the advertisement, i.e., the expenditure incurred when you commission a new piece of copy. Because tests are never perfectly reliable or valid, we propose a "vaguely right" adjustment factor (A). The formula for the adjustment factor is
A = 1 / (Reliability × Validity)
(an approach similar to that proposed by Irwin Gross). As the reliability and validity of the tests approach 1, the adjustment factor approaches 1. As A gets higher, i.e., as the validity and reliability of the test get lower, each test is less useful at predicting real-world performance. The intuition is that with low reliability and validity, the test would need to indicate a higher probability of increased effectiveness before we would spend the money on the test. In our model we use A = 2, which means the test would need to indicate an expected return of twice the cost before we would deem it acceptable. This factor of 2 would, for example, result from a reliability of 70% and a validity of 70%, because 1/(0.70 × 0.70) ≈ 2.
We can now estimate how much lift we need to expect to gain from a new version of an advertisement to make commissioning it worthwhile. This breakeven lift is the cost of commissioning the new version as a percentage of your free budget multiplied by the adjustment factor. If we have a budget of $2.25 million and the cost to develop and test each version is $40,000, then developing a new version costs 1.8% of the budget. If the adjustment factor is 2 we must expect to gain at least 2 * 1.8% = 3.6% performance lift to make commissioning a new version of the advertisement worthwhile; see
Table 12.11
. Of course, this approach means that each time we spend money on a copy test, the remaining budget shrinks, so the next copy decision will represent a higher percentage of the budget and, therefore, require a higher expected percentage increase in sales to justify undertaking the test.
Table 12.11 (Breakeven) lift needed to commission new piece of copy
| Free Budget to Buy Media or Develop and Test Versions (B) | $2,250,000 |
| --- | --- |
| Cost to Develop and Test New Version (C) | $40,000 |
| Developing and Testing as % of Budget (D = C/B) | 1.8% |
| Adjustment Factor for Lack of Test Reliability and Validity (A) | 2 |
| Expected Lift That Justifies Testing New Version (JT = A × D) | 3.6% |
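The Table 12.11 arithmetic is easy to reproduce, and doing so makes the point about the shrinking budget concrete: every test consumes budget, so the hurdle rises with each round. A minimal sketch (the figures are the chapter's; the loop over successive tests is our extrapolation):

```python
# Breakeven lift needed to justify commissioning another version
# (the Table 12.11 arithmetic). The hurdle rises as each test
# consumes budget that could otherwise buy media.
budget, cost_per_version, A = 2_250_000, 40_000, 2  # B, C, adjustment factor

for test in range(1, 4):
    hurdle = A * cost_per_version / budget  # JT = A * (C / B)
    print(f"test {test}: budget ${budget:,}, breakeven lift {hurdle:.2%}")
    budget -= cost_per_version  # money spent on testing can't buy media
# test 1: budget $2,250,000, breakeven lift 3.56% (the text rounds to 3.6%)
```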
Will a new version of the advertisement give sufficient lift to make creating it worthwhile? The challenge in answering this is that we don't know any version's effectiveness until it has been created and tested, i.e., until after we have already spent the money developing it.
The expected value of the new version is the chance of getting a new version of a given quality multiplied by the outcome when we get a version of that quality, summed across the possible qualities. There are two broad outcomes. If the new version is of equal or worse quality than our current best advertisement, it has no value at all. If the new version is of higher quality, the lift is the additional value it brings.
We make the assumption that each version of an advertisement has a quality score between 1 and 10, and that a version of quality 10 is twice as effective—in some way defined by the firm—as one of quality 5, and so on. For our example we assume a uniform distribution of quality across versions: with ten levels of quality (1–10), each is equally likely. (Note that one advantage of this model is that you can specify any distribution of advertisement quality you wish; just change the distribution in
Table 12.12
.) The lift from the new version is the quality of the new version minus the quality of the current version, divided by the quality of the current version. We know the quality of our current version—what we will deploy if we end testing—and so can calculate an expected lift from a new version. If this exceeds the expected lift needed to justify commissioning a new version (
Table 12.11
), we should do so; otherwise we should stop testing and deploy our current version.
Table 12.12
shows us that with a current version of quality 8 we should continue testing, but it is a very close call. (To read
Table 12.12
, note that the current advertisement scores an 8. Reading across shows that a new version scoring 8 or less provides no effectiveness lift, while a new advertisement scoring 9 would improve lift by 12.5%.)
Table 12.12 Expected lift from new advertisement
Note: Lift from new version = IF(QNew > QCur, (QNew − QCur)/QCur, 0)
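A short sketch of this calculation (assuming the uniform 1–10 quality distribution above and the 3.6% hurdle from Table 12.11) reproduces the Table 12.13 figures:

```python
# Expected lift from one more version, assuming quality scores 1-10 are
# equally likely. Swap in any other distribution you prefer.
quality_levels = range(1, 11)
prob = 1 / len(quality_levels)  # uniform: 10% chance per quality level
hurdle = 0.036                  # breakeven lift (JT) from Table 12.11

for q_cur in quality_levels:
    # Lift is (q_new - q_cur) / q_cur when the new version is better, else 0.
    expected_lift = sum(prob * (q_new - q_cur) / q_cur
                        for q_new in quality_levels if q_new > q_cur)
    decision = "test" if expected_lift > hurdle else "don't test"
    print(f"current quality {q_cur:2d}: "
          f"expected lift {expected_lift:6.1%} -> {decision}")
```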
Whether you should continue creating versions depends on the quality of your current version.
Table 12.13
shows the expected lift at each quality of current version. When you only have a low quality current version, developing new copy has a huge expected value as you are likely to get copy that will substantially increase the
effectiveness of your media spending. You should stop testing if you have a current version of quality 9—the expected benefits of continuing to commission versions are less than the cost of doing so. If you already have a version of quality 10 there is no possible benefit to further testing.
Table 12.13 Expected lift from new version given current version score
| Quality current version (QCur) | Expected lift from new version (ENew) | Decision (Test if ENew > JT) |
| --- | --- | --- |
| 1 | 450.0% | Test, as > 3.6% |
| 2 | 180.0% | Test, as > 3.6% |
| 3 | 93.3% | Test, as > 3.6% |
| 4 | 52.5% | Test, as > 3.6% |
| 5 | 30.0% | Test, as > 3.6% |
| 6 | 16.7% | Test, as > 3.6% |
| 7 | 8.6% | Test, as > 3.6% |
| 8 | 3.75% | Test, as > 3.6% |
| 9 | 1.1% | Don't test, as < 3.6% |
| 10 | 0.0% | Don't test, as < 3.6% |
This model contains a number of assumptions, such as that each version of an advertisement can be neatly scored and that quality translates predictably into relative sales results. We also assume each new copy execution is drawn from the same distribution, which somewhat implausibly implies the agency isn't giving you its best ideas first; you might instead get progressively worse ideas each time you go back to the agency. Despite the challenges of building a perfect model, we think the method is most valuable as a general illustration of how to think about the value of information in a management decision context that includes uncertainty, financial consequences, and the ability of imperfect data to inform a decision.
References and suggested further reading
Ambler, Tim, Flora Kokkinaki, and Stefano Puntoni. (2004). "Assessing Marketing Performance: Reasons for Metrics Selection," Journal of Marketing Management,
20, pp. 475–498.
McGovern, Gail, David Court, John A. Quelch, and Blair Crawford. (2004). “Bringing Customers into the Boardroom,” Harvard Business Review,
November, pp. 1–10.
Meyer, C. (1994). "How the Right Measures Help Teams Excel," Harvard Business Review, 72(3), p. 95.
O'Connor, Gina Colarelli, Thomas R. Willemain, and James MacLachlan. (1996). "The Value of Competition among Agencies in Developing Ad Campaigns: Revisiting Gross's Model," Journal of Advertising, 25(1), pp. 51–62.