12

FILLING IN THE BLANKS

WHAT DATA?

Once you have a framework, the next step is to define the data needed to analyze the factors in the framework and test the hypothesis. The most efficient way to do so is to work backward—first decide on the analysis based on the logic tree (how you are going to use the data) and then collect the data. Using the logic tree for writing a business book presented in Figure 11.6, a worksheet to list out the data needed for one of the questions can look like the one in Figure 12.1.

In this worksheet example:

Figure 12.1 Assessing Data Needs

c12_image001.jpg

Both qualitative and quantitative data are important for analysis—and neither is easy to collect. Of the two, quantitative data is often seen as more valuable.

One of my HBS professors liked to repeat this quote from Lord Kelvin (1883):1

“When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.”

Quantitative data is especially useful because such measures can be synthesized. For example, current market size and growth rate can be multiplied together to estimate future market size. Qualitative comments such as “big market” and “fast growth” cannot be multiplied in the same way.

In addition, quantified measures allow unbiased and unambiguous comparisons of strategic options, scenarios, and trade-offs as well as tangible estimation of resource needs, potential payoffs, and risks. This is a key point as strategy is always about allocation of limited resources—which industry to put your resources (money, time, staff)—and where to invest to develop competitive advantages.

The keys to obtaining effective qualitative and quantitative data are access to a wide range of data sources, ability to deduce unavailable data, discipline to check the data, and pragmatism in using the data.

WHERE FROM?

Starting from the sources easiest to access, these are the key data sources:

The Internet

Almost everyone knows the Internet is a powerful source of data. For a strategy study, many of the data sources discussed in this chapter can be found or accessed online.

Associations and Bureaus

Trade associations, semi-government bodies, and government bureaus collect statistics. Such data is often published and for sale at very reasonable prices. Besides the officially published data, some associations may also be willing, usually for a fee, to provide tailored reports based on both the published data and on the raw data the report was drawn from. This could be useful if the published data do not answer your “logic tree” question but you can see how the raw data behind the report can be analyzed to provide the answer.

Interviews

People with knowledge of the industry—industry veterans, ex-employees, suppliers, buyers, investment bank analysts, and so on—are often willing to talk about what they know. This is one of the fastest and most direct ways to get firsthand information.

When I first started out as a consultant, my job consisted mainly of finding telephone numbers from directories and cold-calling senior executives to set up telephone or face-to-face interviews for my superiors. I was very skeptical when I was first asked to do this. I thought, “Why would people agree to talk to me?” but I quickly found out it was easier than I thought. Of course, working for a well-known consulting firm helped, as many potential interviewees recognized the name of the firm when I introduced myself over the phone. You would be surprised how many people like talking about themselves and their business, especially if the topic is not confidential or if they see a potential for business in the future. Over the years, I have set up and conducted many interviews. A few best practices I have learned are listed in Appendix B.

Analysts’ Reports

Investment banks such as Goldman Sachs hire stock analysts to research and evaluate key industries and key companies in those industries. The analysts then write reports on their findings. One of the purposes of these reports is to enable the bank’s private bankers and outside brokers to advise their clients in stock trading. These reports are valuable as analysts often have access to company senior executives who understand that these reports could help stock trading of their companies. Unfortunately, these reports are not freely available. The easiest way to obtain them is via friends or contacts at these banks. This is related to the chapter on social networking. If you do not know anyone, you can try to purchase some of these reports through the Internet either by e-mailing the banks or by searching for some finance Web sites that sell such research reports.

Professional Databases

Professional databases such as Bloomberg and Lexis-Nexis are highly efficient as they give you access to a large selection of journal and newspaper articles using keyword searches. They are available by subscription. Some companies have research departments that subscribe to these services. Otherwise, select libraries, especially business and university libraries, may provide access. Some of these services are also available for individual subscription. You can go onto Google to search for the local sales representatives of these databases. You can contact them to find out about individual subscription or ask for a list of libraries that carry these databases.

Benchmarking

Benchmarking—identifying and studying comparables that can be used as indicators of possibilities or as a comparison to stimulate insights—is a very useful data source. For example, in 2007, a client of mine wanted to evaluate whether to spend R&D dollars on a revolutionary consumer electronics technology. In trying to analyze the Porter Five Forces, we used the Walkman, digital camera, and iPod as benchmarks to understand the product life cycle, speed of copycats, and the like for revolutionary consumer electronics.

In 2000, I was involved in helping a state-owned bank in mainland China determine its strategy in anticipation of the opening up of the financial market to foreign competition. A key component of the study was to analyze the market development and evaluate the successes and failures of state-owned banks in other countries such as Japan that have undergone similar deregulation.

Information on benchmark targets can be found using the kind of sources discussed in connection with other types of data in this section.

Sampling

Sampling involves looking at a relatively few instances to provide an indication of the whole population. Statisticians often talk about “statistical significance,” which means you need a certain quantity such as number of surveys before the data collected can be trusted to have an acceptable degree of accuracy. In business, due to time and other resource constraints, it is often difficult if not impossible to get enough data for rigorous significance. But even small samples with no statistical significance could be invaluable as an estimate. For example, a client of mine, a multinational beverage company, once wanted to enter the bottled water business in Thailand. We needed to understand the sales volume of the key competitors, but no data was readily available. So a team of us sat in a rented car across the street from the warehouses of the two biggest competitors and counted the number of truckloads leaving the warehouse every day for one week each. We took pictures of the trucks and found out their capacity from the truck dealers. Based on these daily deliveries, we did some seasonal adjustments and estimated monthly and annual sales volumes.
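The truck-counting estimate can be sketched as a simple calculation. All the figures below are hypothetical stand-ins, not the actual project data:

```python
# Hypothetical truck-count sampling estimate; none of these figures
# are the actual project data.
daily_truckloads = [18, 22, 20, 25, 19, 24, 21]  # one observed week
truck_capacity_bottles = 12_000                  # assumed, from truck dealers

avg_daily_bottles = (sum(daily_truckloads) / len(daily_truckloads)
                     * truck_capacity_bottles)
monthly = avg_daily_bottles * 30

# Simple seasonal adjustment: assume the observed month runs 10 percent
# above an average month.
seasonal_factor = 1.10
annual_estimate = monthly * 12 / seasonal_factor
print(f"Estimated annual volume: {annual_estimate:,.0f} bottles")
```

The point is not the arithmetic but the structure: sample, scale up, then adjust for known distortions such as seasonality.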

Another client, an oil refinery in Malaysia, wanted to improve the efficiency of its maintenance process. To understand the workload of the maintenance department, I followed a maintenance technician around for a week and documented all the work he did, how he did it, and how long it took. Then using this as a sample week, I redesigned the process and then calculated the time savings of the new process compared to the week sampled.

THE UNGETTABLES

Very often, critical data do not exist in readily available form and can’t be found in a useful amount of time. The ability to deduce these critical numbers is paramount to strategy. These are the techniques taught at HBS that I’ve found indispensable to any strategist:

LOGICAL DEDUCTION USING LIMITED DATA

In most HBS case studies, students get some data either within the text or in table format. But since the cases are based on real-life situations, they reflect the reality that the data available are most often not sufficient to analyze the issues in the case. The key technique needed is to know what you want and then use the numbers given to reach a reasonable estimate. As a simple example, say you need the market size of television sets in city X. In the case study, you are told that the population is 100 million in city X, and that the average life of a television is three years. Assuming no other quantitative data is available from the case write-up, here’s one way to estimate the market size:

1. Since city X is in a developed country, assume the average household size is similar to the average U.S. household, around 2.5 people. This means 100 million people make about 40 million households.
2. Since most families even at the lowest income levels have at least one television, assume each household has 1.5 televisions. That makes a total of 40 million × 1.5, or 60 million, televisions out there.
3. Assuming their lifespan is three years, about a third of the televisions will need to be replaced every year. This means a market of approximately 40 million × 1.5 × 1/3, or 20 million, televisions a year.
4. Remember that the resulting number, 20 million, is just a ballpark figure and not hard data. Its sensitivity and validity need to be checked using techniques to be discussed in the next sections.
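The four steps above translate into a one-screen top-down calculation (a sketch; the household and ownership figures are the assumptions stated in steps 1 and 2):

```python
population = 100_000_000
household_size = 2.5        # step 1: assumed, similar to the U.S. average
tvs_per_household = 1.5     # step 2: assumed
tv_lifespan_years = 3       # given in the case

households = population / household_size                        # 40 million
installed_base = households * tvs_per_household                 # 60 million sets
annual_replacement_market = installed_base / tv_lifespan_years  # ~20 million

print(f"{annual_replacement_market:,.0f} sets per year")
```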

This deductive, logical estimation technique to quantify key parameters based on existing data is a critical tool applicable to many HBS cases and in real-life strategic planning. In fact, this technique is so critical that many consulting firms like to include it as an interview question.

When I was responsible for Greater China recruiting for BCG, for example, my favorite interview question to MBA graduates was “How many Toyotas do you think there are in Hong Kong?” The interviewees had no access to a computer or even paper and pencil. What I was looking for was the deductive, logical estimation technique, not the exact number. I did not even know the answer myself. I expected something like this: “There are seven million people in Hong Kong. Let’s say four people a household. This makes about 1.8 million families. Toyotas are for middle-income families. Let's say about 1/5 of the families are middle income. That makes about 400,000 middle-end cars. Of this market, there are other brands like Honda, Mazda, and Suzuki. None of them seems to have a bigger share than the others. So maybe the number of Toyotas is about 100,000.” It does not matter if any of these assumptions are right. The key is the ability to think logically and the comfort with numbers and estimates. Once the logic and comfort are there, researching assumptions and testing their validity is not difficult.

It is useful to reference the two very simple frameworks sketched in Figure 11.1 to help you think of different systematic approaches when undertaking logical deduction:

The Toyota example is top-down, as it starts from macro data (population and households) and then narrows down. A bottom-up approach to the Toyota example would be to start from the number of dealerships, estimate sales per dealership per month, and multiply by 12 months to get an annual figure. Assuming the life of a Toyota is about five years, then multiply that annual figure by five.

The estimate of Toyotas using demographics is demand-side deduction. The estimate based on the number of dealerships is supply-side deduction.
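The two approaches can be sketched side by side. Every input here is a rough assumption of the kind an interviewee might offer; the dealership figures in particular are purely hypothetical:

```python
# Demand side: population -> households -> middle-income -> brand share
population = 7_000_000
household_size = 4                   # assumed
middle_income_share = 1 / 5          # assumed
toyota_share_of_segment = 1 / 4      # assumed share among mid-range brands

demand_side = (population / household_size
               * middle_income_share * toyota_share_of_segment)

# Supply side: dealerships -> monthly sales -> annual sales -> cars on the road
dealerships = 30                     # assumed
sales_per_dealership_per_month = 50  # assumed
car_life_years = 5                   # assumed

supply_side = dealerships * sales_per_dealership_per_month * 12 * car_life_years

print(f"Demand side: ~{demand_side:,.0f}, supply side: ~{supply_side:,.0f}")
```

If the two deductions land within each other's range, as they do here, that is an early consistency check of the kind discussed later in this chapter.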

Compound Annual Growth Rate

Compound Annual Growth Rate (or CAGR; pronounced KAY-ger) is very useful in logical deduction. CAGR is a critical quantitative concept because it has a very wide range of applications. It was not explicitly taught at HBS—the faculty assumed everyone knew it already!

The concept of CAGR is similar to the idea of compound interest. It is the average annual growth rate when compounding is taken into account. For example, if a market grows from $1,000 to $5,000 in five years, the CAGR is 38 percent a year because $1,000 × 1.38 × 1.38 × 1.38 × 1.38 × 1.38 = $5,000. (The growth rate is not simply the total 400 percent growth divided by five years.)

The formula is similar to compound interest. The following explains how the formula of CAGR is derived:

CAGR = (FV / PV)^(1/n) – 1

where FV is the future or ending value, PV is the present or starting value, and n is the number of years between PV and FV. This formula is best explained by examples:

Example one: If the widget market was worth $300 million in 2000 and in 2007 it is $400 million, then n = 2007 – 2000 = 7. CAGR = (400/300)^(1/7) – 1 ≈ 4 percent.

Example two: If the historical average CAGR for the market in the last three years is 15 percent per year, the forecast market size in five years if it continues to grow at the same rate would be

Future market size = Current market size × (1 + 15 percent)^5

Example three: Sales of company X doubled in the last four years. Doubled means (FV/PV) = 2. Hence

CAGR = 2^(1/4) – 1 = roughly 20 percent
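The three examples can be checked with a small helper (a sketch of the formula above, not a library function; the starting market size in example two is assumed):

```python
def cagr(pv, fv, years):
    """Compound annual growth rate: (FV / PV)**(1 / n) - 1."""
    return (fv / pv) ** (1 / years) - 1

# Example one: $300M in 2000 to $400M in 2007 (n = 7)
print(round(cagr(300, 400, 7), 3))    # about 4 percent

# Example two: project five years forward at 15 percent per year
current_market = 1_000                # assumed starting size
future_market = current_market * (1 + 0.15) ** 5

# Example three: sales doubled in four years
print(round(cagr(1, 2, 4), 3))        # roughly 20 percent
```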

CAGR is extremely useful in the estimation of key parameters.

To look smart, many MBA graduates use a shortcut calculation of CAGR. I call it the Rule of 75%. It goes like this:

CAGR roughly equals 0.75 divided by the number of years the factor in question takes to double.

Using this rule on example three, CAGR roughly equals

0.75/4 = 0.19 (or roughly 20 percent)

Appendix C shows the accuracy of this rule. This rule is useful to quickly work out CAGR in your head (or if you are trying to show off your skills in a discussion).
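A quick sketch comparing the rule against the exact calculation:

```python
def rule_of_75(years_to_double):
    """Mental-math shortcut: CAGR is roughly 0.75 / doubling time."""
    return 0.75 / years_to_double

def exact_cagr(years_to_double):
    """Exact CAGR implied by doubling in the given number of years."""
    return 2 ** (1 / years_to_double) - 1

for n in (3, 4, 5, 7, 10):
    print(f"{n} years: rule {rule_of_75(n):.3f}, exact {exact_cagr(n):.3f}")
```

The approximation is tightest for doubling times of around four to five years and drifts somewhat for very short or very long periods.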

Ballpark Interview Technique

When you try to do your research through interviewing, you will notice that many interviewees are reluctant to give quantified data. This greatly reduces the value of the interviews. Without quantification, you really do not know what people mean when they give qualitative answers like “fast” or “slow,” “big” or “small.” Usually it is not that they do not want to quantify but that they do not have the data and do not wish to be held responsible for giving a wrong number. What they do not understand is that even a rough estimate is very useful as a start for quantification in strategy planning. The “ballpark technique,” as I call it, often helps to get at least a rough estimate from interviewees. It involves giving the interviewee a few options to choose from. These options are sufficiently different from each other that the choice is relatively easy. Once a “ballpark” option is chosen, further narrowing down can be done until the interviewee cannot provide further information. To demonstrate the technique, here is a possible conversation between an interviewer and an interviewee:

INTERVIEWER TRYING TO GET DATA: How fast do you think the refrigerator market has been growing in the last few years?

INTERVIEWEE: I don’t know. I don’t have any data.

INTERVIEWER: Well, do you think it is closer to 0.5 percent, five percent, 15 percent, or over 25 percent?

INTERVIEWEE: It has been slow but not completely no growth, probably closer to five percent than 15 percent.

INTERVIEWER: Do you think it is more like three to six percent or more like six to nine percent?

INTERVIEWEE: Don’t know but probably lower than higher.

INTERVIEWER: So it is roughly three to six percent from your estimate?

INTERVIEWEE: Maybe.

Like the results of logical deduction, these ballpark figures need to be checked and rechecked. But at least this approach gives you a starting point for the necessary estimates.

Scenarios

If a key parameter is very difficult (or impossible) to estimate yet very important to your strategy plan, a tool you can use is scenarios—asking “what if?” The idea is to define a few scenarios for your target parameter and then compare and evaluate the implications of the outcomes of the various scenarios. Using the television example, say the market growth rate is very difficult to estimate. You have looked at all the research reports and interviewed many people, but you are not getting any reasonable ballpark estimates. So you decide to define three scenarios and then investigate the implications of each:

Scenarios on Market CAGR from Now Until 2015

Pessimistic: zero growth

Average performance: five percent per year

Optimistic: 10 percent per year
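Under an assumed current market size (10 million sets here, purely for illustration) and an assumed horizon of eight years, the scenarios translate directly into future-size estimates:

```python
current_market = 10_000_000   # assumed current size, sets per year
years = 8                     # assumed horizon to 2015

scenarios = {"pessimistic": 0.00, "average": 0.05, "optimistic": 0.10}
for name, growth in scenarios.items():
    future = current_market * (1 + growth) ** years
    print(f"{name:>11}: {future:,.0f} sets")
```

The spread between the pessimistic and optimistic outcomes (here roughly a factor of two) tells you how much your conclusion depends on the growth assumption.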

A few points illustrated by this example:

Minimum Threshold

If scenario analysis still makes it too difficult to estimate a parameter due to the huge range of possibilities, defining the minimum threshold could be a possible tool. Say you find that x percent per year market growth is the minimum needed to make an investment attractive. Then the question is whether you believe this market growth is possible.

CONSISTENCY AND TRIANGULATION

Naturally, the accuracy and validity of data estimated by deduction and data from sources such as interviews must be checked. Even data obtained from seemingly reliable sources should be checked. A few years ago, I used some population data straight from a China provincial government statistics book. The table I set up looked something like the one shown in Table 12.1 (though the reported data have been disguised, they follow the same pattern as the original).

I was rushing to do my analysis so I did not spend time to think about the data. I just copied the data. In the middle of my presentation, my client pointed out that the population of the cities and non-city areas within the province added up to more than the total for the province! Needless to say, the mistake affected the credibility of the whole presentation.

Table 12.1 Population Estimates

Place Population 2003 (Millions)
City 1 in this province 10.2
City 2 in this province 4.4
City 3 in this province 3.3
Non-city areas within this province 30.4
Total for this province 45.4

Therefore, it cannot be overemphasized: check your data whenever possible. Two of the most direct ways for ensuring data accuracy are checking it for consistency and triangulating the item in question.

Consistency

If a certain parameter is important to the strategy, then multiple sources of data or deduction tools should be used, and the results compared against each other. Since data from interviews and data deduced by tools are often rough ballpark figures, you cannot expect the figures to be identical even when they are all valid. It is only necessary for them to be “within each other’s range.” For example, if a CAGR-based deduction resulted in a market-size estimate of $2 billion and the top-down tool resulted in $1.8 billion, then these rough estimates can be considered consistent. However, if you got $2 billion from the former and $1 billion from the latter, then you should revisit your estimates. There is no fixed definition of what “within range” means. It depends on the accuracy you need for your analysis. In most cases, a 10 to 15 percent difference is acceptable for rough estimates; a difference of 30 percent or more is not. When two or more estimates are “within range” but not exactly the same, the usual practice is either to take the average or to use a range with a minimum and a maximum.
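A sketch of this consistency check; the 15 percent tolerance is the rough ceiling suggested above, not a hard rule:

```python
def within_range(estimates, tolerance=0.15):
    """Rough consistency check: are all estimates within `tolerance`
    (default 15 percent) of their mean?"""
    mean = sum(estimates) / len(estimates)
    return all(abs(e - mean) / mean <= tolerance for e in estimates)

print(within_range([2.0e9, 1.8e9]))  # consistent
print(within_range([2.0e9, 1.0e9]))  # revisit your estimates
```

When the check passes, taking the average or reporting a minimum–maximum range is the usual next step.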

Triangulation

Data for different parameters must triangulate: They must make sense when they are put together. In the population data example shown in Table 12.1, the data for individual cities and the total for the province do not triangulate—they do not make sense when put together. Another example: suppose you are estimating market shares of major competitors. In that case, the sum of the percentage shares of major competitors should not exceed 100. If you have sales estimates for various competitors, the total should not exceed the estimated total market. If you have historical 2006 market sales and estimates for 2007 sales, 2007 sales should be reasonably larger than 2006 if it is a growing market. And so on.
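These triangulation rules are mechanical enough to sketch as sanity checks. The small tolerance parameter is my own addition, to allow for rounding in reported figures:

```python
def shares_triangulate(shares, tolerance=0.02):
    """Market shares (as fractions) should sum to at most ~100 percent."""
    return sum(shares) <= 1 + tolerance

def parts_triangulate(parts, total, tolerance=0.02):
    """Sub-totals (e.g. city populations) should not exceed the reported total."""
    return sum(parts) <= total * (1 + tolerance)

# The Table 12.1 problem: cities plus non-city areas exceed the province total.
print(parts_triangulate([10.2, 4.4, 3.3], 45.4))        # cities alone: fine
print(parts_triangulate([10.2, 4.4, 3.3, 30.4], 45.4))  # with non-city: fails
```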

Table 12.2 Reported Percentage within Each Key Market

c12_image002.jpg

Besides being very effective for testing validity when you attempt different ways to estimate a key parameter, triangulation is also extremely useful when you have to assess other people’s estimates quickly. It is surprising that even professional consultants often publish reports or give presentations that contain data that do not triangulate. Table 12.2 shows an example I recently saw in a professional presentation by a real estate consultant from Canada.

Do you see the problem in the table? The Australia market adds up to 110 percent! Sometimes this kind of discrepancy is a typographical error, but sometimes it is a real estimation problem. Once you have the concept of triangulation ingrained in you, you will be able to pick these mistakes out quickly. You may be able to help a company avoid a wrong decision based on a mistake in a critical parameter. Even if the error is not in a critical parameter, you can look smart and alert in front of your superiors or clients (this is one of the key skills that often make MBAs look smarter than they are). But you must be careful not to point out the mistake in a way that will embarrass the creator of the estimate. This is related to social networking—it is always better to make friends rather than enemies.

The more important the data, the more checking needs to be done. One of the key measures of importance is sensitivity. Sensitivity here means how much the data affect the decision you are trying to make. For example, if you are looking at a business with high fixed costs, then revenue projections are very important since once fixed cost is covered, every dollar of revenue largely goes to the net profit with very little lost due to variable cost.
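A toy model of why revenue is such a sensitive parameter in a high-fixed-cost business (all figures are illustrative assumptions):

```python
fixed_cost = 80               # assumed, per period
variable_cost_ratio = 0.10    # assumed: 10 cents of variable cost per revenue dollar

def profit(revenue):
    return revenue - fixed_cost - variable_cost_ratio * revenue

base = profit(100)     # about 10
upside = profit(110)   # about 19: a 10 percent revenue gain nearly doubles profit
print(base, upside)
```

Because fixed cost dominates, small errors in the revenue estimate swing the profit conclusion far more than proportionally, so revenue deserves the most checking.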

LAW OF ACCURACY

Estimates often require further calculation: you need to add, multiply, subtract, or divide them to get the results you’re after. For example, if you estimate a market to be around $25 million and market share of a certain company is about one third, using the calculator gets you something like this:

$25,000,000 × 1/3 = $8,333,333.33

Some people are tempted to report this kind of number as the estimated company sales. But publishing it as it stands would violate a very important mathematical rule: the “Law of Accuracy” (see below).

The “Law of Accuracy”

When combining estimates with different numbers of significant figures, the accuracy of the result can be no greater than the least accurate of the estimates. This means when estimates are added, subtracted, multiplied, or divided, the result should not have more significant figures than the original estimates.

The number of significant figures is the number of digits that have some degree of accuracy, plus the last digit. Most MBAs are not mathematicians and tend not to get too technical or exact about the definition of significant figures (details such as when a zero counts as a significant figure and when it does not, which digits have some degree of accuracy and which do not, what it really means to say “plus the last digit,” and so on).

The important point is to recognize that an estimate is often derived so roughly that it has only one or two significant figures. For example, an estimate like $25 million, though written out with eight digits, is apt to mean only that the number is somewhere between $20 million and $30 million. As a result, combinations of estimated data must not carry excessive significant figures or decimal places. For example, when you encounter a number such as the $8,333,333.33 that came up in the calculation based on the $25 million market estimate, the appropriate interpretation is often “somewhere between $8 million and $9 million, but probably on the low side,” so it can be rounded to $8 million or $8.5 million. If further combinations (addition, multiplication, and so on), especially on a spreadsheet, are necessary, then, for simplicity, most people would continue to carry the $8,333,333. This is fine as long as the final number presented is rounded appropriately. For example, you might want to know what the total would look like if sales of $8,333,333 grew 40 percent. You could calculate $8,333,333 × 1.4 (that is, 140 percent) on your spreadsheet or calculator ($11,666,666) and then round the output to $12 million, or use a range, for presentations and decision making.
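Rounding to a given number of significant figures is easy to automate at the presentation stage. This is a simple sketch for positive numbers, covering the cases discussed here:

```python
import math

def round_sig(x, sig=2):
    """Round a positive number to `sig` significant figures (simple sketch)."""
    digits = sig - int(math.floor(math.log10(abs(x)))) - 1
    return round(x, digits)

sales = 25_000_000 * (1 / 3)   # 8,333,333.33... on the calculator
grown = sales * 1.4            # 11,666,666.66...

print(f"{round_sig(sales):,.0f}")   # 8,300,000
print(f"{round_sig(grown):,.0f}")   # 12,000,000
```

With sig=1 the same function returns the $8 million figure the text suggests; the point is to round only once, at the end.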

KEEP YOUR SENSE OF PERSPECTIVE

As you use these detailed tools for estimating and checking data to try to get the data for strategy, it is critical to keep your perspective on what this all means.

So What?

I was watching television one night, and this conversation between a grown man and an eight-year-old girl really amused me:

MAN (A FRIEND OF THE CHILD’S FATHER): Hello, Daisy. It’s nice to meet you. You look just like your mom.

CHILD: So what?

MAN: Oh . . . I mean you are as pretty as your mom.

CHILD: So what?

I believe this child will do well in data analysis.

“So what?” is the paramount question in data analysis. Data is a means, not an end. The key skill is to be able to ask “so what” constantly, from the time you are planning your data collection to the time you are applying the data. You found out the biggest competitor has a 50 percent market share. So what? What does it mean for your hypothesis? Would your strategy be different if the share were 40 percent or 60 percent? Do you need to verify this data or is this ballpark estimate good enough? In fact, at BCG, we were not allowed to write any PowerPoint presentation slides or draw any graphs without a “so what” as the title or at the bottom of the slide. The idea behind this rule is to force thinking about the “so what” of every single analysis.

An example from my consulting work:

A client of mine, a multinational beverage company, wanted to enter the bottled water business in Thailand. If the market turned out to be attractive, then the two options for entry were “greenfield” (start building from scratch) or “acquire a small brand and grow it.” The first phase of the project was to study the Porter Five Forces to confirm marketing attractiveness. The second phase was to compare the cost of the two options. One late night during the second phase, I finally finished assessing the cost of the greenfield option and the cost of acquiring the small brand (not including growing the brand). I was about to start evaluating the cost of growing the small brand when I realized that the estimated acquisition cost for the small brand was already more than triple the cost of greenfield. The difference was sufficiently big to force the conclusion that greenfield would be the cheaper option. Then it dawned on me that it was irrelevant to estimate the cost of growing the small brand because it would not change the conclusion. This realization saved the team and me a lot of time and effort.

It is not always easy to tell “so what” right after you have collected (and verified) the data. For many kinds of data, benchmarking comparables could be a useful tool. For example, suppose you found out a company in the IT industry in Malaysia has a growth rate of 15 percent per year. Is this a strong or weak growth rate? In such cases, it is useful to compare this to benchmarks, such as growth of the IT industry in Malaysia, competitors in Malaysia and overseas, the company’s own history, or other companies in your portfolio.

Sometimes, absolute numbers need to be converted into ratios for comparison. For example, a reader once asked me: “I want to invest in this company B. It has a debt of US$50 million. Is that high or low?” It is difficult to tell whether this is high or low because it depends on factors including the company’s current and future ability to repay the debt. To assess the level, the debt can be converted into ratios such as debt/equity or “times interest earned.”2 Then the ratios can be compared to benchmarks such as industry average, competitors’ ratios, the company’s own history, bank requirements, or rating agency requirements.3

As another example, suppose you see that a company has $100 million net profit. To assess if this is sizable, you can compare it with the net profits of competitors or other companies of interest to you. You can also calculate market share or net margin (net profit divided by revenues) to assess the company’s market importance and its ability to turn each dollar of revenue into net profit.

Decisions on what ratio to quantify and benchmark for each strategic study will depend on the hypothesis and framework you have selected.

Stop Fooling Yourself

As a trained engineer, I used to think a good strategist should be able to give a definite answer with confidence: “This is what the data says and this is what we should do.” No one told me otherwise until I overheard a senior consulting partner saying, “Vic (a starting consultant like me) is great. He knows to use the word seem and the phrase the data seem to show that even on day one. He has the right perspective on data. I can’t say the same about Emily.” A light bulb went on in my head. I realized that instead of pretending that the data is perfect and will give a definite answer, I should acknowledge that the data is imperfect and is most often only ballpark and directional. Using “seem” and “seem to indicate,” both in thought and in discussion, provides a constant reminder of reality and also stimulates more brainstorming, inviting constructive challenges from others that can increase the validity of the strategy.

Don’t Work Your Data Too Hard

If you massage the data enough, they will say what you want. This third point is very much related to the two that precede it. As discussed, you have to use data to deduce “so what.” But the data is imperfect—it includes all kinds of estimates. This will mean that very often, data can be deliberately manipulated to drive a certain direction or decision. Think of an eight-ounce glass containing four ounces of water: it can be described as “half full” or “half empty,” which can lead to very different “so what” conclusions. Or say the glass is 75 percent full of water. It can be rounded down to 50 percent full or rounded up to 100 percent full, which again can lead to very different answers to “so what.” Hence, it is important when you are analyzing data or when you are presented with analysis to keep this maxim in mind: “If you massage your data enough, they will say anything you want.”

Notes

1. Lord Kelvin (William Thomson) was a mathematical physicist and engineer, born in Belfast, widely known for developing the Kelvin scale of absolute temperature measurement.

2. These are just examples of ratios used in accounting and finance. Debt/equity = total liabilities of the company divided by total shareholders equity. It measures what part of a company’s resources is obtained from borrowing and what part is from owners’ investments. The calculation of times interest earned = (pretax income + interest expense)/interest expense. It measures a company’s ability to make the interest payment.

3. Rating agencies such as Standard & Poor’s Corporation or Moody’s rate bonds issued by companies on their creditworthiness, such as AAA, AA, A . . . B and so on. Their ratings are guided by lists of objective requirements including the range or thresholds for various debt ratios.