Estimation of the Demand Function

5.1 INTRODUCTION

This chapter examines the methods by which we may obtain the demand data for real-world decision problems. In preceding chapters we presumed to know the impact on the quantity demanded of changes in the price level, in consumers’ incomes, in prices of related products, and so forth. It is obvious, however, that this information may not be readily available. Given the prior expectation that the value of the information will exceed the search costs of obtaining that information, the decision maker must generate the data using a variety of techniques from market research and statistical analysis.

Note: We distinguish between demand estimation and demand forecasting on the basis of the period for which demand data are sought. Demand estimation will be taken to mean the process of finding current values for the coefficients in the demand function for a particular product. Demand forecasting will be taken to mean the process of finding values for demand in future time periods. Current values are necessary to evaluate the optimality of current pricing and promotional policies and in order to make day-to-day decisions in these strategy areas. Future values are necessary for planning production, inventories, new-product development, investment, and other situations where the decision to be made has impacts over a prolonged period of time. This chapter is confined to the issue of demand estimation. Demand forecasting is treated in the appendix to the chapter. Although we shall treat estimation and forecasting separately, demand estimation often forms the basis for demand forecasting. We shall examine the major methods of both estimation and forecasting and indicate the major problems and pitfalls one may expect to encounter.

In Chapter 4 we considered the demand function in the form where demand (or sales) is expressed as a function of the variables price, advertising, consumer incomes, consumer tastes and preferences, and whatever other variables are thought to be important in determining the demand for a particular product.

\[ Q = \alpha + \beta_1 P + \beta_2 A + \beta_3 Y + \beta_4 T + \cdots + \beta_n N \tag{5-1} \]

The \(\beta\) coefficients represent the amount by which sales will be increased (or decreased) following a one-unit change in the value of each of the variables. The present level of each of the variables is known or can be found with some investigation. It is the coefficients of these variables that are the mystery and that are important to us for decision making. That is, we wish to know what will happen to the sales level if we change a particular independent variable by a certain amount, holding all other variables constant. Stated alternatively, we wish to know whether or not a change in the value of any of these variables from their present levels would have a beneficial impact on the firm’s profits, or net worth.

Of course, not all of these independent variables are controllable, in the sense that we have the ability to adjust their level. The controllable variables are price, promotional efforts, product design, and the place of sale, which may be known to you as the “four P’s” of marketing. The uncontrollable variables in the demand function are those that change independently of the firm’s efforts, and they include such variables as consumer incomes, taste and preference patterns, the actions of competitors, population, weather, and political, sporting, and social events or happenings. However, even if the firm is unable to influence these variables, knowledge of the probable value of the coefficients is useful, since it reduces the uncertainty of the impact of changes in those variables. Given an expectation of the effect of increased consumer incomes, for example, the firm is able to plan more effectively its production, inventories, and new-product development, in view of expected changes in the affluence of consumers.

Direct versus Indirect Methods of Demand Estimation

Methods of estimating the values of these beta coefficients may be classified as either direct or indirect. Direct methods are those that directly involve the consumer, and they include interviews and surveys, simulated market situations, and controlled market experiments. Thus consumers are either asked what their reactions would be to a particular change in a determining variable, or they are observed when actually reacting to a particular change. Indirect demand estimation proceeds on the basis of data that have been collected and attempts to find statistical associations between the dependent and the independent variables. The techniques of simple correlation and multiple-regression analysis are employed to find these relationships. Direct methods of demand estimation are covered in detail in marketing research courses, while indirect methods are examined in quantitative methods courses. In this chapter we confine ourselves to the application of these methods to the problem of estimating the parameters of the demand function, and we refer the reader to the sources cited at the end of the chapter for more detailed treatments and other applications.

5.2 INTERVIEWS, SURVEYS, AND EXPERIMENTS

Interviews and Surveys

The most direct method of demand estimation is simply to ask buyers or potential buyers how much more or how much less they would purchase of a particular product if its price (or advertising, or one of the other independent variables) were varied by a certain amount. “Focus groups” may be assembled for discussions, or questionnaires may be administered to a sample of buyers. Although seemingly simple, this approach is fraught with difficulties. The first problem is that the individuals interviewed or surveyed must represent the market as a whole so that the results will not be biased. Thus a sufficiently large sample, generated by random procedures, must be interviewed in order to form a reasonable estimate of the market’s reaction to a proposed change.

A second problem is interviewer bias, defined as the distortion of the interviewee’s response caused by the presence of the interviewer. Where the true answer would make the respondent feel a little stupid, imply gluttony or drug dependence, expose the person as a tax cheat, or involve any other potential embarrassment, the respondent may give an incorrect answer to avoid the embarrassing moment. Interviewer bias will occur in personal interviews, telephone interviews, and even mailed-in questionnaires (because someone reads them). Telephone and mailed questionnaires avoid eye contact and lessen the potential embarrassment to a large extent, but at the same time the respondents may be less well chosen and their responses less well considered, as compared to personal interviews. Anonymous written or telephone responses may be more accurate in terms of interviewer bias, but less accurate in terms of deep thought given to the responses.

Third, there is the gap between intentions and action. The consumer may have truly intended to buy the product at the time of interview, but by the time the marketing strategy is implemented something may have intervened to change the consumer’s mind. Other things may not stay equal long enough for the action to have the anticipated result. Finally, the responses may be unreliable if any question is confusing or misinterpreted or if it involves things beyond the realm of the respondent’s imagination. New products, for example, when described briefly for the first time, cannot easily be pictured as part of the consumer’s lifestyle. Early estimates of personal computer sales underestimated the dramatic growth of business demand for them in the early 1980s. Later predictions were wildly optimistic about the continuing penetration of the business and household markets, as evidenced by the excess inventories held by the major manufacturers by the mid-1980s (despite falling prices).

A great deal of research has gone into the problems of questionnaire formulation in order to derive reliable results from interviews, surveys, and focus groups. Rather than asking questions directly, researchers may derive the answers to a specific question from the respondent’s answer to a number of other questions. Reliability of responses to specific questions may be checked by asking the same questions in a different form at a later point during the interview or on the questionnaire. 1 The form of the question can influence the nature of the results: open-ended questions allow the consumer to express in his or her own words what the response may be, while structured questions, such as multiple-choice questions where the respondent must use one of four or five specific responses, suggest an answer to the consumer and may bias the results toward something the researcher expected to find. The choice of words is an important consideration, since nuances may be involved and some words have different meanings to different people. The questions must be sequenced in a way that creates and holds the subject’s interest, provokes accurate responses, and does not create an emotional reaction that may influence subsequent answers or cause the respondent to refuse to continue. 2

In summary, considerable care and thought must be included in the construction of the questionnaire, and reasoned analysis must be involved in interpreting the results of the survey. Let us consider the results of a particular market survey.

Example: The Sylvain Leather Products Company intends to introduce a new wallet and wishes to estimate the demand curve for the new wallet. Members of the market research department have conducted a questionnaire survey of one thousand people interviewed while shopping for goods of a similar nature. The interviewees were each asked to choose one of six responses as to whether they would actually purchase the new wallet at each of five price levels. The responses were (a) definitely no; (b) not likely; (c) perhaps, maybe; (d) quite likely; (e) very likely; and (f) definitely yes. The number of people responding in each category at each price level is shown in Table 5-1. The analysts have determined that the probabilities of actually buying the product for each of the six responses are 0.0 for response (a); 0.2 for response (b); 0.4 for response (c); 0.6 for response (d); 0.8 for response (e); and 1.0 for response (f).

TABLE 5-1. Sylvain Leather Products Company: Summary of Questionnaire Responses

Price        Number of People Responding As           Expected
($)      (a)    (b)    (c)    (d)    (e)    (f)       Quantity
 9       500    300    125     50     25      0         160
 8       300    225    175    150    100     50         335
 7       100    150    250    250    150    100         500
 6        50    100    100    300    250    200         640
 5         0     25     50    225    300    400         800


From these data, we can find the expected value of quantity demanded at each price level. At a price of $9, for example, the expected value of quantity demanded is the sum of the expected value of sales to each group of respondents. That is,

\[ E(Q) = 500(0.0) + 300(0.2) + 125(0.4) + 50(0.6) + 25(0.8) + 0(1.0) = 160 \text{ units} \]

Proceeding similarly, we can calculate the expected quantity demanded at prices of $8, $7, $6, and $5, as shown in Table 5-1. Plotting these price-quantity coordinates on a graph, as in Figure 5-1, we see that they trace out a demand curve which intercepts the price axis at approximately $10.00, and which has a slope of approximately -5/800, or -0.00625. This quick estimate of the slope is obtained by noting that as price falls from $10 to $5 (rise = -5), quantity demanded increases from zero to 800 units (run = 800). This relation indicates that for quantity demanded to increase by 100 units, price must be reduced by 62.5 cents.
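The expected-quantity arithmetic can be reproduced in a few lines. The sketch below uses only the response counts and purchase probabilities given in the example; the names `RESPONSES`, `PURCHASE_PROBABILITY`, and `expected_quantity` are illustrative choices, not part of the original text.

```python
# Purchase probabilities assigned by the analysts to responses (a) through (f).
PURCHASE_PROBABILITY = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# Response counts from Table 5-1, keyed by price in dollars.
RESPONSES = {
    9: [500, 300, 125, 50, 25, 0],
    8: [300, 225, 175, 150, 100, 50],
    7: [100, 150, 250, 250, 150, 100],
    6: [50, 100, 100, 300, 250, 200],
    5: [0, 25, 50, 225, 300, 400],
}


def expected_quantity(counts, probabilities=PURCHASE_PROBABILITY):
    """Sum of (number of respondents) x (probability of purchase)."""
    return sum(n * p for n, p in zip(counts, probabilities))


expected = {price: expected_quantity(counts) for price, counts in RESPONSES.items()}

# Slope of the price-quantity relation between the highest and lowest prices.
slope = (5 - 9) / (expected[5] - expected[9])

for price, quantity in expected.items():
    print(f"Price ${price}: expected quantity {quantity:.0f}")
print(f"Slope between the $9 and $5 points: {slope:.5f}")
```

The slope computed from the two extreme survey points agrees with the quick estimate of -0.00625 read from the graph.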

Thus the estimate of the demand curve is \(P_x = 10.00 - 0.00625Q_x\); and from this the firm can easily find \(MR_x = 10.00 - 0.0125Q_x\), since the marginal revenue curve has the same intercept and twice the slope as the demand curve.
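The "same intercept, twice the slope" rule follows from differentiating total revenue with respect to quantity. The short derivation below is a standard step, written here with the estimated coefficients from the example:

```latex
\begin{align*}
TR_x &= P_x Q_x = (10.00 - 0.00625\,Q_x)\,Q_x = 10.00\,Q_x - 0.00625\,Q_x^2 \\
MR_x &= \frac{dTR_x}{dQ_x} = 10.00 - 0.0125\,Q_x
\end{align*}
```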

Note: This estimate of the demand curve and the marginal revenue curve depends upon the sample of shoppers being a random sample that is representative of people who are in the market for leather wallets. It also presumes that the responses were free from interviewer bias and that their intentions would actually culminate in purchases to the extent indicated by the probabilities. We also require ceteris paribus with regard to consumer incomes, tastes, perceptions, prices of rival products, and so forth. In particular, if any of these factors change between the time when the data were collected and analyzed and the time when the wallet is actually offered for sale, we should expect the above specification of the demand curve to be inaccurate because of a shift in the actual demand curve.

FIGURE 5-1. Sylvain Leather Products Estimated Demand and Marginal Revenue Curves

Simulated Market Situations

Another means of finding out what consumers would do in response to changes in price or promotion efforts is to construct an artificial market situation and observe the behavior of selected participants. These so-called consumer clinics often involve giving the participants a certain sum of money and asking them to spend this money in an artificial store environment. Different groups of participants may be faced with different price structures among competing products and differing promotional displays. If the participants are carefully selected to be representative of the market for these products, we may—after observing their reactions to price changes of different magnitudes and to variations in promotional efforts—conclude that the entire market would respond in the same way.

Results of such simulated-market test situations must be viewed carefully, however. Participants may spend someone else’s money differently from the way they would spend their own money, a phenomenon amply demonstrated by business executives’ use of expense accounts. Alternatively, participants may feel that they are expected to choose a particular product when its price is reduced in order to demonstrate that they are thrifty and responsible shoppers. Consumer clinics are also likely to be an expensive method of obtaining data, since there is a considerable setup cost, participants must be provided with the products they select, and the process is relatively time consuming. Given these factors, it is likely that the samples involved will be quite small, and hence the results may not be representative of the entire market’s reaction to the pricing and promotional changes. Nevertheless, such experiments may provide useful insight into the price awareness and consciousness of buyers and into their general reaction to changes in specific promotional variables.

Example: The Brazilian Gold Coffee Company wished to ascertain the responsiveness of consumers to changes in the price of its coffee. Six groups of one hundred shoppers each were organized for a simulated market experiment. The membership of the groups was chosen such that the socioeconomic characteristics of the groups were roughly equal and similar to the market in total. Within one afternoon each group was allowed thirty minutes to shop in a simulated supermarket. Each participant was given $30 in “play money” to purchase any items on display in the simulated supermarket. Brazilian Gold coffee was displayed prominently alongside the best-selling brand of coffee. For each of the six groups, Brazilian Gold was priced at different levels while the prices of all other products were held constant. The price levels and the resultant quantities demanded are shown in Table 5-2.

TABLE 5-2. Simulated Market Experiment for Brazilian Gold Coffee

Group    Price ($ per lb)    Quantity Demanded (lbs)
  1           3.39                   112
  2           3.29                   123
  3           3.49                    94
  4           3.19                   154
  5           3.69                    37
  6           3.59                    71

In Figure 5-2 we plot the price-quantity coordinates for Brazilian Gold coffee and sketch in the demand curve which seems to be indicated by these data points. Note that we have not simply joined the observations with a jagged line but have, instead, superimposed a “line of best fit.” In the next section we shall see how to calculate the exact line of best fit using regression analysis. In the present example, we have simply “eye-balled” the line of best fit across the data points. That is, we have sketched in the demand curve that seems visually appropriate to the points shown. We show it as a straight line for simplicity and because the data do not clearly indicate a nonlinear relationship between price and quantity demanded. The intercept of the line of best fit occurs at approximately $3.88 on the price axis, and the slope of the line is approximately -0.0045, which can be calculated by taking a particular vertical change (e.g., from $3.88 down to $3.38, a fall of $0.50) and dividing this by the horizontal change indicated by the line of best fit (in this case, from zero to about 110 units). Thus, -0.50/110 = -0.0045.
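As a check on the eye-balled line, the six observations in Table 5-2 can be fitted by least squares, the method described in the next section. The sketch below regresses price on quantity, so that the result has the same form as the sketched demand curve. The function name `least_squares_line` is an illustrative choice, and regressing quantity on price instead would generally give a slightly different line.

```python
# Observations from Table 5-2 (groups 1 through 6).
prices = [3.39, 3.29, 3.49, 3.19, 3.69, 3.59]      # dollars per lb
quantities = [112, 123, 94, 154, 37, 71]           # lbs demanded


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Price is treated as the dependent variable: P = intercept + slope * Q.
intercept, slope = least_squares_line(quantities, prices)
print(f"P = {intercept:.2f} {slope:+.4f} Q")
```

For these data the fitted intercept and slope come out very close to the values read from the graph, roughly 3.88 and -0.0045.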

FIGURE 5-2. Estimated Demand Curve for Brazilian Gold Coffee

Thus the simulated market experiment has generated data that allow the demand curve for Brazilian Gold coffee to be estimated as \(P = 3.88 - 0.0045Q\), ceteris paribus. The firm can then determine the expression for its marginal revenue curve or calculate the price elasticity of demand at any price level. Note that for the price-elasticity calculation one would use the reciprocal of the slope term, namely \(1/(-0.0045)\), or \(-222.22\), as the term \(dQ_x/dP_x\) and read the coordinates \(P_x\) and \(Q_x\) from the line of best fit. For example, the price elasticity at price $3.59 would be calculated as

\[ \varepsilon = \frac{dQ_x}{dP_x} \cdot \frac{P_x}{Q_x} = -222.22 \times \frac{3.59}{65} = -12.27 \]

Note that we have used the estimated quantity demanded (at a price of $3.59) of 65 units, read from the demand curve (line of best fit) rather than from the observed quantity (in the experiment) of 71 units. We do this because we recognize that all the observations probably contain random errors and we expect (from consumer behavior theory) that the demand curve will be a smooth line between price-quantity combinations. The next time we set price at $3.59 the random disturbances may cause demand to be, for example, only 58 units. Our best estimate of demand at price $3.59 is given by the line of best fit, and therefore our best estimate of price elasticity at that price should be based on the estimated demand curve rather than on the actual data observed. 3
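The elasticity calculation can be sketched as follows, using the estimated demand curve P = 3.88 - 0.0045Q. The function names are illustrative. The quantity of 65 units is the value read from the graph in the text; solving the estimated equation directly gives a slightly smaller figure, so the two elasticities differ a little.

```python
INTERCEPT = 3.88   # price-axis intercept of the estimated demand curve
SLOPE = -0.0045    # dP/dQ of the estimated demand curve


def quantity_on_line(price):
    """Quantity implied by the estimated demand curve at a given price."""
    return (price - INTERCEPT) / SLOPE


def price_elasticity(price, quantity):
    """Point elasticity: (dQ/dP) * (P / Q), with dQ/dP = 1 / SLOPE."""
    return (1 / SLOPE) * price / quantity


# Quantity read from the graph in the text (65 units) at a price of $3.59.
elasticity_from_graph = price_elasticity(3.59, 65)

# Quantity obtained by solving the estimated equation instead of reading the graph.
q_line = quantity_on_line(3.59)
elasticity_from_equation = price_elasticity(3.59, q_line)

print(f"Elasticity using Q = 65: {elasticity_from_graph:.2f}")
print(f"Quantity on the line at $3.59: {q_line:.1f}")
print(f"Elasticity using that quantity: {elasticity_from_equation:.2f}")
```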

Direct Market Experiments

Direct market experiments involve real people in real market situations spending their own money on the goods and services they want. The firm will select one or more cities, regional markets, or states and conduct an experiment in these “test markets” that is designed to gauge consumer acceptance of the product and to identify the impact of a change in one or more of the controllable variables on the quantity demanded. In a regional market, for example, the firm might reduce the price of its product by 10 percent and compare the reaction of sales in that market over a particular period with previous sales in that market or with current sales in a similar but separate regional market. Alternatively, the firm may increase its advertising in a specific area or introduce a promotional gimmick or campaign in a particular market to judge the impact of that change before committing itself to the greater expense and risk of instituting this change on a nationwide basis.

Example: Many firms in the United States launch new products and conduct experimental promotional campaigns in regional test markets. San Diego, California, was used by the Miller Brewing Company as a test market for its new “Special Reserve” beer during 1981, prior to the nationwide availability of that beer. Similarly, light (low calorie) wines were test marketed first in the San Diego area by Taylor California Cellars to evaluate market acceptance of the product and to judge consumer reaction to the price level and promotional campaign. San Diego is used as a test market because it is demographically representative of southern California. Similarly, Denver, Baltimore, Phoenix, and Providence were chosen as test markets by Miller for the new beer since they are representative of other areas of the United States in terms of demographics, income levels, lifestyles, and so forth. 4 Similarly, in the last quarter of 1985, Time, Inc., tested the market acceptability and price of a new weekly photo-news magazine called Picture Week in thirteen cities stretching from Portland, Oregon, to Portland, Maine. The new magazine was heavily promoted in those cities, and was priced at 95 cents in nine of those cities and 79 cents in the remaining four cities. 5

“Direct marketing,” probably better known by the obsolete term “mail-order marketing,” is a channel of communication between buyers and sellers that is ideally suited to market experiments. The consumer responds to an advertisement placed by the seller in any of several media, including newspapers, magazines, radio, television, and direct mailings to consumers. The mail and telephone orders that follow a firm’s advertisement or direct mailing represent cash up front and are much more reliable indicators of market demand than are simple statements of consumer intentions. By placing different advertisements and price offers in different regions or by making different offers to different samples within the same region (using direct mailings to randomly selected samples of the target market), the impact of different prices and promotional strategies on the entire target market can be reliably estimated. As a bonus, the feedback is usually fairly quick—responses to television advertisements requiring mail or telephone orders are concentrated within the next few days following the advertisement, while magazine advertisements and direct mailings generate the great majority of the total responses within six or eight weeks. Note, however, that direct marketing represents a different channel between the producer and the consumer and may appeal to a different type of consumer, with the result that the findings from experiments using direct marketing may not be generally applicable to other marketing channels, such as retailing through suburban and city stores. 6

4 Tame Jones, “San Diego’s Role as Test Market Toasted as Cap Comes off New Beer,” Los Angeles Times, August 17, 1981, pt. 2, pp. 1, 10; also Dan Berger, “Miller Introduces New Beer Here,” San Diego Union, August 13, 1981.

5 “Time Inc. Is Testing a Photo Magazine in 13 U.S. Markets,” Wall Street Journal, September 23, 1985.

6 See Bob Stone, Successful Direct Marketing Methods, 2nd ed. (Chicago: Crain Books, 1979).

Note: With any change in price or other marketing strategy there is likely to be an initial or “impact” effect followed by a gradual settling of the market into the new longer-term relationship between price (or other controllable variable) and the sales level. Consumers will eagerly try a new product or respond to a price reduction or a promotional campaign, but having tried the product, many will go back to the rival product they were previously purchasing. Consumers may respond to a price reduction by purchasing several cartons of the product to build up their personal inventory of the product in the belief that the lower price is only temporary. The initial surge in consumer demand for a new product or for an established product at a lower price (or following a promotional campaign) may substantially overstate the sales gain the firm can expect in several weeks or months after consumers have finished making their adjustments in response to the change in prices, product availability, promotion, or in some other variable.

In order to observe more than the impact effects of a change, market experiments must be conducted over a reasonably prolonged period of time. During this period, however, one or more of the uncontrollable variables are likely to have changed, and thus the observed change in sales over the period will not be due simply to the change in the controllable variable. To separate the effects of changes in other variables the firm should also observe a “control market,” which should be chosen to exhibit a similar socioeconomic and cultural profile and be subject to the same climatic, political, and other uncontrollable events as is the “test market.” The change in sales in the control market over the period of the experiment will be solely the result of the uncontrollable factors. Assuming that the same change would have occurred in the test market, this magnitude is deducted from (if positive) or added to (if negative) the change in sales in the test market to find the change in sales caused by the manipulation of the controllable variable(s).
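The control-market adjustment described above amounts to subtracting the control market's change in sales from the test market's change. The sketch below shows that arithmetic. The function name and all sales figures are hypothetical, chosen only for illustration, and the calculation assumes the two markets are of comparable size so that absolute changes can be compared directly.

```python
def net_sales_effect(test_before, test_after, control_before, control_after):
    """Change in test-market sales attributable to the controllable variable.

    The control market's change stands in for the effect of uncontrollable
    factors; subtracting it deducts a positive change and adds back a
    negative one.
    """
    test_change = test_after - test_before
    control_change = control_after - control_before
    return test_change - control_change


# Hypothetical case 1: sales rose in both markets over the experiment.
effect_when_control_rose = net_sales_effect(1000, 1180, 1000, 1050)

# Hypothetical case 2: sales fell in the control market over the experiment.
effect_when_control_fell = net_sales_effect(1000, 1180, 1000, 960)

print(effect_when_control_rose)   # 180 - 50
print(effect_when_control_fell)   # 180 - (-40)
```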

If an uncontrollable variable changes in the test market but remains constant or changes to a different degree in the control market, the results of the market experiment will be less reliable. Even when the control market is nearby, the climatic influence may vary, local politics may intervene, or some other event may cause an impact on the sales level. Competitors may react to the change in the test market by lowering prices or increasing promotional efforts, for example, while maintaining the status quo in the undisturbed control market. Under such circumstances the market experiment could prove to be worthless.

Thus direct market experiments must be implemented with caution; some luck must be forthcoming so that uncontrollable variables do not distort the results, which must be interpreted with care. If the pitfalls are largely anticipated and subsequently avoided, such experiments may provide information whose value (in terms of the present value of the additional future sales revenue) far exceeds its cost. We now turn to a means of estimating the demand coefficients from secondary data, in contrast to the previous reliance on primary data.

5.3 REGRESSION ANALYSIS OF CONSUMER DEMAND

■ Definition: Regression analysis is a statistical technique used to discover the apparent dependence of one variable upon one or more other variables. It is thus applicable to the problem of determining the coefficients of the demand function, since these express the influence of the determining variables upon the demand for a product. For regression analysis we require a number of sets of observations, each consisting of the value of the dependent variable Y plus the corresponding values of the independent X variables. Regression analysis allows conclusions to be drawn from the pattern that emerges in the relationships between these pairs or sets of observations, and it can be applied to either time-series or cross-section data.

Time-Series versus Cross-Section Analysis

■ Definition: Time-series analysis uses observations that have been recorded over time in a particular situation. For example, monthly price and sales levels of a product in a particular firm may have been collected for the past six or twelve months. A problem with time-series analysis is that some of the uncontrollable factors that influence sales tend to change over time, and hence some of the differences in the sales observations will be the result of these influences rather than the result of any changes in the price level. If the changes in the uncontrollable variables are observable and measurable, we may include these variables as explanatory variables in the regression analysis. Actions of competitors and changing consumer income levels, for example, should be quantified (either directly or by use of a suitable proxy variable) and incorporated into the analysis.

Changing taste and preference patterns, on the other hand, are difficult to observe and measure, but they are likely to change over time. Using time as an explanatory variable in the regression analysis will pick up the influence of tastes and any other factors (not otherwise included in the analysis) that tend to change over the period.

■ Definition: Cross-section analysis uses observations from different firms in the same business environment at the same point or period of time. Hence cross-section analysis largely eliminates the problem of uncontrollable variables that change over time, but it introduces other factors that may differ between and among firms at a particular point of time. If factors such as the effectiveness of sales personnel, cash-flow position, level of promotional activity, and objectives of management differ among firms, they should be expected to have differing impacts on the sales level. Again, if these factors can be quantified and data obtained, they may be entered into the regression analysis to determine their impact upon the dependent variable.

Linearity of the Regression Equation

Having hypothesized that Y is a function of X or of several X variables and having collected data on the variables, we must then specify the form of the dependence of Y upon the X variables. Regression analysis requires that the dependence be expressed in the linear form

\[ Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + e \tag{5-2} \]

where the e term is added to represent the error or residual value that will arise as the difference between the actual value of each Y that has been observed in association with each set of X values, and the estimated value of each Y that the regression equation would predict for the X values given. For individual observations we should expect either a positive or a negative residual term, because of random variations in the value of Y. 7

Nonlinear relationships between the Y and X values, such as quadratic, cubic, exponential, hyperbolic, and power functions, may be used if these best fit the data, since these forms may be converted to linear form by mathematical transformation. The most commonly used nonlinear form is the power function, such as

\[ Y = \alpha X_1^{\beta_1} X_2^{\beta_2} \tag{5-3} \]

where the independent variables, \(X_1\) and \(X_2\) in this case, have a multiplicative (rather than additive) influence on the dependent variable Y. This curvilinear relationship can be expressed as a rectilinear relationship by logarithmic transformation. Taking logarithms of the values for Y, \(X_1\), and \(X_2\), we can express equation (5-3) as

\[ \log Y = \log \alpha + \beta_1 \log X_1 + \beta_2 \log X_2 \tag{5-4} \]

In this form, the equation is linear, and the coefficients \(\beta_1\) and \(\beta_2\) can be found directly using regression analysis. The coefficient \(\alpha\) in equation (5-3) can be found by reversing the transformation (that is, taking the antilog) of the \(\log \alpha\) value provided by the regression analysis. We shall work through an example of this procedure in Chapter 8, in the context of cost forecasting.
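The transformation in equations (5-3) and (5-4) can be sketched in code. The example below is a minimal illustration, not the Chapter 8 procedure. The data are hypothetical and generated without noise from a known power function, so the point is only to show that a linear regression on the logarithms recovers the coefficients and that the antilog recovers \(\alpha\). Natural logarithms are used, though any base works if the antilog uses the same base. The helper names are illustrative.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_function(y, x1, x2):
    """Estimate alpha, beta1, beta2 in Y = alpha * X1**beta1 * X2**beta2.

    Least squares is applied to the log-transformed equation
    log Y = log alpha + beta1 log X1 + beta2 log X2 via the normal equations.
    """
    rows = [[1.0, math.log(u), math.log(v)] for u, v in zip(x1, x2)]
    log_y = [math.log(v) for v in y]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, log_y)) for i in range(k)]
    log_alpha, beta1, beta2 = solve_linear_system(xtx, xty)
    return math.exp(log_alpha), beta1, beta2   # antilog recovers alpha


# Hypothetical, noise-free data generated from Y = 50 * X1**-1.5 * X2**0.8.
x1 = [2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [10.0, 12.0, 9.0, 15.0, 11.0]
y = [50.0 * a ** -1.5 * b ** 0.8 for a, b in zip(x1, x2)]

alpha, beta1, beta2 = fit_power_function(y, x1, x2)
print(f"alpha = {alpha:.3f}, beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
```

With real observations the fit would not be exact, and the residuals would apply to the logarithm of Y rather than to Y itself.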

Alternatively, you may feel that the appropriate functional form between the dependent variable and the independent variables is quadratic, like the form for the total revenue curve. A quadratic function can be expressed linearly as follows:

\[ Y = \alpha + \beta_1 X_1 + \beta_2 X_1^2 \tag{5-5} \]

Note that the last variable in this expression is the same independent variable (\(X_1\)) squared. Similarly, if the appropriate functional form is thought to be cubic, as in the case of the production function or total cost function to be discussed in Chapter 6, we can postulate the relationship to be

\[ Y = \alpha + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1^3 \tag{5-6} \]

and use regression analysis to determine the values of the \(\alpha\), \(\beta_1\), \(\beta_2\), and \(\beta_3\) parameters.

7 For the accurate calculation of the coefficients in the regression equation, we require that the residuals occur randomly, be normally distributed, have constant variance, and have an expected value of zero. When the pattern of residuals does not conform to these restrictions, several problems arise, as we shall see later in this section.

Estimating the Regression Parameters

The “method of least squares” is used to find the a and /3 parameters such that the regression equation best represents or summarizes the apparent relationship between the Xi values and the dependent variable F. To illustrate this method, we shall proceed using a simple example of only one independent variable. (This is usually referred to as “simple regression” analysis, or “correlation” analysis rather than “multiple regression” analysis, which is used when we have two or more independent variables.)

Example: Suppose that we have collected ten pairs of observations on the variables \(Y\) and \(X\); that is, the \(Y\) value and its associated \(X\) value on each of ten different occasions in a single situation (time-series data) or from ten different situations during the same period of time (cross-section data). These data points are shown as the asterisks in Figure 5-3. Observing these data points, we hypothesize a relationship of the form \(Y = \alpha + \beta X\) and use regression analysis to estimate the \(\alpha\) and \(\beta\) parameters, which allow the line \(Y = \alpha + \beta X\) to best fit (or represent most accurately) the apparent relationship between the variables \(Y\) and \(X\).

The method of least squares, often called ordinary least squares (OLS), is a mathematical process which chooses the intercept and slope of the line of best fit such that the sum of the squares of the deviations (or errors) is minimized. These deviations are shown in Figure 5-3 as the vertical distance between the line of best fit and the actual value of \(Y\) observed for a particular value of \(X\). For example, given \(X_1\) in Figure 5-3, the estimated value of \(Y\) is \(\hat{Y}_1\) (known as \(Y_1\) hat, where the hat [circumflex] over the \(Y_1\) indicates the estimated, or expected, value of \(Y_1\) given \(X_1\)). The vertical difference between the observed \(Y_1\) and the estimated \(\hat{Y}_1\) is known as the deviation, residual, or error term, and was denoted as \(e\) in equation (5-2).

FIGURE 5-3. The Line of Best Fit

Thus the regression equation specifies the line of best fit. It is selected by a mathematical procedure that positions the line such that the sum of the squared errors is minimized. We square the errors to avoid the positive deviations (points above the line of best fit) canceling out against negative deviations (points below the line of best fit), and to weight more heavily the larger deviations.⁸ Note that the line of best fit passes through the point representing the mean values (\(\bar{Y}\) and \(\bar{X}\)) of the variables. The regression equation “explains” part of the variation of each \(Y\) observation from the \(\bar{Y}\) value, in terms of the difference between the associated \(X\) observation and the \(\bar{X}\) value. In Figure 5-3, when \(X = X_1\) the regression equation predicts \(\hat{Y}_1\), explaining the variation from \(\bar{Y}\) (that is, \(\bar{Y} - \hat{Y}_1\)), in terms of the variation in \(X\) (that is, \(\bar{X} - X_1\)). The “unexplained” residual, or error term, is the difference between the actual observation of \(Y\) and the predicted value of \(Y\) (that is, \(Y_1 - \hat{Y}_1\)).
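These properties of the line of best fit can be checked numerically. The sketch below uses ten made-up data pairs (they are not the points plotted in Figure 5-3) and NumPy's `polyfit` as one convenient least-squares routine.

```python
import numpy as np

# Ten hypothetical (X, Y) observations, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9, 10.2, 10.8])

# Degree-1 least-squares fit; polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(x, y, 1)

residuals = y - (intercept + slope * x)   # observed Y minus estimated Y
sse = np.sum(residuals**2)                # the quantity least squares minimizes


def sse_for(a, b):
    """Sum of squared errors for an arbitrary line Y = a + b*X."""
    return np.sum((y - (a + b * x)) ** 2)


print("sum of residuals:", residuals.sum())            # positive and negative errors cancel
print("fitted value at mean X:", intercept + slope * x.mean(), "mean Y:", y.mean())
print("SSE of fitted line:", sse, "SSE of a shifted line:", sse_for(intercept + 0.1, slope))
```

Any other intercept or slope passed to `sse_for` gives a larger sum of squared errors than the fitted line, which is the sense in which the fit is "best".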

Since computer programs (and preprogrammed or programmable hand calculators) for obtaining correlation and regression equations are becoming readily available and since the theory underlying regression analysis is typically covered in other courses, we shall not go too deeply into the theory or calculation of regression equations. It is instructive, however, to work through a simple two-variable case to demonstrate some of the issues and problems involved. Without proof, we state the following expressions for \(\alpha\) and \(\beta\):⁹

\[ \alpha = \bar{Y} - \beta \bar{X} \tag{5-7} \]

and

\[ \beta = \frac{n \Sigma XY - \Sigma X \, \Sigma Y}{n \Sigma X^2 - (\Sigma X)^2} \tag{5-8} \]

where \(\bar{Y}\) is the arithmetic mean of the \(Y\) values; \(\bar{X}\) is the arithmetic mean of the \(X\) values; \(\Sigma\) (sigma) denotes the sum of the term indicated (for example, \(\Sigma XY\) is the sum of the products of \(X\) and \(Y\) for all pairs of \(X\) and \(Y\) observations); and \(n\) is the number of observations or data points.

Given a set of \(X\) and \(Y\) observations, we can use these equations to solve for the line of best fit for the relationship that appears to exist between those two variables. Let us introduce a hypothetical example.

Example: Suppose a chain of department stores sells its own brand of frozen broccoli in each of its six stores. The chain is interested in knowing the price elasticity of demand for this product. Its six stores are in similar middle-income suburban neighborhoods, and all are currently selling the item at $0.79 per package. Monthly sales at the six stores average 4,625 units per store, with no store’s sales being more than 150 units away from this level. Suppose the management decides to conduct an experiment: it will set the prices at different levels in each of the six stores to observe the reactions of sales to the different price levels. As a control it will maintain the price at $0.79 in the first store. The prices set for the other stores and the sales levels (in thousands) at each of the six stores over the one-month period of the experiment are shown in Table 5-3.

The table includes the calculations necessary for the solution of the \(\alpha\) and \(\beta\) parameters. Using equation (5-8), we have

\[ \beta = \frac{6(19.506) - 4.96(26.1)}{6(4.5094) - (4.96)^2} = \frac{-12.42}{2.4548} = -5.0595 \]

and from equation (5-7), we have

\[ \alpha = \bar{Y} - \beta \bar{X} = 4.35 - (-5.0595)(0.8267) = 8.5327 \]

TABLE 5-3. Price/Sales Observations for Broccoli at Six Stores and the Calculations for Least-Squares Analysis

| Store No. | Price X ($) | Sales Y (000) | XY | X² | Y² |
|---|---|---|---|---|---|
| 1 | 0.79 | 4.650 | 3.6735 | 0.6241 | 21.6225 |
| 2 | 0.99 | 3.020 | 2.9898 | 0.9801 | 9.1204 |
| 3 | 1.25 | 2.150 | 2.6875 | 1.5625 | 4.6225 |
| 4 | 0.89 | 4.400 | 3.9160 | 0.7921 | 19.3600 |
| 5 | 0.59 | 6.380 | 3.7642 | 0.3481 | 40.7044 |
| 6 | 0.45 | 5.500 | 2.4750 | 0.2025 | 30.2500 |
| Total | 4.96 (ΣX) | 26.100 (ΣY) | 19.5060 (ΣXY) | 4.5094 (ΣX²) | 125.6798 (ΣY²) |

\(\bar{Y} = \Sigma Y / n = 26.1 / 6 = 4.35\)

\(\bar{X} = \Sigma X / n = 4.96 / 6 = 0.8267\)

Thus, \(\hat{Y} = 8.5327 - 5.0595X\) is the “line of best fit” to the data, when sales (\(Y\)) are measured in thousands of units. As shown in Figure 5-4, the intercept of this line is thus 8,532.7 units on the \(Y\) axis, and the slope is -5,059.5 units of sales per dollar increase in price (which is to say, sales fall by 50.595 units for each cent the price is increased). The intercept value should not be interpreted as the sales level that would be expected at the price of zero, since the range of price observations is from $0.45 to $1.25, and the values of \(\alpha\) and \(\beta\) are estimated only for that range. Outside this range a different relationship may hold between \(X\) and \(Y\). The intercept parameter serves only to locate the line of best fit such that it passes through the observations at the appropriate height. To interpret the intercept as the sales value when price is zero would be an example of the dangerous practice of extrapolation.
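The hand calculation can be reproduced with a short script. The sketch below applies equations (5-7) and (5-8) to the six price and sales observations of Table 5-3; the variable names are arbitrary, and plain Python is used so that each sum corresponds to a column total in the table.

```python
# Price (X, dollars) and sales (Y, thousands of units) from Table 5-3.
prices = [0.79, 0.99, 1.25, 0.89, 0.59, 0.45]
sales = [4.650, 3.020, 2.150, 4.400, 6.380, 5.500]

n = len(prices)
sum_x = sum(prices)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(prices, sales))
sum_x2 = sum(x * x for x in prices)

# Equation (5-8): slope of the line of best fit.
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)

# Equation (5-7): intercept, using the means of Y and X.
alpha = sum_y / n - beta * (sum_x / n)

print(f"beta  = {beta:.4f}")
print(f"alpha = {alpha:.4f}")
```

Because the script carries full precision while the hand calculation rounds the mean price and the slope to four decimals, the intercept may differ from 8.5327 in the fourth decimal place; the two agree to about three decimals.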

Note: The regression equation calculated above shows the dependence of quantity demanded on the price per unit. We can easily convert this into the form \(P = a + bQ\) traditionally used to represent the demand curve. Substituting \(Q\) for \(Y\) and \(P\) for \(X\) in the regression equation, we have

\[ Q = 8.5327 - 5.0595P \tag{5-9} \]

Subtracting \(Q\) from both sides and adding \(5.0595P\) to both sides, we have

\[ 5.0595P = 8.5327 - Q \]

and dividing both sides by 5.0595 gives

\[ P = 1.6865 - 0.19765Q \tag{5-10} \]


The marginal revenue curve is obtained from this estimate of the demand curve, based on our knowledge that it has the same vertical intercept and twice the slope. Thus,

\[ MR = 1.6865 - 0.3953Q \tag{5-11} \]
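The "same intercept, twice the slope" rule can be verified directly from the estimated demand curve. The short derivation below assumes only that total revenue is price times quantity and that marginal revenue is its derivative with respect to quantity; TR is shorthand introduced here for total revenue.

```latex
\begin{align*}
TR &= P \cdot Q = (1.6865 - 0.19765\,Q)\,Q = 1.6865\,Q - 0.19765\,Q^2 \\
MR &= \frac{d\,TR}{dQ} = 1.6865 - 2(0.19765)\,Q = 1.6865 - 0.3953\,Q
\end{align*}
```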

Price elasticity of demand at any price level may be estimated using \(dQ/dP = -5.0595\) from the regression equation (5-9) and the estimated quantity demanded at that price level. For example, given \(P = 0.85\), we find \(Q\) by substituting for \(P\) in equation (5-9) as follows:

\[ Q = 8.5327 - 5.0595(0.85) = 4.2321 \]

Inserting these values into the point price elasticity expression, we find

\[ \varepsilon = \frac{dQ}{dP} \cdot \frac{P}{Q} = -5.0595 \cdot \frac{0.85}{4.2321} = -1.0162 \]

The price elasticity of demand at the price level of $0.85 is, therefore, fractionally above unity in absolute value, indicating that total revenue would be virtually constant for either (very small) price increases or price reductions from the price of $0.85.
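The elasticity calculation, and the claim that total revenue barely moves around $0.85, can be checked with a small sketch. It takes the rounded estimates from equation (5-9) as given; the function names and the one-cent price steps are illustrative choices, not part of the text.

```python
# Rounded estimates from equation (5-9): Q = 8.5327 - 5.0595 P (Q in thousands of units).
ALPHA = 8.5327
BETA = -5.0595


def quantity(price):
    """Estimated quantity demanded (thousands of units) at a given price."""
    return ALPHA + BETA * price


def point_elasticity(price):
    """Point price elasticity: (dQ/dP) * (P / Q)."""
    return BETA * price / quantity(price)


def total_revenue(price):
    """Total revenue in thousands of dollars, since Q is in thousands."""
    return price * quantity(price)


print(f"Q at $0.85:          {quantity(0.85):.4f}")
print(f"elasticity at $0.85: {point_elasticity(0.85):.4f}")
for p in (0.84, 0.85, 0.86):
    print(f"TR at ${p:.2f}: {total_revenue(p):.4f}")
```

The elasticity is only valid within the observed price range of $0.45 to $1.25, for the same reason that the intercept should not be extrapolated.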


Chapter Notes