LINEST()

SyntaxLINEST(known_y’s,known_x’s,const,stats)

Definition. This function calculates the statistics for a line using the smallest squares to calculate a straight line appropriate for the data and then returns an array describing a line. Because this function returns an array of values, it must be entered as an array formula.

The equation for the line is:

y = mx + b

or, if there are multiple ranges of x-values:

y = m1x1 + m2x2 + ... + b

The dependent y-values are a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that the LINEST() function returns is {mn,mn-1,...,m1,b}. LINEST() can also return additional regression statistics (see Table 12-2 later in this section).

Arguments

Table 12-2 describes the additional regression statistics.

Table 12-2. Overview of the Regression Statistics

Statistic

Description

se1,se2,...,sen

The standard error values for the coefficients m1,m2,...,mn.

seb

The standard error value for constant b (seb = #N/A when const is FALSE).

r2

The coefficient of determination. Compares calculated and actual y-values and ranges in value from 0 to 1. If this coefficient is 1, there is a perfect correlation in the sample; there is no difference between the calculated y-value and the actual y-value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y-value. You will find more information about the calculation of y2 in the description of RSQ().

sey

The standard error for the estimated y-value.

F

The f-statistic (or the calculated f-value). Use the f-statistic to determine whether the observed relation between the dependent and independent variables occurs by chance.

df

The degrees of freedom. Use the degrees of freedom to find critical f-values in a statistical table. Compare the values you find in the table to the f-statistic returned by LINEST() to determine a confidence level for the model.

ssreg

The regression sum of squares.

ssresid

The residual sum of squares.

Note

You can describe any straight line with the slope and the y-intercept:

You will find more information about slopes and intercepts in the descriptions of SLOPE() and INTERCEPT().

The equation for a straight line is y = mx + b. When you know the values of m and b, you can calculate any point on the line by inserting the y-value or x-value into the equation. You can also use the TREND() function.

Background. The correlation examines whether the variables are interrelated and expresses this interrelation with a single number. The regression analysis compares the dependent variables (for instance, the online orders in the software company example) with the independent variables (the website visits).

A regression analysis is used to explain a characteristic with one characteristic (dependent variable) with one or more other characteristics (independent variables). It is assumed that the dependent variable is increased or decreased according to the independent variable. Therefore, it determines a one-tailed dependency. You have to specify which of the two variables is dependent or independent.

If only one independent x-variable exists, you can calculate the slope and the y-intercept with the following formula:

The accuracy of a line calculated with LINEST() depends on the data variance. The more linear the data, the more accurate is the model calculated by LINEST(). LINEST() uses the smallest squares to calculate the best fit for the data. If only one independent x-variable exists, calculate m and b with the following formula:

image with no caption

The values x and y are the sample means: x = AVERAGE(known_x’s) and y = AVERAGE(known_y’s).

The regression functions LINEST() (linear regression) and LOGEST() (exponential regression) calculate the coefficient for the optimal line based on the given data. However, you have to decide which of the two results is a better fit for your data.

In a regression analysis, Excel calculates the square of the difference for each point between the y-value of a point and the actual y-value. The sum of the squared differences is called a residual sum of squares (ssresid). Then Excel calculates the total sum of the squares (sstotal). If const = TRUE or not specified, the total sum of the squares equals the sum of the squared differences between the actual y-values and the average of the y-values. If const = FALSE, the total sum of the squares equals the sum of the squares of the actual y-values.

The regression sum of squares (ssreg) is calculated as follows:

ssreg = sstotal – ssresid

The smaller the residual sum of squares is in relation to the total sum of the deviation squares, the greater is the value of the coefficient of determination (r2) that indicates how well the relation between the variables is defined by the equation resulting from the regression analysis.

image with no caption

Sometimes one or more x columns (assuming that the x-values and y-values are specified in columns) aren’t useful for calculating the estimated values together with the other x columns. In other words, if you exclude these x columns, the y-values are still accurate.

In this case, you should remove the redundant x columns from the regression model. This phenomenon is called collinearity because each redundant x column can be defined as the sum of the multiple of the nonredundant x columns. LINEST() checks for collinearity and excludes all redundant x columns from the regression model.

In the output of the LINEST() function, the coefficient and the se-values of an excluded x column equal 0. The exclusion of one or more columns impacts the df because the df depends on the number of x columns used to calculate the estimated values. If the value of df changes because redundant x columns are removed, the values of sey and f are also impacted. Collinearity does not occur very often.

If no x column is excluded, the df is calculated as follows: If k columns exist and const = TRUE or not specified:

df = nk – 1

If const = FALSE:

df = nk

In both cases, for each x column that is excluded, the df is increased by 1.

If you specify an array constant (such as values_x) as an argument, you have to divide the values in a row with colons and the rows with commas.

Tip

Note that if you use a regression equation, the predicted y-values may be invalid because they reside outside of the y-value range you used for the equation.

Example. A software company sells all its products through its website. The company regularly sends out newsletters to inform existing and potential customers about new and updated products and to draw attention to its website.

Last year, the orders through the website significantly increased. The management wants to know the reason. Is the increase in sales attributable to marketing or advertising? Did the increased website visits cause the increase in sales? This means that the company wants to know the correlation between the website visits and the online orders. The website manager has given you information about the users and their activities. The marketing department is asked to perform an analysis and creates an Excel list to compare the website visits and the online orders of the past 18 months. The marketing department performs a regression analysis by using the LINEST() function (see Figure 12-86).

Comparing the website visits with the online orders.

Figure 12-86. Comparing the website visits with the online orders.

Figure 12-87 shows how the LINEST() function returns the results.

Regression analysis using the LINEST() function.

Figure 12-87. Regression analysis using the LINEST() function.

Caution

Formulas that return an array as a result have to be entered as array formulas. Because LINEST() can calculate all 10 regression values simultaneously and returns an array, the formula or the arguments have to be entered in a certain way.

Select the cell range D39:E43 and then enter the arguments for LINEST() in cell D39. After you are finished, press Ctrl+Shift+Enter.

See Also

GROWTH(), LOGEST(), TREND()