LINEST()

Syntax. LINEST(known_y’s,known_x’s,const,stats)

Definition. This function calculates the statistics for a line using the smallest squares to calculate a straight line appropriate for the data and then returns an array describing a line. Because this function returns an array of values, it must be entered as an array formula.

The equation for the line is:

y = mx + b

or, if there are multiple ranges of x-values:

y = m1x1 + m2x2 + ... + b

The dependent y-values are a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that the LINEST() function returns is {mn,mn-1,...,m1,b}. LINEST() can also return additional regression statistics (see Table 12-2 later in this section).

Arguments

known_y’s (required). The y-values you already know from the relationship y = mx + b
- If the known_y’s array consists of a single column, then each column of the known_x’s array is interpreted as a separate variable.
- If the known_y’s array consists of a single row, then each row of the known_x’s array is interpreted as a separate variable.
known_x’s (optional). The x-values you already know from the relationship y = mx + b
- The known_x’s array can include one or more sets of variables. If only one variable is used, known_y’s and known_x’s can be ranges of any shape, as long as they have equal dimensions. If more than one variable is used, known_y’s must be a vector (a range consisting of a single row or column).
- If known_x’s is not specified, it is assumed to be the array {1,2,3,...} containing the same number of elements as known_y’s.
const (optional). A logical value specifying whether to force the constant b to equal 0
- If const is TRUE or omitted, b is calculated normally.
- If const is FALSE, then b is set equal to 0 and the m-values are adjusted to y = m.
stats (optional). A logical value specifying whether to return additional regression statistics
- If stats is TRUE, the LINEST() function returns the additional regression statistics and the returned array is: {mn;mn-1;...;m1;b.sen;sen-1;...;se1;seb.r2; sey.F;df.ssreg;ssresid}.
- If stats is FALSE or not specified, the LINEST() function returns only the m-coefficients and the constant b.

Table 12-2 describes the additional regression statistics.

Table 12-2. Overview of the Regression Statistics

Statistic	Description
se1,se2,...,sen	The standard error values for the coefficients m1,m2,...,mn.
seb	The standard error value for constant b (se_b = #N/A when const is FALSE).
r2	The coefficient of determination. Compares calculated and actual y-values and ranges in value from 0 to 1. If this coefficient is 1, there is a perfect correlation in the sample; there is no difference between the calculated y-value and the actual y-value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y-value. You will find more information about the calculation of y² in the description of RSQ().
sey	The standard error for the estimated y-value.
F	The f-statistic (or the calculated f-value). Use the f-statistic to determine whether the observed relation between the dependent and independent variables occurs by chance.
df	The degrees of freedom. Use the degrees of freedom to find critical f-values in a statistical table. Compare the values you find in the table to the f-statistic returned by LINEST() to determine a confidence level for the model.
ssreg	The regression sum of squares.
ssresid	The residual sum of squares.

Note

You can describe any straight line with the slope and the y-intercept:

Slope (m): To find the slope of a line (often written as m), take two points on the line, (x1,y1) and (x2,y2); the slope is equal to (y2 – y1)/(x2 – x1).
y-intercept (b): The y-intercept of a line (often written as b) is the value of y at the point where the line crosses the y-axis.

You will find more information about slopes and intercepts in the descriptions of SLOPE() and INTERCEPT().

The equation for a straight line is y = mx + b. When you know the values of m and b, you can calculate any point on the line by inserting the y-value or x-value into the equation. You can also use the TREND() function.

Background. The correlation examines whether the variables are interrelated and expresses this interrelation with a single number. The regression analysis compares the dependent variables (for instance, the online orders in the software company example) with the independent variables (the website visits).

A regression analysis is used to explain a characteristic with one characteristic (dependent variable) with one or more other characteristics (independent variables). It is assumed that the dependent variable is increased or decreased according to the independent variable. Therefore, it determines a one-tailed dependency. You have to specify which of the two variables is dependent or independent.

If only one independent x-variable exists, you can calculate the slope and the y-intercept with the following formula:

Slope: =INDEX(LINEST(known_y’s;known_x’s);1)
y intercept: =INDEX(LINEST(known_y’s;known_x’s);2)

The accuracy of a line calculated with LINEST() depends on the data variance. The more linear the data, the more accurate is the model calculated by LINEST(). LINEST() uses the smallest squares to calculate the best fit for the data. If only one independent x-variable exists, calculate m and b with the following formula:

The values x and y are the sample means: x = AVERAGE(known_x’s) and y = AVERAGE(known_y’s).

The regression functions LINEST() (linear regression) and LOGEST() (exponential regression) calculate the coefficient for the optimal line based on the given data. However, you have to decide which of the two results is a better fit for your data.

In a regression analysis, Excel calculates the square of the difference for each point between the y-value of a point and the actual y-value. The sum of the squared differences is called a residual sum of squares (ssresid). Then Excel calculates the total sum of the squares (sstotal). If const = TRUE or not specified, the total sum of the squares equals the sum of the squared differences between the actual y-values and the average of the y-values. If const = FALSE, the total sum of the squares equals the sum of the squares of the actual y-values.

The regression sum of squares (ssreg) is calculated as follows:

ssreg = sstotal – ssresid

The smaller the residual sum of squares is in relation to the total sum of the deviation squares, the greater is the value of the coefficient of determination (r²) that indicates how well the relation between the variables is defined by the equation resulting from the regression analysis.

Sometimes one or more x columns (assuming that the x-values and y-values are specified in columns) aren’t useful for calculating the estimated values together with the other x columns. In other words, if you exclude these x columns, the y-values are still accurate.

In this case, you should remove the redundant x columns from the regression model. This phenomenon is called collinearity because each redundant x column can be defined as the sum of the multiple of the nonredundant x columns. LINEST() checks for collinearity and excludes all redundant x columns from the regression model.

In the output of the LINEST() function, the coefficient and the se-values of an excluded x column equal 0. The exclusion of one or more columns impacts the df because the df depends on the number of x columns used to calculate the estimated values. If the value of df changes because redundant x columns are removed, the values of sey and f are also impacted. Collinearity does not occur very often.

If no x column is excluded, the df is calculated as follows: If k columns exist and const = TRUE or not specified:

df = n – k – 1

If const = FALSE:

df = n – k

In both cases, for each x column that is excluded, the df is increased by 1.

If you specify an array constant (such as values_x) as an argument, you have to divide the values in a row with colons and the rows with commas.

Tip

Note that if you use a regression equation, the predicted y-values may be invalid because they reside outside of the y-value range you used for the equation.

Example. A software company sells all its products through its website. The company regularly sends out newsletters to inform existing and potential customers about new and updated products and to draw attention to its website.

Last year, the orders through the website significantly increased. The management wants to know the reason. Is the increase in sales attributable to marketing or advertising? Did the increased website visits cause the increase in sales? This means that the company wants to know the correlation between the website visits and the online orders. The website manager has given you information about the users and their activities. The marketing department is asked to perform an analysis and creates an Excel list to compare the website visits and the online orders of the past 18 months. The marketing department performs a regression analysis by using the LINEST() function (see Figure 12-86).

Figure 12-86. Comparing the website visits with the online orders.

Figure 12-87 shows how the LINEST() function returns the results.

Figure 12-87. Regression analysis using the LINEST() function.

Caution

Formulas that return an array as a result have to be entered as array formulas. Because LINEST() can calculate all 10 regression values simultaneously and returns an array, the formula or the arguments have to be entered in a certain way.

Select the cell range D39:E43 and then enter the arguments for LINEST() in cell D39. After you are finished, press Ctrl+Shift+Enter.

LINEST()

Note

Tip

Caution

See Also