Before moving on, there are two additional linear model topics that we need to discuss. The first is the inclusion of a qualitative feature, and the second is an interaction term; both are explained in the following sections.

A qualitative feature, also referred to as a factor, can take on two or more levels, such as Male/Female or Bad/Neutral/Good. If we have a feature with two levels, say gender, we can create what is known as an indicator or dummy feature, arbitrarily assigning one level as 0 and the other as 1. If we create a model with just the indicator, our linear model still follows the same formulation as before, that is, Y = B0 + B1x + e. If we code the feature so that male equals zero and female equals one, then the expectation for male is just the intercept, B0, while for female it is B0 + B1. When a feature has more than two levels, say n, you create n - 1 indicators; so, for three levels you would have two indicators. If you created as many indicators as there are levels, you would fall into the dummy variable trap, which results in perfect multicollinearity.
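As a brief illustration, a toy two-level factor (a hypothetical gender vector, not part of the examples that follow) can be passed to model.matrix() to see the indicator coding R would produce:

```r
# a toy two-level factor, coded so that Male is 0 and Female is 1
gender <- factor(c("Male", "Female", "Female", "Male"),
                 levels = c("Male", "Female"))

# model.matrix() shows the single indicator column R creates;
# a three-level factor would get two such columns (n - 1 indicators)
model.matrix(~ gender)
```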

We can examine a simple example to learn how to interpret the output. Let's load the ISLR package and build a model with the Carseats dataset by using the following code snippet:
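A minimal version of that snippet, assuming the ISLR package is installed, is as follows:

```r
# load the ISLR package and attach the Carseats data
library(ISLR)
data(Carseats)

# a quick look at the structure, including the ShelveLoc factor levels
str(Carseats)
```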

For this example, we will predict the sales of Carseats using just Advertising, a quantitative feature, and the qualitative feature ShelveLoc, which is a factor with three levels: Bad, Good, and Medium. With factors, R automatically codes the indicators for the analysis. We build and analyze the model as follows:
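A sketch of that fit, using sales.fit as a hypothetical name for the model object:

```r
# Sales regressed on Advertising plus the three-level ShelveLoc factor;
# R automatically creates two indicator columns for ShelveLoc
sales.fit <- lm(Sales ~ Advertising + ShelveLoc, data = Carseats)
summary(sales.fit)
```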

If the shelving location is good, the estimated sales are almost double those of a bad location, whose baseline is the intercept of 4.89662. To see how R codes the indicator features, you can use the contrasts() function as follows:
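For the ShelveLoc factor, the call would look like this:

```r
# treatment contrasts for ShelveLoc: Bad is the reference level,
# coded 0 in both indicator columns
contrasts(Carseats$ShelveLoc)
```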

Interaction terms are similarly easy to code in R. Two features interact if the effect of one feature on the prediction depends on the value of the other feature. This follows the formulation Y = B0 + B1x1 + B2x2 + B3x1x2 + e. An example is available in the MASS package with the Boston dataset. The response is the median home value, which is medv in the output; we will use two features, the percentage of homes with a low socioeconomic status, termed lstat, and the age of the home in years, termed age, in the following output:
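A minimal sketch of loading the data, assuming the MASS package is installed:

```r
# load the MASS package and attach the Boston data
library(MASS)
data(Boston)

# medv is the response; lstat and age are the two features of interest
str(Boston)
```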

Using feature1 * feature2 with the lm() function puts both features, as well as their interaction term, in the model, as follows:
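A sketch of that fit, with value.fit as a hypothetical name for the model object:

```r
# lstat * age expands to lstat + age + lstat:age in the model formula
value.fit <- lm(medv ~ lstat * age, data = Boston)
summary(value.fit)
```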

Examining the output, we can see that while socioeconomic status is a highly predictive feature, the age of the home is not. However, the two features have a significant interaction that positively contributes to explaining home value.