5.7 Interpreting Interaction Terms

SW 8.3

Another way to allow for partial effects that vary across different values of the regressors is to include interaction terms.

Consider the following regression model:

\[ \mathbb{E}[Y|X_1,X_2,X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_3 \]

The term \(X_1 X_2\) is called the interaction term. In this model, the partial effect of \(X_1\) is given by

\[ \frac{\partial \, \mathbb{E}[Y|X_1,X_2,X_3]}{\partial \, X_1} = \beta_1 + \beta_3 X_2 \]

In this model, the effect of \(X_1\) varies with \(X_2\). As in the previous section, you could report the partial effect for different values of \(X_2\) or consider \(APE = \beta_1 + \beta_3 \mathbb{E}[X_2]\).
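A minimal R sketch of both summaries, using simulated data. The data-generating values (coefficients 1, 2, -1, 0.5, 3 and the normal regressors) are hypothetical and chosen only for illustration. The partial effect is evaluated at the quartiles of \(X_2\), and the APE replaces \(\mathbb{E}[X_2]\) with the sample mean.

```r
# Hypothetical simulated data (coefficients chosen only for illustration)
set.seed(1)
n  <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n, mean = 2)
X3 <- rnorm(n)
Y  <- 1 + 2 * X1 - 1 * X2 + 0.5 * X1 * X2 + 3 * X3 + rnorm(n)

# X1:X2 adds just the product term; the main effects are listed separately
fit <- lm(Y ~ X1 + X2 + X1:X2 + X3)
b   <- coef(fit)

# Partial effect of X1 at the quartiles of X2: beta_1 + beta_3 * X2
x2_grid <- quantile(X2, probs = c(0.25, 0.50, 0.75))
pe_grid <- b["X1"] + b["X1:X2"] * x2_grid
pe_grid

# Average partial effect: beta_1 + beta_3 * E[X2], with E[X2] estimated by mean(X2)
ape <- unname(b["X1"] + b["X1:X2"] * mean(X2))
ape
```

Because the partial effect is linear in \(X_2\), plugging in the mean of \(X_2\) gives exactly the same number as averaging the partial effect over all observations.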

There are a couple of other things worth pointing out about interaction terms:

It is very common for one of the interacted variables, say \(X_2\), to be binary. This gives an easy way to test whether the effect of \(X_1\) is the same across the two “groups” defined by \(X_2\). For example, suppose you wanted to check whether the partial effect of education is the same for men and women. You could run a regression like

\[ Wage = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \cdot Female + U \]

From the previous discussion, the partial effect of education is given by

\[ \beta_1 + \beta_3 Female \]

Thus, the partial effect of education for men (\(Female=0\)) is given by \(\beta_1\), and the partial effect of education for women (\(Female=1\)) is given by \(\beta_1 + \beta_3\). To test whether the partial effect of education differs for men and women, you can therefore just test whether \(\beta_3=0\). If \(\beta_3>0\), it suggests a higher partial effect of education for women, and if \(\beta_3 < 0\), it suggests a lower partial effect of education for women.
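
As a sketch, the R code below simulates wage data (all numbers, and the variable names `wage`, `educ`, and `female`, are hypothetical) and fits the interacted regression. The `educ:female` row of the coefficient table holds the t-test of \(\beta_3 = 0\), and the sum of the `educ` and `educ:female` estimates is the partial effect of education for women. Because every regressor here is interacted with `female`, the two slopes coincide with the slopes from separate regressions for men and for women.

```r
# Hypothetical simulated data: the return to education differs by group
set.seed(2)
n      <- 2000
female <- rbinom(n, size = 1, prob = 0.5)
educ   <- sample(10:20, n, replace = TRUE)
wage   <- 5 + 1.5 * educ - 2 * female + 0.3 * educ * female + rnorm(n, sd = 4)

# educ * female expands to educ + female + educ:female
fit <- lm(wage ~ educ * female)
b   <- coef(fit)

pe_men   <- unname(b["educ"])                     # beta_1
pe_women <- unname(b["educ"] + b["educ:female"])  # beta_1 + beta_3

# t-test of H0: beta_3 = 0 (same partial effect of education for both groups)
summary(fit)$coefficients["educ:female", ]

# The same two slopes from separate regressions by group
slope_men   <- unname(coef(lm(wage ~ educ, subset = female == 0))["educ"])
slope_women <- unname(coef(lm(wage ~ educ, subset = female == 1))["educ"])
c(pe_men = pe_men, slope_men = slope_men, pe_women = pe_women, slope_women = slope_women)
```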

Another interesting case is when \(X_1\) and \(X_2\) are both binary. In this case, a model that includes their interaction is called a saturated model. It gets this name because it is actually nonparametric: it places no restrictions on how the conditional expectation depends on the regressors. In particular, notice that in the model \(\mathbb{E}[Y|X_1,X_2] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2\),

\[ \begin{aligned} \mathbb{E}[Y|X_1=0,X_2=0] &= \beta_0 \\ \mathbb{E}[Y|X_1=1,X_2=0] &= \beta_0 + \beta_1 \\ \mathbb{E}[Y|X_1=0,X_2=1] &= \beta_0 + \beta_2 \\ \mathbb{E}[Y|X_1=1,X_2=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \end{aligned} \]

These four cases exhaust all possible combinations of the regressors, so each possible value of the conditional expectation can be recovered from the parameters of the model (four parameters for four cells).
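A minimal R check of this claim on simulated data (the data-generating numbers are hypothetical): the fitted coefficients of the saturated model reproduce the sample mean of \(Y\) in each of the four cells exactly.

```r
# Hypothetical simulated data with two binary regressors
set.seed(3)
n  <- 500
X1 <- rbinom(n, size = 1, prob = 0.4)
X2 <- rbinom(n, size = 1, prob = 0.6)
Y  <- 10 + 2 * X1 + 3 * X2 - 4 * X1 * X2 + rnorm(n)

# Coefficients in the order (Intercept), X1, X2, X1:X2
b <- unname(coef(lm(Y ~ X1 * X2)))

# Sample mean of Y in each cell (rows: X1 = 0, 1; columns: X2 = 0, 1)
cell_means <- tapply(Y, list(X1 = X1, X2 = X2), mean)

# Cell means implied by the coefficients (matrix() fills column by column)
implied <- matrix(c(b[1], b[1] + b[2], b[1] + b[3], sum(b)),
                  nrow = 2, dimnames = dimnames(cell_means))
cell_means
implied
```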

It would be possible to write down a saturated model with more than two binary regressors (or with discrete regressors that take more than two values); you would just need to include more interaction terms. The key requirement is that there be no continuous regressors. That said, as you add more and more discrete regressors and their interactions, the number of parameters grows quickly (a saturated model with \(k\) binary regressors has \(2^k\) parameters), and you will start to run into the curse of dimensionality issues that we discussed earlier.
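A small R illustration of how fast a saturated model grows, assuming (hypothetically) 200 observations of \(k\) independent binary regressors. For each \(k\), it counts the columns of the fully interacted design matrix and the number of cells that actually contain data. With \(k = 8\) there are 256 parameters but at most 200 nonempty cells, so the saturated model cannot be fully estimated.

```r
# Hypothetical data: n observations of k independent binary regressors
set.seed(4)
n <- 200

saturated_size <- function(k) {
  d <- as.data.frame(matrix(rbinom(n * k, size = 1, prob = 0.5), ncol = k))
  # Build ~ V1 * V2 * ... * Vk: all main effects and all interactions
  f <- reformulate(paste(names(d), collapse = " * "))
  X <- model.matrix(f, data = d)
  c(k = k, params = ncol(X), nonempty_cells = nrow(unique(d)))
}

res <- t(sapply(c(2, 4, 6, 8), saturated_size))
res
```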

For instance, return to our earlier example of flights from Atlanta to Memphis, where \(Y\) denoted the number of passengers, \(X_1\) was equal to 1 for a morning flight and 0 otherwise, and \(X_2\) was equal to 1 for a weekday flight and 0 otherwise. Suppose that \(\mathbb{E}[Y|X_1,X_2] = 90 - 15 X_1 - 5 X_2 + 25 X_1 X_2\). Then,

\[ \begin{aligned} \mathbb{E}[Y|X_1=0,X_2=0] &= 90 \quad & \textrm{non-morning, weekend} \\ \mathbb{E}[Y|X_1=1,X_2=0] &= 90 - 15 = 75 \quad & \textrm{morning, weekend} \\ \mathbb{E}[Y|X_1=0,X_2=1] &= 90 - 5 = 85 \quad & \textrm{non-morning, weekday} \\ \mathbb{E}[Y|X_1=1,X_2=1] &= 90 - 15 - 5 + 25 = 95 \quad & \textrm{morning, weekday} \end{aligned} \]
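A quick R check of this arithmetic, using only the coefficients given in the example (the helper `cond_mean` is a hypothetical name introduced here). It also shows what the interaction means in this case: a morning flight changes expected passengers by \(\beta_1 = -15\) on weekends but by \(\beta_1 + \beta_3 = 10\) on weekdays.

```r
# Coefficients from the flights example
b <- c(b0 = 90, b1 = -15, b2 = -5, b3 = 25)

# E[Y | X1, X2] = b0 + b1*X1 + b2*X2 + b3*X1*X2
cond_mean <- function(x1, x2) {
  unname(b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b3"] * x1 * x2)
}

# All four cells (X1 varies fastest in expand.grid)
grid <- expand.grid(X1 = 0:1, X2 = 0:1)
grid$EY <- mapply(cond_mean, grid$X1, grid$X2)
grid

# Effect of a morning flight, separately for weekends and weekdays
cond_mean(1, 0) - cond_mean(0, 0)  # -15
cond_mean(1, 1) - cond_mean(0, 1)  # 10
```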

5.7.1 Computation

Including interaction terms in regressions in R is straightforward. Using the mtcars data, we can do it as follows:

reg4 <- lm(mpg ~ hp + wt + am + hp*am, data=mtcars)
summary(reg4)
#> 
#> Call:
#> lm(formula = mpg ~ hp + wt + am + hp * am, data = mtcars)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -3.435 -1.510 -0.697  1.284  5.245 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 33.34196    2.79711  11.920 2.89e-12 ***
#> hp          -0.02918    0.01449  -2.014  0.05407 .  
#> wt          -3.05617    0.94036  -3.250  0.00309 ** 
#> am           3.55141    2.35742   1.506  0.14355    
#> hp:am       -0.01129    0.01466  -0.770  0.44809    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.556 on 27 degrees of freedom
#> Multiple R-squared:  0.8433, Adjusted R-squared:  0.8201 
#> F-statistic: 36.33 on 4 and 27 DF,  p-value: 1.68e-10

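The interaction term is in the row that starts with hp:am. In mtcars, am equals 1 for cars with a manual transmission and 0 for cars with an automatic transmission, so the hp coefficient is the partial effect of horsepower for automatic cars, and the partial effect for manual cars is \(-0.02918 - 0.01129 \approx -0.0405\). These estimates suggest that, while horsepower does seem to decrease miles per gallon after controlling for weight and transmission type (although the hp estimate is only significant at the 10% level), the effect of horsepower does not seem to vary much with whether the car has a manual transmission: the estimated interaction is small and statistically insignificant (p-value of about 0.45), so any difference is not large enough for us to detect with the data that we have.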
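A minimal sketch of how to get the partial effect of horsepower for manual cars (am = 1), together with its standard error, from the same regression. The standard error uses \(\mathrm{Var}(b_{hp} + b_{hp:am}) = \mathrm{Var}(b_{hp}) + \mathrm{Var}(b_{hp:am}) + 2\,\mathrm{Cov}(b_{hp}, b_{hp:am})\). As a cross-check, recoding the dummy so that manual cars are the omitted group (the variable `auto` is introduced here for illustration) makes the hp coefficient equal to that same partial effect. The last lines confirm that `hp*am` already expands to `hp + am + hp:am`.

```r
# mtcars ships with base R; am = 1 for manual, 0 for automatic
reg4 <- lm(mpg ~ hp + wt + am + hp*am, data = mtcars)
b <- coef(reg4)
V <- vcov(reg4)

# Partial effect of hp for manual cars: beta_hp + beta_hp:am, and its standard error
pe_manual <- unname(b["hp"] + b["hp:am"])
se_manual <- sqrt(V["hp", "hp"] + V["hp:am", "hp:am"] + 2 * V["hp", "hp:am"])
c(estimate = pe_manual, std.error = se_manual)

# Cross-check: with manual as the omitted group, the hp row is the manual-car effect
mt <- transform(mtcars, auto = 1 - am)
reg_alt <- lm(mpg ~ hp + wt + auto + hp:auto, data = mt)
summary(reg_alt)$coefficients["hp", ]

# hp*am alone already includes hp, am, and hp:am
reg4_short <- lm(mpg ~ wt + hp*am, data = mtcars)
coef(reg4_short)[names(b)]
```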