5.5 Binary Regressors

SW 5.3

Let’s continue with the same model as above

\[ \mathbb{E}[Y|X_1,X_2,X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \]

If \(X_1\) is discrete (let's say binary, so that it only takes the values 0 and 1), then

\[ \beta_1 = \mathbb{E}[Y|X_1=1,X_2,X_3] - \mathbb{E}[Y|X_1=0,X_2,X_3] \]

so \(\beta_1\) is still the partial effect of \(X_1\) on \(Y\). It should be interpreted as how much \(Y\) changes, on average, when \(X_1\) changes from 0 to 1, holding \(X_2\) and \(X_3\) constant.
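To see where this expression comes from, it may help to plug \(X_1=1\) and \(X_1=0\) into the model above and subtract. The following is just that algebra written out, using only the model stated at the start of this section:

\[\begin{align*} \mathbb{E}[Y|X_1=1,X_2,X_3] &= \beta_0 + \beta_1 + \beta_2 X_2 + \beta_3 X_3 \\ \mathbb{E}[Y|X_1=0,X_2,X_3] &= \beta_0 + \beta_2 X_2 + \beta_3 X_3 \\ \mathbb{E}[Y|X_1=1,X_2,X_3] - \mathbb{E}[Y|X_1=0,X_2,X_3] &= \beta_1 \end{align*}\]

The terms involving \(X_2\) and \(X_3\) cancel because they are held at the same values in both conditional expectations.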

If \(X_1\) can take more than just the values 0 and 1, but is still discrete (an example is a person’s years of education), then

\[ \beta_1 = \mathbb{E}[Y | X_1=x_1+1, X_2, X_3] - \mathbb{E}[Y|X_1=x_1, X_2, X_3] \] which holds for any possible value that \(X_1\) could take, so that \(\beta_1\) is the effect of a 1 unit increase in \(X_1\) on \(Y\), on average, holding constant \(X_2\) and \(X_3\).

Example 5.3 Suppose that you work for an airline and you are interested in predicting the number of passengers for a Saturday morning flight from Atlanta to Memphis. Let \(Y\) denote the number of passengers, \(X_1\) be equal to 1 for a morning flight and 0 otherwise, and let \(X_2\) be equal to 1 for a weekday flight and 0 otherwise. Further suppose that \(\mathbb{E}[Y|X_1,X_2] = 80 + 20 X_1 - 15 X_2\).

A Saturday morning flight has \(X_1=1\) (morning) and \(X_2=0\) (not a weekday), so you would predict

\[ 80 + 20 (1) - 15 (0) = 100 \] passengers on the flight.

In addition, the partial effect of being a morning flight is equal to 20. This indicates that, on average, morning flights have 20 more passengers than non-morning flights, holding constant whether or not the flight occurs on a weekday.
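The arithmetic in this example can be checked with a few lines of R. This is only a sketch: the coefficients are the ones given in Example 5.3, while the helper function `expected_passengers` and the variable names are made up here for illustration.

```r
# Coefficients from Example 5.3: E[Y | X1, X2] = 80 + 20*X1 - 15*X2
b0 <- 80
b1 <- 20
b2 <- -15

# Hypothetical helper (not from the text): expected number of passengers
# morning = 1 for a morning flight, weekday = 1 for a weekday flight
expected_passengers <- function(morning, weekday) {
  b0 + b1 * morning + b2 * weekday
}

# Saturday morning flight: morning = 1, weekday = 0
sat_morning <- expected_passengers(morning = 1, weekday = 0)
sat_morning
#> [1] 100

# Partial effect of a morning flight, holding weekday fixed at 0 and at 1
effect_weekend <- expected_passengers(1, 0) - expected_passengers(0, 0)
effect_weekday <- expected_passengers(1, 1) - expected_passengers(0, 1)
c(effect_weekend, effect_weekday)
#> [1] 20 20
```

The morning effect is 20 whether the flight is on a weekday or not, which is what "holding \(X_2\) constant" means in this model.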

5.5.1 Computation

Including a binary or discrete covariate in a regression in R is straightforward. The following regression uses the mtcars data and adds a binary regressor, am, indicating the car's transmission type (in mtcars, am is equal to 0 for an automatic transmission and 1 for a manual transmission).

reg2 <- lm(mpg ~ hp + wt + am, data=mtcars)
summary(reg2)
#> 
#> Call:
#> lm(formula = mpg ~ hp + wt + am, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.4221 -1.7924 -0.3788  1.2249  5.5317 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
#> hp          -0.037479   0.009605  -3.902 0.000546 ***
#> wt          -2.878575   0.904971  -3.181 0.003574 ** 
#> am           2.083710   1.376420   1.514 0.141268    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.538 on 28 degrees of freedom
#> Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
#> F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

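In this example, cars that had a manual transmission (am equal to 1) got about 2 more miles per gallon than cars that had an automatic transmission (am equal to 0), on average, holding horsepower and weight constant. However, the p-value is 0.14, so this difference is not statistically significant at conventional significance levels.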
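One way to check the interpretation of the coefficient on am is to compare predictions for two cars that differ only in their transmission. The sketch below assumes the coding documented in `?mtcars` (am is 0 for automatic and 1 for manual). The data frame `newcars` and the choice to fix hp and wt at their sample means are illustrative assumptions, not part of the original example.

```r
# Same regression as above
reg2 <- lm(mpg ~ hp + wt + am, data = mtcars)

# How many cars fall in each transmission group (0 = automatic, 1 = manual)
table(mtcars$am)

# Two hypothetical cars with identical horsepower and weight,
# one automatic (am = 0) and one manual (am = 1)
newcars <- data.frame(
  hp = mean(mtcars$hp),
  wt = mean(mtcars$wt),
  am = c(0, 1)
)

# Predicted mpg for each car
preds <- predict(reg2, newdata = newcars)

# Manual minus automatic: this equals the estimated coefficient on am
gap <- unname(preds[2] - preds[1])
gap
coef(reg2)["am"]
```

Because the model is linear and has no interaction terms, the gap would be the same for any other common values of hp and wt.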