4.5 Binary Regressors
SW 5.3
Let’s continue with the same model as above:
\[ \mathbb{E}[Y|X_1,X_2,X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \]
If \(X_1\) is discrete (let’s say binary), then
\[ \beta_1 = \mathbb{E}[Y|X_1=1,X_2,X_3] - \mathbb{E}[Y|X_1=0,X_2,X_3] \]
so \(\beta_1\) is still the partial effect of \(X_1\) on \(Y\). It should be interpreted as how much \(Y\) increases, on average, when \(X_1\) changes from 0 to 1, holding \(X_2\) and \(X_3\) constant.
If \(X_1\) can take more than just the values 0 and 1, but is still discrete (an example is a person’s years of education), then
\[ \beta_1 = \mathbb{E}[Y | X_1=x_1+1, X_2, X_3] - \mathbb{E}[Y|X_1=x_1, X_2, X_3] \] which holds for any possible value that \(X_1\) could take, so that \(\beta_1\) is the effect of a 1 unit increase in \(X_1\) on \(Y\), on average, holding constant \(X_2\) and \(X_3\).
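Both expressions follow from substituting into the linear model for the conditional expectation. As a short worked step (nothing beyond the model stated above is assumed), for any value \(x_1\):
\[ \begin{aligned} \mathbb{E}[Y|X_1=x_1+1,X_2,X_3] - \mathbb{E}[Y|X_1=x_1,X_2,X_3] &= \big(\beta_0 + \beta_1 (x_1+1) + \beta_2 X_2 + \beta_3 X_3\big) - \big(\beta_0 + \beta_1 x_1 + \beta_2 X_2 + \beta_3 X_3\big) \\ &= \beta_1 \end{aligned} \]
The terms involving \(\beta_0\), \(X_2\), and \(X_3\) cancel because they are held fixed, and the binary case corresponds to \(x_1 = 0\).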
Example 4.3 Suppose that you work for an airline and you are interested in predicting the number of passengers for a Saturday morning flight from Atlanta to Memphis. Let \(Y\) denote the number of passengers, \(X_1\) be equal to 1 for a morning flight and 0 otherwise, and let \(X_2\) be equal to 1 for a weekday flight and 0 otherwise. Further suppose that \(\mathbb{E}[Y|X_1,X_2] = 80 + 20 X_1 - 15 X_2\).
In this case, you would predict,
\[ 80 + 20 (1) - 15 (0) = 100 \] passengers on the flight.
In addition, the partial effect of being a morning flight is equal to 20. This indicates that, on average, morning flights have 20 more passengers than non-morning flights, holding constant whether or not the flight occurs on a weekday.
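The arithmetic in this example can be checked with a few lines of R. This is only a sketch: the function name `predict_passengers` is a hypothetical helper introduced for illustration, and the coefficients are the ones assumed in Example 4.3 rather than estimates from data.

```r
# Conditional mean assumed in Example 4.3: E[Y | X1, X2] = 80 + 20*X1 - 15*X2
# (predict_passengers is a hypothetical helper name, not part of any package)
predict_passengers <- function(morning, weekday) {
  80 + 20 * morning - 15 * weekday
}

# Saturday morning flight: morning = 1, weekday = 0
sat_morning <- predict_passengers(morning = 1, weekday = 0)
sat_morning

# Partial effect of a morning flight, holding weekday status fixed
effect_weekend <- predict_passengers(1, 0) - predict_passengers(0, 0)
effect_weekday <- predict_passengers(1, 1) - predict_passengers(0, 1)
c(weekend = effect_weekend, weekday = effect_weekday)
```

The difference is the same whether the comparison is made among weekend flights or among weekday flights, which is what "holding \(X_2\) constant" means in this linear model.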
4.5.1 Computation
Including a binary or discrete covariate in a regression in R is straightforward. The following regression uses the mtcars data and adds a binary regressor, am, which indicates the car's transmission type (in mtcars, am is coded 0 for an automatic transmission and 1 for a manual transmission).
reg2 <- lm(mpg ~ hp + wt + am, data=mtcars)
summary(reg2)
#>
#> Call:
#> lm(formula = mpg ~ hp + wt + am, data = mtcars)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.4221 -1.7924 -0.3788  1.2249  5.5317
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
#> hp          -0.037479   0.009605  -3.902 0.000546 ***
#> wt          -2.878575   0.904971  -3.181 0.003574 **
#> am           2.083710   1.376420   1.514 0.141268
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.538 on 28 degrees of freedom
#> Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
#> F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
In this example, cars with a manual transmission (am equal to 1) got about 2 more miles per gallon, on average, than cars with an automatic transmission (am equal to 0), holding horsepower and weight constant. With a p-value of about 0.14, however, this difference is not statistically significant at conventional levels.
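One way to see what the coefficient on am measures is to compare predicted values for two cars that differ only in am. The sketch below refits the same regression. The horsepower and weight values are arbitrary, made-up numbers chosen for illustration, and the data frame name `two_cars` is hypothetical.

```r
# Same regression as above
reg2 <- lm(mpg ~ hp + wt + am, data = mtcars)

# Two hypothetical cars with identical hp and wt that differ only in am (coded 0/1)
two_cars <- data.frame(hp = c(110, 110), wt = c(2.6, 2.6), am = c(0, 1))
pred <- predict(reg2, newdata = two_cars)

# Difference in predicted mpg between the am = 1 car and the am = 0 car
pred_diff <- unname(pred[2] - pred[1])
pred_diff

# Estimated coefficient on am, for comparison
coef(reg2)["am"]
```

Because the model is linear with no interaction terms, the difference in predictions matches the coefficient on am whichever values of hp and wt are held fixed.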