## 5.5 Binary Regressors

SW 5.3

Let’s continue with the same model as above:

$\mathbb{E}[Y|X_1,X_2,X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

If $$X_1$$ is discrete (let’s say binary), then

\begin{align*} \beta_1 = \mathbb{E}[Y|X_1=1,X_2,X_3] - \mathbb{E}[Y|X_1=0,X_2,X_3] \end{align*}

$$\beta_1$$ is still the partial effect of $$X_1$$ on $$Y$$ and should be interpreted as how much $$Y$$ changes, on average, when $$X_1$$ changes from 0 to 1, holding $$X_2$$ and $$X_3$$ constant.
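
One way to see why this holds is to plug $$X_1=1$$ and $$X_1=0$$ into the model above and subtract, keeping $$X_2$$ and $$X_3$$ at the same values in both terms. Everything except $$\beta_1$$ cancels:

\begin{align*} \mathbb{E}[Y|X_1=1,X_2,X_3] - \mathbb{E}[Y|X_1=0,X_2,X_3] &= (\beta_0 + \beta_1 + \beta_2 X_2 + \beta_3 X_3) - (\beta_0 + \beta_2 X_2 + \beta_3 X_3) \\ &= \beta_1 \end{align*}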

If $$X_1$$ can take more than just the values 0 and 1, but is still discrete (an example is a person’s years of education), then

$\beta_1 = \mathbb{E}[Y | X_1=x_1+1, X_2, X_3] - \mathbb{E}[Y|X_1=x_1, X_2, X_3]$

which holds for any possible value that $$X_1$$ could take, so that $$\beta_1$$ is the effect of a 1 unit increase in $$X_1$$ on $$Y$$, on average, holding constant $$X_2$$ and $$X_3$$.

Example 5.3 Suppose that you work for an airline and you are interested in predicting the number of passengers for a Saturday morning flight from Atlanta to Memphis. Let $$Y$$ denote the number of passengers, $$X_1$$ be equal to 1 for a morning flight and 0 otherwise, and let $$X_2$$ be equal to 1 for a weekday flight and 0 otherwise. Further suppose that $$\mathbb{E}[Y|X_1,X_2] = 80 + 20 X_1 - 15 X_2$$.

In this case, a Saturday morning flight has $$X_1=1$$ (morning) and $$X_2=0$$ (not a weekday), so you would predict

$80 + 20 (1) - 15 (0) = 100$ passengers on the flight.

In addition, the partial effect of being a morning flight is equal to 20. This indicates that, on average, morning flights have 20 more passengers than non-morning flights, holding constant whether or not the flight occurs on a weekday.
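
The arithmetic in this example can be checked with a few lines of R. This is only a sketch: the function name `expected_passengers` and its argument names are made up for illustration, and the coefficients are simply the ones assumed in Example 5.3.

```r
# Conditional expectation from Example 5.3: E[Y | X1, X2] = 80 + 20*X1 - 15*X2
# (expected_passengers is a hypothetical helper name, not something from the text)
expected_passengers <- function(morning, weekday) {
  80 + 20 * morning - 15 * weekday
}

# Saturday morning flight: morning = 1, weekday = 0
saturday_morning <- expected_passengers(morning = 1, weekday = 0)
saturday_morning
#> [1] 100

# Partial effect of a morning flight, holding weekday fixed at each of its values
effect_weekend <- expected_passengers(1, 0) - expected_passengers(0, 0)
effect_weekday <- expected_passengers(1, 1) - expected_passengers(0, 1)
c(effect_weekend, effect_weekday)
#> [1] 20 20
```

The partial effect is 20 whether the flight is on a weekday or not, which is what "holding $$X_2$$ constant" means in this model.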

### 5.5.1 Computation

Including a binary or discrete covariate in a regression in R is straightforward. The following regression uses the mtcars data and adds a binary regressor, am, which is equal to 1 for cars with a manual transmission and 0 for cars with an automatic transmission.

# regress miles per gallon on horsepower, weight, and the transmission dummy
reg2 <- lm(mpg ~ hp + wt + am, data = mtcars)
summary(reg2)
#>
#> Call:
#> lm(formula = mpg ~ hp + wt + am, data = mtcars)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.4221 -1.7924 -0.3788  1.2249  5.5317
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
#> hp          -0.037479   0.009605  -3.902 0.000546 ***
#> wt          -2.878575   0.904971  -3.181 0.003574 **
#> am           2.083710   1.376420   1.514 0.141268
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.538 on 28 degrees of freedom
#> Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227
#> F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

In this example, cars with a manual transmission (am equal to 1 in the mtcars data) got about 2 more miles per gallon, on average, than cars with an automatic transmission, holding horsepower and weight constant. With a p-value of about 0.14, though, this difference is not statistically significant at conventional levels.
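
To connect the estimate back to the interpretation of a binary regressor, the sketch below compares predicted miles per gallon for two hypothetical cars that have the same horsepower and weight and differ only in am. The values hp = 110 and wt = 2.6 are arbitrary choices for illustration. Any other common values would give the same difference.

```r
# Refit the regression so that this block is self-contained
reg2 <- lm(mpg ~ hp + wt + am, data = mtcars)

# Two hypothetical cars: same hp and wt (arbitrary illustrative values),
# differing only in the transmission dummy am
two_cars <- data.frame(hp = c(110, 110), wt = c(2.6, 2.6), am = c(0, 1))

# Predicted mpg for each car
pred <- predict(reg2, newdata = two_cars)

# Difference in predicted mpg between the am = 1 car and the am = 0 car
pred_diff <- unname(pred[2] - pred[1])

# The difference matches the estimated coefficient on am
am_coef <- unname(coef(reg2)["am"])
all.equal(pred_diff, am_coef)
#> [1] TRUE
```

Because the model is linear in am, the difference in predictions does not depend on the particular values chosen for hp and wt.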