8.9 Lab 7: Solutions

library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following objects are masked from 'package:Matrix':
#> 
#>     expand, pack, unpack
library(plm)
#> 
#> Attaching package: 'plm'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, lag, lead

data(Fatalities, package="AER")

Fatalities$afatal_per_million <- 1000000 * (Fatalities$afatal / Fatalities$pop )

Fatalities88 <- subset(Fatalities, year==1988)

reg88 <- lm(afatal_per_million ~ jail, data=Fatalities88)
summary(reg88)
#> 
#> Call:
#> lm(formula = afatal_per_million ~ jail, data = Fatalities88)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -36.123 -16.622  -1.469   8.642 112.260 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   59.496      4.273  13.923   <2e-16 ***
#> jailyes        9.155      7.829   1.169    0.248    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 24.55 on 45 degrees of freedom
#>   (1 observation deleted due to missingness)
#> Multiple R-squared:  0.02949,    Adjusted R-squared:  0.007921 
#> F-statistic: 1.367 on 1 and 45 DF,  p-value: 0.2484

The estimated coefficient on mandatory jail laws is 9.155. We should interpret this as just the difference in average alcohol related fatalities per million between states that had mandatory jail laws in 1988 and states that did not. With a p-value of 0.248, we cannot reject that there is no difference between states where the policy is in place and states where it is not.
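To see why the coefficient on a binary regressor is just a difference in group means, here is a minimal sketch. The data frame `toy` and its values are made up for illustration and are not taken from the Fatalities data.

```r
# Hypothetical toy data (NOT the Fatalities data): an outcome y and a
# yes/no policy indicator, mimicking the structure of the regression above
toy <- data.frame(
  y    = c(50, 62, 58, 70, 66, 74),
  jail = factor(c("no", "no", "no", "yes", "yes", "yes"))
)

# Regression of the outcome on the binary indicator
fit <- lm(y ~ jail, data = toy)

# Difference in group means, computed directly
group_means <- tapply(toy$y, toy$jail, mean)
diff_means <- unname(group_means["yes"] - group_means["no"])

coef(fit)["jailyes"]  # coefficient on the indicator
diff_means            # the same number, up to floating point error
```

The intercept plays the role of the mean for the "no" group, which is why 59.496 in the output above can be read as the average for states without mandatory jail laws.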

reg88_covs <- lm(afatal_per_million ~ jail + unemp + beertax + baptist + dry + youngdrivers + miles, data=Fatalities88)
summary(reg88_covs)
#> 
#> Call:
#> lm(formula = afatal_per_million ~ jail + unemp + beertax + baptist + 
#>     dry + youngdrivers + miles, data = Fatalities88)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -39.065  -9.907  -1.690   9.673  82.100 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  -29.373536  32.500240  -0.904   0.3717  
#> jailyes        3.120574   6.849271   0.456   0.6512  
#> unemp          4.815081   1.892369   2.544   0.0150 *
#> beertax        2.311850   9.521684   0.243   0.8094  
#> baptist        0.661694   0.527228   1.255   0.2169  
#> dry           -0.026675   0.383956  -0.069   0.9450  
#> youngdrivers  -0.092100 142.804244  -0.001   0.9995  
#> miles          0.006802   0.002822   2.411   0.0207 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 19.41 on 39 degrees of freedom
#>   (1 observation deleted due to missingness)
#> Multiple R-squared:  0.4742, Adjusted R-squared:  0.3798 
#> F-statistic: 5.024 on 7 and 39 DF,  p-value: 0.0003999

The estimated coefficient on jail is 3.12. It is smaller than the previous estimate of 9.155, though neither is statistically significant. We should interpret this as the partial effect of the mandatory jail policy; that is, we estimate that mandatory jail laws increase the number of alcohol related fatalities per million by 3.12 on average, controlling for the unemployment rate, the beer tax, the fraction of Southern Baptists in the state, the fraction of residents in dry counties, the fraction of young drivers, and the average miles driven in the state. We cannot reject that the partial effect of mandatory jail policies is equal to 0.

fd_reg <- plm(afatal_per_million ~ jail + as.factor(year),
              effect="individual",
              index="state", model="fd",
              data=Fatalities)
summary(fd_reg)
#> Oneway (individual) effect First-Difference Model
#> 
#> Call:
#> plm(formula = afatal_per_million ~ jail + as.factor(year), data = Fatalities, 
#>     effect = "individual", model = "fd", index = "state")
#> 
#> Unbalanced Panel: n = 48, T = 6-7, N = 335
#> Observations used in estimation: 287
#> 
#> Residuals:
#>      Min.   1st Qu.    Median   3rd Qu.      Max. 
#> -51.66677  -5.09887   0.23801   6.28688 119.08976 
#> 
#> Coefficients: (1 dropped because of singularities)
#>                     Estimate Std. Error t-value Pr(>|t|)   
#> (Intercept)         -2.15376    0.80673 -2.6697 0.008035 **
#> jailyes              2.60763    5.28351  0.4935 0.622016   
#> as.factor(year)1983 -5.28423    1.82330 -2.8982 0.004050 **
#> as.factor(year)1984 -3.58247    2.29451 -1.5613 0.119577   
#> as.factor(year)1985 -5.60800    2.43517 -2.3029 0.022017 * 
#> as.factor(year)1986 -0.74192    2.28988 -0.3240 0.746180   
#> as.factor(year)1987 -2.16244    1.80716 -1.1966 0.232476   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    54692
#> Residual Sum of Squares: 51620
#> R-Squared:      0.056171
#> Adj. R-Squared: 0.035946
#> F-statistic: 2.77733 on 6 and 280 DF, p-value: 0.012223

within_reg <- plm(afatal_per_million ~ jail + as.factor(year),
              effect="individual",
              index="state", model="within",
              data=Fatalities)
summary(within_reg)
#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = afatal_per_million ~ jail + as.factor(year), data = Fatalities, 
#>     effect = "individual", model = "within", index = "state")
#> 
#> Unbalanced Panel: n = 48, T = 6-7, N = 335
#> 
#> Residuals:
#>        Min.     1st Qu.      Median     3rd Qu.        Max. 
#> -95.1937300  -4.9678238   0.0088078   5.1611249  40.6263546 
#> 
#> Coefficients:
#>                     Estimate Std. Error t-value  Pr(>|t|)    
#> jailyes               8.3327     4.9666  1.6777 0.0945164 .  
#> as.factor(year)1983  -7.9151     2.6936 -2.9384 0.0035734 ** 
#> as.factor(year)1984  -8.4863     2.7115 -3.1298 0.0019341 ** 
#> as.factor(year)1985 -12.7849     2.7331 -4.6778 4.518e-06 ***
#> as.factor(year)1986 -10.0726     2.7331 -3.6854 0.0002741 ***
#> as.factor(year)1987 -13.5276     2.7115 -4.9890 1.067e-06 ***
#> as.factor(year)1988 -13.6296     2.7279 -4.9964 1.030e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    53854
#> Residual Sum of Squares: 47607
#> R-Squared:      0.116
#> Adj. R-Squared: -0.054487
#> F-statistic: 5.24882 on 7 and 280 DF, p-value: 1.2051e-05

The estimated coefficient on jail has the same interpretation as in the previous problem. The first-difference estimate (2.61) is not statistically significant, while the within estimate (8.33) is marginally statistically significant (p-value of about 0.09).
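As a reminder of what the within estimator does mechanically, the sketch below checks on a small made-up panel that demeaning within each unit gives the same slope as including a dummy for each unit. The data frame `toy` and the names `id`, `d`, and `y` are hypothetical, and year dummies are left out to keep the example short (in the lab they enter as additional regressors).

```r
# Hypothetical toy panel (NOT the Fatalities data): 3 units, 3 periods
toy <- data.frame(
  id = rep(c("a", "b", "c"), each = 3),
  d  = c(0, 0, 1,  0, 1, 1,  0, 0, 0),       # policy indicator
  y  = c(10, 11, 15,  20, 24, 25,  5, 6, 7)  # outcome
)

# (1) Within transformation: subtract each unit's own mean over time
y_dm <- toy$y - ave(toy$y, toy$id)
d_dm <- toy$d - ave(toy$d, toy$id)
within_fit <- lm(y_dm ~ 0 + d_dm)

# (2) Least squares with a separate dummy for each unit
lsdv_fit <- lm(y ~ d + factor(id), data = toy)

coef(within_fit)["d_dm"]  # within estimate
coef(lsdv_fit)["d"]       # same slope, up to floating point error
```

Only the point estimates are guaranteed to match here. The standard errors from the demeaned regression use the wrong degrees of freedom, which is one reason to let `plm` handle the estimation.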

within_reg_covs <- plm(afatal_per_million ~ jail + unemp + beertax + baptist + dry + youngdrivers + miles,
                       effect="individual",
                       index="state", model="within",
                       data=Fatalities)
summary(within_reg_covs)
#> Oneway (individual) effect Within Model
#> 
#> Call:
#> plm(formula = afatal_per_million ~ jail + unemp + beertax + baptist + 
#>     dry + youngdrivers + miles, data = Fatalities, effect = "individual", 
#>     model = "within", index = "state")
#> 
#> Unbalanced Panel: n = 48, T = 6-7, N = 335
#> 
#> Residuals:
#>      Min.   1st Qu.    Median   3rd Qu.      Max. 
#> -95.62306  -5.69773  -0.56903   4.79219  47.80871 
#> 
#> Coefficients:
#>                 Estimate  Std. Error t-value  Pr(>|t|)    
#> jailyes       4.9731e+00  4.9613e+00  1.0024   0.31702    
#> unemp        -1.1340e+00  5.6592e-01 -2.0038   0.04605 *  
#> beertax      -2.7456e+01  1.5080e+01 -1.8207   0.06972 .  
#> baptist       2.5083e+00  4.3324e+00  0.5790   0.56308    
#> dry           4.3092e-01  1.0870e+00  0.3964   0.69208    
#> youngdrivers  2.6357e+02  5.0169e+01  5.2537 2.957e-07 ***
#> miles        -6.8899e-04  7.3182e-04 -0.9415   0.34727    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Total Sum of Squares:    53854
#> Residual Sum of Squares: 48083
#> R-Squared:      0.10717
#> Adj. R-Squared: -0.065023
#> F-statistic: 4.80119 on 7 and 280 DF, p-value: 4.0281e-05

We should interpret the estimated coefficient on jail as an estimate of how much alcohol related traffic fatalities per million change on average under mandatory jail policies after controlling for the unemployment rate, beer taxes, the fraction of the state that is Southern Baptist, the fraction of the state that lives in a dry county, the fraction of young drivers in the state, and the average number of miles driven per person in the state, and after accounting for time invariant variables whose effects do not change over time.

# part a: convert data to two period panel data
two_period <- subset(Fatalities, year==1982 | year==1988)
# and drop some missing
two_period <- subset(two_period, !is.na(jail))
two_period <- BMisc::makeBalancedPanel(two_period, "state", "year")
two_period$jail <- 1*(two_period$jail=="yes")

# part b: convert into wide format
wide_df <- pivot_wider(two_period, 
                       id_cols="state", 
                       names_from="year",
                       values_from=c("jail", "afatal_per_million"))

# add back other covariates from 1982
wide_df <- merge(wide_df, subset(Fatalities, year==1982)[,c("unemp", "beertax", "baptist", "dry", "youngdrivers", "miles","state")], by="state")

# change in fatal accidents over time
wide_df$Dafatal_per_million <- wide_df$afatal_per_million_1988 - wide_df$afatal_per_million_1982

# part c: drop already treated states
wide_df <- subset(wide_df, jail_1982==0)

did <- lm(Dafatal_per_million ~ jail_1988, data=wide_df)
summary(did)
#> 
#> Call:
#> lm(formula = Dafatal_per_million ~ jail_1988, data = wide_df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -55.652 -10.993   5.033  10.405  76.822 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)  -12.585      4.242  -2.966  0.00532 **
#> jail_1988      5.102     11.695   0.436  0.66526   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 24.37 on 36 degrees of freedom
#> Multiple R-squared:  0.005259,   Adjusted R-squared:  -0.02237 
#> F-statistic: 0.1903 on 1 and 36 DF,  p-value: 0.6653

did_covs <- lm(Dafatal_per_million ~ jail_1988 + unemp + beertax + baptist + dry + youngdrivers + miles, data=wide_df)
summary(did_covs)
#> 
#> Call:
#> lm(formula = Dafatal_per_million ~ jail_1988 + unemp + beertax + 
#>     baptist + dry + youngdrivers + miles, data = wide_df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -38.346 -12.383   1.456   9.092  60.585 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)    6.851636  50.643035   0.135   0.8933  
#> jail_1988     -1.853041  10.834391  -0.171   0.8653  
#> unemp          3.725007   1.919862   1.940   0.0618 .
#> beertax        8.300778  10.052007   0.826   0.4154  
#> baptist        0.527893   0.723263   0.730   0.4711  
#> dry           -0.955636   0.546475  -1.749   0.0906 .
#> youngdrivers  89.432017 234.379768   0.382   0.7055  
#> miles         -0.010360   0.005823  -1.779   0.0854 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 21.9 on 30 degrees of freedom
#> Multiple R-squared:  0.3303, Adjusted R-squared:  0.174 
#> F-statistic: 2.114 on 7 and 30 DF,  p-value: 0.07276

If we are willing to believe that, in the absence of the policy, alcohol related fatalities per million would have followed the same trend over time in treated and untreated states, then we can interpret these as causal effects. These estimates are broadly similar to the previous ones, though the second one (which includes additional covariates) is about the only case where we get a negative estimate for the effect of mandatory jail policies. Like the previous estimates, neither of these estimates is statistically different from 0.
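The first regression above is a two-period difference-in-differences. As a minimal sketch of why, the example below uses made-up numbers (the data frame `toy` and its columns are hypothetical) to check that the coefficient on the treatment indicator equals the mean change for treated units minus the mean change for untreated units.

```r
# Hypothetical two-period data in wide format (NOT the Fatalities data)
toy <- data.frame(
  y_pre   = c(60, 70, 55, 80, 65),  # outcome in the first period
  y_post  = c(50, 58, 47, 75, 59),  # outcome in the second period
  treated = c(0, 0, 0, 1, 1)        # 1 if treated by the second period
)

# Change in the outcome over time for each unit
toy$dy <- toy$y_post - toy$y_pre

# Regression of the change on the treatment indicator
did_fit <- lm(dy ~ treated, data = toy)

# Difference in mean changes, computed by hand
mean_change <- tapply(toy$dy, toy$treated, mean)
did_by_hand <- unname(mean_change["1"] - mean_change["0"])

coef(did_fit)["treated"]  # 4.5 in this toy example
did_by_hand               # 4.5 as well
```

In this sketch the intercept is the mean change among untreated units (here, -10), which is the estimate of what would have happened to treated units under parallel trends.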

lag_reg <- lm(afatal_per_million_1988 ~ jail_1988 + afatal_per_million_1982, data=wide_df)
summary(lag_reg)
#> 
#> Call:
#> lm(formula = afatal_per_million_1988 ~ jail_1988 + afatal_per_million_1982, 
#>     data = wide_df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -29.120 -12.663  -0.684   6.873  92.390 
#> 
#> Coefficients:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)              19.0810    10.1089   1.888 0.067401 .  
#> jail_1988                 3.4323    10.3171   0.333 0.741363    
#> afatal_per_million_1982   0.5607     0.1303   4.303 0.000129 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 21.47 on 35 degrees of freedom
#> Multiple R-squared:  0.3462, Adjusted R-squared:  0.3088 
#> F-statistic: 9.266 on 2 and 35 DF,  p-value: 0.0005896

lag_reg_covs <- lm(afatal_per_million_1988 ~ jail_1988 + afatal_per_million_1982 + unemp + beertax + baptist + dry + youngdrivers + miles, data=wide_df)
summary(lag_reg_covs)
#> 
#> Call:
#> lm(formula = afatal_per_million_1988 ~ jail_1988 + afatal_per_million_1982 + 
#>     unemp + beertax + baptist + dry + youngdrivers + miles, data = wide_df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -27.840  -8.793  -1.364   5.146  71.409 
#> 
#> Coefficients:
#>                           Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)             -17.292595  46.318477  -0.373  0.71161   
#> jail_1988                 0.817453   9.786782   0.084  0.93401   
#> afatal_per_million_1982   0.505371   0.173718   2.909  0.00689 **
#> unemp                     3.189918   1.736443   1.837  0.07647 . 
#> beertax                   2.885796   9.236174   0.312  0.75694   
#> baptist                   0.965785   0.668259   1.445  0.15911   
#> dry                      -0.567567   0.509915  -1.113  0.27482   
#> youngdrivers            120.615853 211.026841   0.572  0.57202   
#> miles                    -0.002692   0.005888  -0.457  0.65087   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 19.7 on 29 degrees of freedom
#> Multiple R-squared:  0.5443, Adjusted R-squared:  0.4186 
#> F-statistic: 4.329 on 8 and 29 DF,  p-value: 0.001596

These estimates directly control for alcohol related fatalities per million in the pre-treatment period (1982). These sorts of specifications are less common in economics but, in my view, this seems like a reasonable approach here. That said, the results are more or less the same as the earlier estimates.

We don’t have very strong evidence that mandatory jail policies reduced the number of alcohol related traffic fatalities. In my view, probably the best specifications for trying to understand the causal effects are the ones in part 7 (particularly, the ones that include covariates there), but I think that the results in parts 4-9 are also informative. Broadly, these estimates are more or less similar: none of them are statistically significant and most are positive (which is an unexpected sign).

Before we finish, let me mention a few caveats to these results:

First, I would be very hesitant to interpret these results as definitively saying that mandatory jail policies have no effect on alcohol related traffic fatalities. The main reason to be clear about this is that our standard errors are quite large. For example, in the second specification in part 7 (the one I like the most), a 95% confidence interval for our estimate is \([-23.1, 19.4]\). This is a wide confidence interval, given that the average number of alcohol related traffic fatalities per million across all states and time periods is only 66. So our estimates are compatible with anything from very large reductions to large increases in alcohol related traffic fatalities.

Let me make one more comment about the sign of our results. Many of our point estimates are positive; as we discussed earlier, it is hard to rationalize harsher punishments increasing alcohol related traffic fatalities. I think the main explanation for these results is just that our estimates are pretty noisy and, therefore, more or less “by chance” we are getting estimates that have an unexpected sign. But there are some other possible explanations that are worth mentioning. For one, a number of other policies related to drunk driving were introduced in the 1980s (particularly ones related to the legal drinking age, but perhaps others). It is not clear how these would affect our estimates, but they could certainly play some role. Besides that, it seems to me that we have a pretty good set of covariates in our models, but there could be important covariates that we are missing. For this reason, some expertise in how to model state-level traffic fatalities is a very important skill here (probably the key skill).
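For reference, the confidence interval quoted in the first caveat can be reproduced from the `jail_1988` row of the `did_covs` output. The sketch below types in the estimate and standard error by hand and uses the normal critical value, which appears to be what gives \([-23.1, 19.4]\). The variable names are my own, and the t-based interval is included only for comparison.

```r
# Estimate, standard error, and residual degrees of freedom copied from
# the jail_1988 row of summary(did_covs)
est <- -1.853041
se  <- 10.834391
df  <- 30

# 95% confidence interval using the normal critical value (about 1.96)
ci_normal <- est + c(-1, 1) * qnorm(0.975) * se
round(ci_normal, 1)  # -23.1  19.4

# For comparison: the interval based on the t distribution with 30
# degrees of freedom, which is somewhat wider in a sample this small
ci_t <- est + c(-1, 1) * qt(0.975, df) * se
round(ci_t, 1)

# Width of the normal-based interval relative to the overall average of
# about 66 alcohol related fatalities per million mentioned in the text
diff(ci_normal) / 66
```

If the fitted model is still in memory, `confint(did_covs)` reports the t-based version of this interval directly.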