For this lab, we will use the Fatalities data to study the causal effect of mandatory jail sentence policies for drunk driving on traffic fatalities. The Fatalities data is panel data covering 1982-1988 that contains traffic fatality death rates, whether or not a state has a mandatory jail sentence policy, and several other variables. Economic theory suggests that raising the cost of some behavior (in this case, you can think of a mandatory jail sentence as raising the cost of drunk driving) will lead to less of that behavior. That being said, it’s interesting both to test this theory and to consider the magnitude of the effect. That’s what we’ll do in this problem.
This data comes in a somewhat messier format than some of the data that we have used previously. To start with, create a new column in the data called afatal_per_million that is the number of alcohol involved vehicle fatalities per million people in a state in a particular year. The variable afatal contains the total number of alcohol involved vehicle fatalities, and the variable pop contains the total population in a state.
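As a minimal sketch of the per-million calculation, here is the same step on a small made-up data frame. The object name toy and all of the numbers are hypothetical; only the column names afatal and pop come from the Fatalities data.

```r
# Toy stand-in for the Fatalities data (all values are made up)
toy <- data.frame(
  state  = c("A", "A", "B", "B"),
  year   = c(1982, 1988, 1982, 1988),
  afatal = c(150, 120, 40, 50),
  pop    = c(3000000, 3200000, 1000000, 1100000)
)

# Fatalities per million residents:
# total fatalities divided by population measured in millions
toy$afatal_per_million <- toy$afatal / (toy$pop / 1e6)

toy
```

Dividing pop by 1e6 first makes the units explicit: the new column is deaths per one million people, not per person.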
Using a subset of the data from 1988, run a regression of afatal_per_million on whether or not a state has a mandatory jail sentence policy, jail. How do you interpret the results?
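To help with the interpretation, here is a small sketch on made-up data showing that, when the only regressor is a binary variable, its coefficient is the difference in mean outcomes between the two groups. The data frame toy and its values are hypothetical, and coding jail as a factor with levels "no" and "yes" is an assumption made for this illustration.

```r
# Toy panel with made-up values; jail is coded as a "no"/"yes" factor
toy <- data.frame(
  state = rep(c("A", "B", "C", "D"), times = 2),
  year  = rep(c(1982, 1988), each = 4),
  jail  = factor(c("no", "no", "no", "yes", "no", "yes", "no", "yes"),
                 levels = c("no", "yes")),
  afatal_per_million = c(50, 40, 60, 70, 45, 42, 55, 62)
)

# Keep only the 1988 cross section
toy88 <- subset(toy, year == 1988)

# Regression of the outcome on the binary policy variable
fit <- lm(afatal_per_million ~ jail, data = toy88)
coef(fit)

# The coefficient on jail matches the raw difference in group means
diff_means <- mean(toy88$afatal_per_million[toy88$jail == "yes"]) -
  mean(toy88$afatal_per_million[toy88$jail == "no"])
diff_means
```

Because this is only a comparison of means across states with and without the policy, it says nothing by itself about whether those states differ in other ways.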
Using the same subset from part 2, run a regression of afatal_per_million on jail, the unemployment rate (unemp), the tax on a case of beer (beertax), the percentage of Southern Baptists in the state (baptist), the percentage of residents residing in dry counties (dry), the percentage of young drivers in the state (youngdrivers), and the average miles driven per person in a state (miles). How do you interpret the estimated coefficient on jail? Would you consider this to be a reasonable estimate of the (average) causal effect of mandatory jail policies on alcohol related fatalities?
Now, using the full data, let’s estimate a fixed effects model with alcohol related fatalities per million as the outcome and mandatory jail policies as a regressor. Estimate the model using first differences and make sure to include time fixed effects. How do you interpret the results?
Estimate the same model as in part 4, but using the within estimator instead of first differences. Compare these results to the ones from part 4.
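As a rough sketch of what the two estimators do, here is a made-up balanced panel with only two periods, estimated with base R's lm. The data frame toy, the outcome name y, and the 0/1 coding of jail are all hypothetical. With exactly two periods, the first difference and within estimates of the coefficient on jail coincide; with more periods, as in the full data, they generally do not.

```r
# Toy two-period panel with made-up values; jail coded 0/1
toy <- data.frame(
  state = rep(c("A", "B", "C", "D"), times = 2),
  year  = rep(c(1982, 1988), each = 4),
  jail  = c(0, 0, 0, 0, 0, 1, 0, 1),
  y     = c(50, 40, 60, 70, 45, 42, 55, 62)
)

# First differences: subtract each state's 1982 values from its 1988 values
d82 <- toy[toy$year == 1982, ]
d88 <- toy[toy$year == 1988, ]
d82 <- d82[order(d82$state), ]
d88 <- d88[order(d88$state), ]
fd <- data.frame(dy = d88$y - d82$y, djail = d88$jail - d82$jail)

# With two periods, the intercept plays the role of the time fixed effect
fd_fit <- lm(dy ~ djail, data = fd)

# Within estimator, written here as a regression with state and year dummies
within_fit <- lm(y ~ jail + factor(state) + factor(year), data = toy)

coef(fd_fit)["djail"]
coef(within_fit)["jail"]
```

Writing the within estimator with explicit dummies is only practical for small panels; a dedicated panel data package would typically demean the data instead.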
Using the same within estimator as in part 5, include the same set of covariates from part 3 and interpret the estimated effect of mandatory jail policies. How do these estimates compare to the earlier ones?
Now, we’ll switch to using a difference in differences approach to estimating the effect of mandatory jail policies. First, we’ll manipulate the data some.
To keep things simple, let’s start by limiting the data to the years 1982 and 1988 and drop the in-between periods.
Second, let’s calculate the change in alcohol related fatalities per million between 1982 and 1988 and keep the covariates that we have been using from 1982. One way to do this is to use the pivot_wider function from the tidyr package, which converts data from long format to wide format. In the case of panel data, “long format” data means that each row in the data corresponds to a particular observation and a particular time period. Thus, with long format data, there are \(n \times T\) total rows in the data. On the other hand, “wide format” data means that each row holds all the data (across all time periods) for a particular observation. Converting back and forth between long and wide formats is a common data manipulation task. Hint: This step is probably unfamiliar, so I’d recommend reading ?tidyr::pivot_wider to see if you can figure out how to complete this step, but, if not, you can copy this code from the solutions in the next section.
Finally, drop all states that are already treated in 1982.
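The data steps above can be sketched on a small made-up panel. The data frame long and its values are hypothetical, and jail is coded 0/1 here for simplicity, which may not match how it is stored in the actual data. The call uses pivot_wider with names_from and values_from, which creates one column per variable and year.

```r
library(tidyr)

# Toy long-format panel (made-up values): one row per state and year
long <- data.frame(
  state = rep(c("A", "B", "C", "D"), each = 2),
  year  = rep(c(1982, 1988), times = 4),
  jail  = c(0, 0, 0, 1, 0, 0, 1, 1),
  afatal_per_million = c(50, 45, 40, 42, 60, 55, 70, 62)
)

# Long to wide: one row per state, with columns such as
# afatal_per_million_1982, afatal_per_million_1988, jail_1982, jail_1988
wide <- pivot_wider(
  long,
  id_cols     = state,
  names_from  = year,
  values_from = c(afatal_per_million, jail)
)

# Change in the outcome between the two years
wide$change <- wide$afatal_per_million_1988 - wide$afatal_per_million_1982

# Drop states that are already treated in the first period
did_data <- wide[wide$jail_1982 == 0, ]
did_data
```

In this toy example, state D already has the policy in 1982, so it is dropped and three states remain.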
Using the data that you constructed in part 7, implement the difference in differences regression of the change in alcohol related fatalities per million from 1982 to 1988 on the mandatory jail policy. How do you interpret these results and how do they compare to the previous ones? Now, additionally include the set of covariates that we have been using in this model. How do you interpret these results and how do they compare to the previous ones?
An alternative to DID is to include the lagged outcome as a covariate. Using the data constructed in part 7, run a regression of alcohol related fatalities per million in 1988 on the mandatory jail policy and alcohol related fatalities per million in 1982. How do you interpret these results and how do they compare to the previous ones? Now include the additional covariates that we have been using in this model. How do you interpret these results and how do they compare to the previous ones?
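One way to see how the two approaches relate: regressing the change in the outcome on the policy is the same as regressing the 1988 outcome on the policy with the coefficient on the 1982 outcome fixed at one, while the lagged outcome regression estimates that coefficient freely. The sketch below illustrates this on made-up wide-format data; the data frame wide, the column names y_1982 and y_1988, and all values are hypothetical.

```r
# Toy wide-format data (made-up values): one row per state
wide <- data.frame(
  state  = c("A", "B", "C", "D", "E", "F"),
  jail   = c(0, 1, 0, 1, 0, 1),
  y_1982 = c(50, 40, 60, 70, 45, 55),
  y_1988 = c(45, 42, 55, 62, 44, 50)
)

# DID: regress the change in the outcome on the policy
did_fit <- lm(I(y_1988 - y_1982) ~ jail, data = wide)

# Same regression written with the 1982 outcome entering
# with its coefficient fixed at one (via offset)
did_offset <- lm(y_1988 ~ jail + offset(y_1982), data = wide)

# Lagged outcome: the coefficient on the 1982 outcome is estimated
lag_fit <- lm(y_1988 ~ jail + y_1982, data = wide)

coef(did_fit)["jail"]
coef(did_offset)["jail"]
coef(lag_fit)
```

The first two fits give the same coefficient on jail by construction; the third generally differs unless the estimated coefficient on y_1982 happens to equal one.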
Comment on your results from parts 1-9. Which, if any, of these are you most inclined to interpret as a reasonable estimate of the (average) causal effect of mandatory jail policies on alcohol related fatalities?