\[ \newcommand{\E}{\mathbb{E}} \]

**Due:** At the start of class on Tuesday, May 3. Please turn in a hard copy.

**Textbook Questions:** Hansen 17.2

**Additional Question 1:** Suppose you are interested in estimating the \(ATT\). You also have access to two periods of data where no units were treated in the first period and some units became treated in the second period. Further, suppose you are willing to assume that \(\E[\Delta Y_t(0) | D=1] = \E[\Delta Y_t(0) | D=0]\). Your friend suggests running the following regression \[\begin{align*}
Y_{it} = \theta_t + \eta_i + \alpha D_{it} + e_{it}
\end{align*}\] and interpreting \(\alpha\) as the \(ATT\). This friend also suggests that this approach is robust to treatment effect heterogeneity. Are they correct? Explain.
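
For reference, the objects in this question can be written out explicitly. The display below is a notational sketch only, assuming the two periods are labelled \(t-1\) and \(t\) and that \(D=1\) marks units treated in period \(t\); this labelling is an assumption made here, not something stated in the assignment. \[\begin{align*}
ATT &= \E[Y_t(1) - Y_t(0) | D=1] \\
\Delta Y_t(0) &= Y_t(0) - Y_{t-1}(0)
\end{align*}\] Under this labelling, the stated assumption says that the average change in untreated potential outcomes is the same for the treated and untreated groups.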

**Additional Question 2:**

This question uses the same data/setup as for the job training problem from the previous homework. We will be interested in computing an estimate of the \(ATT\) of a job training program. You can download the data here and download a description here. For this problem, the outcome of interest is `re78`, the treatment is `train`, and suppose that unconfoundedness holds after conditioning on `age`, `educ`, `black`, `hisp`, `married`, `re75`, and `unem75`.

For this problem, use the algorithm we discussed in class to estimate the \(ATT\) using machine learning. In particular, I’d like for you to use the `randomForest` package to estimate the propensity score and the outcome regression model.
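
For reference, one common doubly robust expression for the \(ATT\) under unconfoundedness (and overlap) is sketched below. It may differ from the exact algorithm covered in class, for example in whether cross-fitting is used, so treat it as a guide to the roles of the two nuisance functions rather than as the required procedure. The symbols \(p(X) = P(D=1|X)\), \(p = P(D=1)\), and \(m_0(X) = \E[Y | X, D=0]\) are notation introduced here, not taken from the assignment. \[\begin{align*}
ATT = \E\left[ \frac{D}{p} \big(Y - m_0(X)\big) \right] - \E\left[ \frac{p(X)(1-D)}{p\,(1-p(X))} \big(Y - m_0(X)\big) \right]
\end{align*}\] In this problem, \(Y\) would be `re78`, \(D\) would be `train`, and \(X\) would collect the conditioning variables; \(p(X)\) is the propensity score and \(m_0(X)\) is the outcome regression for the untreated group.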

**Additional Question 3:**

For this problem, use the data `job_displacement_clean2.RData` posted here. For this problem, we are interested in estimating the (causal) effect of job displacement on earnings. The key variables in this data are `learn` (which is the log of earnings and is the outcome variable in this problem) and `first.displaced` (which contains the time period when an individual becomes displaced). `first.displaced` is the variable that is used to form “groups” below; it is set equal to 0 for individuals who are not displaced from their job in any period. (As always, you are not allowed to use built-in `R` packages, such as `plm`, `fixest`, or `did`, to compute the results for these questions, though you are welcome to compare your results to output from those packages.)

**Hint:** The number of observations is fairly large in this data and, therefore, inverting some matrices can be quite time-consuming. I recommend that you use the `Matrix` package to deal with the matrices in this question. This package has the ability to deal with “sparse” matrices in an efficient way, which is helpful for this problem.
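
As a minimal sketch of why sparse matrices help, the code below builds a sparse matrix of individual dummies on simulated data and uses it to compute individual-level means and demeaned outcomes. Everything here is hypothetical and unrelated to the displacement data: the panel dimensions, the names `id`, `period`, `D_unit`, and `y`, and the simulated outcome are assumptions made purely for illustration.

```r
library(Matrix)

set.seed(1)
n_units   <- 200  # hypothetical number of individuals
n_periods <- 5    # hypothetical number of time periods

# Balanced panel index: one row per individual-period
id     <- rep(seq_len(n_units), each = n_periods)
period <- rep(seq_len(n_periods), times = n_units)
y      <- rnorm(length(id))  # simulated outcome, for illustration only

# Sparse matrix of individual dummies: exactly one nonzero entry per row,
# so storage grows with the number of rows, not rows x individuals
D_unit <- sparseMatrix(i = seq_along(id), j = id, x = 1)

# The cross-product stays sparse (here it is diagonal), so solving
# the normal equations is cheap even with many individuals
DtD        <- crossprod(D_unit)
unit_means <- as.vector(solve(DtD, crossprod(D_unit, y)))

# Subtract each individual's mean from that individual's observations
y_demeaned <- y - as.vector(D_unit %*% unit_means)
```

Comparing `object.size(D_unit)` with `object.size(as.matrix(D_unit))` gives a sense of the memory savings; the same idea carries over to larger dummy-variable designs.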

(a) Use the within estimator to estimate the following regression model \[\begin{align*} Y_{it} = \theta_t + \eta_i + \alpha D_{it} + e_{it} \end{align*}\] Report an estimate of \(\alpha\) and its standard error.

(b) Compute difference-in-differences versions of group-time average treatment effects (where you can define group by the time period when an individual becomes displaced from their job). Use individuals who are never displaced from their job as the comparison group.

(c) Aggregate the group-time average treatment effects in part (b) into an overall treatment effect parameter. Use the bootstrap to compute a standard error for it. How do these results compare to the ones from part (a)?

(d) Compute the weights on underlying group-time average treatment effects that come from the two-way fixed effects regression in part (a). How do these compare to the weights from part (c)?

(e) Aggregate the group-time average treatment effects in part (b) into an event study. Use the bootstrap to compute standard errors, and plot the estimates along with a 95% confidence interval.
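
For reference, the group-time average treatment effects and the event study described above are commonly written as follows. This is a sketch under stated assumptions: the base period is taken to be the period just before displacement (\(g-1\)), \(G\) denotes the group (the value of `first.displaced`), \(U=1\) marks individuals who are never displaced, \(e\) is the number of periods since displacement, and \(\mathcal{T}\) is the last period in the data. This notation is introduced here and the conventions used in class may differ. \[\begin{align*}
ATT(g,t) &= \E[Y_t - Y_{g-1} | G=g] - \E[Y_t - Y_{g-1} | U=1] \\
ATT^{ES}(e) &= \sum_{g} 1\{g + e \leq \mathcal{T}\} \, P(G=g | G + e \leq \mathcal{T}) \, ATT(g, g+e)
\end{align*}\] The first line compares the change in outcomes for group \(g\) with the change over the same periods for never-displaced individuals. The second line averages the group-time effects at a given length of exposure \(e\), weighting each group by its share among the displaced groups that are observed \(e\) periods after displacement.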