Homework 5

Due: At the start of class on Thursday, April 23. Please turn in a hard copy.

Textbook Questions: Hansen 17.2

Additional Question 1: In class, we discussed the cross-fitting estimator of the \(\ATT\) based on the AIPW estimand, where the nuisance functions \(m_0(X)\) and \(p(X)\) are estimated using machine learning. In particular, \[\begin{align*} \widehat{\ATT}(k) = \frac{1}{n_k}\sum_{i\in I_k} \left(\frac{D_i}{\pi} - \frac{(1-D_i)\hat{p}^{-k}(X_i)}{\pi(1-\hat{p}^{-k}(X_i))}\right)(Y_i-\hat{m}^{-k}(X_i)) \quad \text{and}\quad \widehat{\ATT} = \frac{1}{K}\sum_{k=1}^K \widehat{\ATT}(k), \end{align*}\] where \(k\) denotes the \(k\)-th fold, \(I_k\) is the set of indices in the \(k\)-th fold, \(n_k\) is the number of observations in the \(k\)-th fold, and \(\hat{m}^{-k}(X)\) and \(\hat{p}^{-k}(X)\) are estimates of \(m_0(X)\) and \(p(X)\) that are obtained using only the data from folds other than the \(k\)-th fold. In class, we showed that \(\sqrt{n}(\widehat{\ATT} - \ATT) \xrightarrow{d} \N(0, \sigma^2)\). To derive this result, we used the bias formula: \[\begin{align*} \E[\widehat{\ATT} - \ATT] = -\frac{1}{\pi}\E\left[ \frac{\big(\hat{m}^{-k}(X) - m_0(X)\big)\big(\hat{p}^{-k}(X) - p(X)\big)}{1-\hat{p}^{-k}(X)} \right]. \end{align*}\] Provide a proof of the bias formula and explain why it is useful for showing asymptotic normality.

Additional Question 2:

For this problem, use the data job_displacement_clean2.RData posted here. We are interested in estimating the (causal) effect of job displacement on earnings. The key variables in this data are learn (the log of earnings, which is the outcome variable in this problem) and first.displaced (the time period when an individual becomes displaced). first.displaced is the variable used to form “groups” below; it is set equal to 0 for individuals who are not displaced from their job in any period. (As always, you are not allowed to use built-in R packages, such as plm, fixest, or did, to compute the results for these questions, though you are welcome to compare your results to output from those packages.)

Hint: The number of observations is fairly large in this data and, therefore, inverting some matrices can be quite time-consuming. I recommend that you use the Matrix package to deal with the matrices in this question. This package has the ability to deal with “sparse” matrices in an efficient way which is helpful for this problem.
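As a rough illustration of the hint, the following sketch builds a sparse matrix of individual dummies on simulated data and solves the associated normal equations. Everything in it (the sizes `n_id` and `n_t`, the simulated `y`, and the object names) is made up for illustration and is not taken from the assignment data; it only shows the kind of `Matrix` calls that may be helpful.

```r
library(Matrix)

set.seed(1)
n_id <- 2000L                      # hypothetical number of individuals
n_t  <- 5L                         # hypothetical number of time periods
id   <- rep(seq_len(n_id), each = n_t)
y    <- rnorm(length(id))          # simulated outcome, not the real data

# Sparse dummy matrix: row r has a single 1 in column id[r].
# A dense version would store length(id) * n_id numbers.
Z <- sparseMatrix(i = seq_along(id), j = id, x = 1,
                  dims = c(length(id), n_id))

# Solve (Z'Z) b = Z'y without ever forming a dense matrix.
ZtZ     <- crossprod(Z)            # sparse and symmetric
Zty     <- crossprod(Z, y)
coef_id <- as.numeric(as.matrix(solve(ZtZ, Zty)))

# With only individual dummies, the coefficients are the individual means of y.
means    <- as.numeric(tapply(y, id, mean))
max_diff <- max(abs(coef_id - means))
```

The same pattern (build the regressors with `sparseMatrix`, then use `crossprod` and `solve`) should carry over to larger designs, though how much time it saves depends on how sparse the matrices actually are.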

(a) Use the within estimator to estimate the following regression model: \[\begin{align*} Y_{it} = \theta_t + \eta_i + \alpha D_{it} + e_{it}. \end{align*}\] Report an estimate of \(\alpha\) and its standard error.
(b) Compute difference-in-differences versions of group-time average treatment effects (where you can define group by the time period when an individual becomes displaced from their job). Use individuals who are never displaced from their job as the comparison group.
(c) Aggregate the group-time average treatment effects in part (b) into an overall treatment effect parameter. Use the bootstrap to compute a standard error for it. How do these results compare to the ones from part (a)?
(d) Aggregate the group-time average treatment effects in part (b) into an event study. Use the bootstrap to compute standard errors, and plot the estimates along with a 95% confidence interval.
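
For the bootstrap steps, one common choice with panel data is to resample whole individuals (all of their time periods together) rather than single rows. The sketch below shows only that resampling mechanic on simulated data. The data frame `dat`, the placeholder statistic `stat_fun`, and the number of draws `B` are all hypothetical; in practice `stat_fun` would be replaced by whatever computes the aggregated parameter of interest.

```r
set.seed(1)
n_id <- 200L                       # hypothetical number of individuals
n_t  <- 4L                         # hypothetical number of time periods
dat  <- data.frame(id = rep(seq_len(n_id), each = n_t),
                   y  = rnorm(n_id * n_t))   # simulated, not the real data

# Placeholder statistic (hypothetical): the overall mean of y.
stat_fun <- function(d) mean(d$y)

# Row indices belonging to each individual.
rows_by_id <- split(seq_len(nrow(dat)), dat$id)
ids        <- names(rows_by_id)

B          <- 200L                 # hypothetical number of bootstrap draws
boot_stats <- numeric(B)
for (b in seq_len(B)) {
  # Draw individuals with replacement and keep all of their rows.
  draw      <- sample(ids, length(ids), replace = TRUE)
  boot_rows <- unlist(rows_by_id[draw], use.names = FALSE)
  # Note: an individual drawn twice appears twice with the same id here;
  # a statistic that relies on unique ids would need them relabeled.
  boot_stats[b] <- stat_fun(dat[boot_rows, , drop = FALSE])
}

boot_se <- sd(boot_stats)          # bootstrap standard error
```

A 95% confidence interval can then be formed, for example, as the point estimate plus or minus 1.96 times `boot_se`, assuming the normal approximation is reasonable.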