Homework 5

Due: At the start of class on Monday, April 8. Please turn in a hard copy.

Textbook Questions: None

Additional Question 1: In this problem, we will estimate the \(ATT\) of a job training program. You can download the data here and download a description here. The outcome of interest is re78, the treatment is train, and you should suppose that unconfoundedness holds after conditioning on age, educ, black, hisp, married, re75, and unem75.

Given our discussion in class, we know that if unconfoundedness holds and untreated potential outcomes are linear in the covariates, then

\[\begin{align*} ATT = \E[Y|D=1] - \E[X'|D=1]\beta \end{align*}\]

where \(\beta\) can be estimated from the regression of \(Y\) on \(X\) using untreated observations. Estimate \(ATT\) based on the above expression for it and report your result.
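As a rough illustration of the mechanics only, here is a minimal sketch of the regression-based estimator on simulated data. The variable names (`x1`, `x2`, `d`, `y`) and the data-generating process are hypothetical stand-ins, not the job training data; with the real data, `y` would be re78, `d` would be train, and the covariates would be those listed above.

```r
# Simulated stand-in data (hypothetical; the true ATT is 3 by construction).
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
d  <- rbinom(n, 1, plogis(-0.5 + 0.5 * x1 - 0.5 * x2))
y0 <- 1 + 2 * x1 - 1 * x2 + rnorm(n)  # untreated potential outcome, linear in X
y  <- y0 + 3 * d                      # constant treatment effect of 3
dat <- data.frame(y, d, x1, x2)

# Step 1: estimate beta by regressing Y on X using untreated observations only.
fit0 <- lm(y ~ x1 + x2, data = dat[dat$d == 0, ])

# Step 2: compute X_i' beta-hat for each treated observation.
treated <- dat[dat$d == 1, ]
y0_hat  <- predict(fit0, newdata = treated)

# Step 3: ATT-hat = mean(Y | D = 1) - mean(X' | D = 1) beta-hat.
att_hat <- mean(treated$y) - mean(y0_hat)
att_hat
```

Because the fitted regression is linear, averaging the predictions over the treated group is the same as multiplying the treated-group covariate means by the estimated coefficients.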
Show that \(\sqrt{n}(\widehat{ATT} - ATT) \xrightarrow{d} N(0,V)\) and provide an expression for \(V\). Based on this result, provide standard errors for your estimate of \(ATT\).
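One possible route to this result, sketched under standard regularity conditions, is to linearize the estimator around \(\beta\). The symbols \(p\), \(e\), \(\mu_1\), \(Q\), and \(\Omega\) below are notation introduced here for illustration and may differ from the notation used in class.

```latex
\newcommand{\E}{\mathbb{E}}
% Assumed notation: p = P(D=1), e = Y - X'\beta, \mu_1 = \E[X|D=1],
% Q = \E[(1-D)XX'], \Omega = \E[(1-D)e^2XX'], n_1 = \sum_i D_i, \hat{p} = n_1/n.
\begin{align*}
\sqrt{n}(\widehat{ATT} - ATT)
  &= \frac{1}{\hat{p}} \frac{1}{\sqrt{n}} \sum_{i=1}^n D_i (Y_i - X_i'\beta - ATT)
     - \Big(\frac{1}{n_1} \sum_{i=1}^n D_i X_i\Big)' \sqrt{n}(\hat{\beta} - \beta) \\
  &= \frac{1}{\sqrt{n}} \sum_{i=1}^n \Big\{ \frac{D_i}{p} (Y_i - X_i'\beta - ATT)
     - \mu_1' Q^{-1} (1 - D_i) X_i e_i \Big\} + o_p(1) \\
V &= \frac{\E[D (Y - X'\beta - ATT)^2]}{p^2} + \mu_1' Q^{-1} \Omega Q^{-1} \mu_1
\end{align*}
```

The two terms inside the braces are each mean zero, and their product is zero because \(D_i(1-D_i) = 0\), so the variance is the sum of the two individual variances. Replacing each population quantity with its sample analogue gives \(\hat{V}\), and the standard error is \(\sqrt{\hat{V}/n}\).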
Use the bootstrap to compute standard errors for your estimate of \(ATT\). How do these compare to the standard errors that you reported previously?
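The following is a minimal sketch of a nonparametric (empirical) bootstrap for the regression-based estimator, again on simulated data. The helper `att_reg`, the variable names, and the choice of 500 bootstrap iterations are assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical; the true ATT is 3 by construction).
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
d  <- rbinom(n, 1, plogis(-0.5 + 0.5 * x1 - 0.5 * x2))
y  <- 1 + 2 * x1 - 1 * x2 + 3 * d + rnorm(n)
dat <- data.frame(y, d, x1, x2)

# Hypothetical helper: regression-based ATT estimate for a given data frame.
att_reg <- function(data) {
  fit0    <- lm(y ~ x1 + x2, data = data[data$d == 0, ])
  treated <- data[data$d == 1, ]
  mean(treated$y) - mean(predict(fit0, newdata = treated))
}

# Resample rows with replacement and re-run the whole estimation each time.
B <- 500
boot_est <- replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)
  att_reg(dat[idx, ])
})

# The bootstrap standard error is the standard deviation across replications.
se_boot <- sd(boot_est)
c(estimate = att_reg(dat), se = se_boot)
```

Every estimation step, including the first-stage regression, is repeated inside each bootstrap draw, so the resulting standard error accounts for the estimation of \(\beta\).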
Run a regression of \(Y\) on \(D\) and \(X\) (where \(X\) includes the same additional variables as above). Compare the coefficient on \(D\) to the estimate of \(ATT\). How similar are they? What about their standard errors? Do you have any comment on these results?
Calculate an estimate of \(ATT\) using propensity score re-weighting (as we discussed in class). You can estimate the propensity score model using logit (it is ok to use the glm function for this). Compute standard errors using the bootstrap and compare your results to the previous estimates.
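A minimal sketch of propensity score re-weighting on simulated data is shown below. It uses normalized weights, where untreated observations are weighted by the estimated odds \(\hat{p}(X)/(1-\hat{p}(X))\). That normalization is an assumption of this sketch, and the version discussed in class may differ. The variable names are hypothetical.

```r
# Simulated stand-in data (hypothetical; the true ATT is 3 by construction).
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
d  <- rbinom(n, 1, plogis(-0.5 + 0.5 * x1 - 0.5 * x2))
y  <- 1 + 2 * x1 - 1 * x2 + 3 * d + rnorm(n)
dat <- data.frame(y, d, x1, x2)

# Step 1: estimate the propensity score with a logit model.
ps_fit <- glm(d ~ x1 + x2, family = binomial(link = "logit"), data = dat)
ps     <- predict(ps_fit, type = "response")

# Step 2: odds weights re-weight the untreated group to resemble the treated group.
w0 <- ps / (1 - ps)

# Step 3: treated mean minus the weighted mean of untreated outcomes.
is_treated <- dat$d == 1
att_ipw <- mean(dat$y[is_treated]) -
  weighted.mean(dat$y[!is_treated], w0[!is_treated])
att_ipw
```

For bootstrap standard errors, the propensity score model would be re-estimated inside each bootstrap replication rather than held fixed.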
Calculate an estimate of \(ATT\) using the doubly robust approach that we discussed in class. Again, you can estimate the propensity score using logit with the glm function. Compute standard errors using the bootstrap and compare your results to the previous estimates.
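One common doubly robust expression for the \(ATT\) combines the propensity score weights with outcome regression residuals. The sketch below implements that expression on simulated data. The exact form covered in class may differ, and all variable names are hypothetical.

```r
# Simulated stand-in data (hypothetical; the true ATT is 3 by construction).
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
d  <- rbinom(n, 1, plogis(-0.5 + 0.5 * x1 - 0.5 * x2))
y  <- 1 + 2 * x1 - 1 * x2 + 3 * d + rnorm(n)
dat <- data.frame(y, d, x1, x2)

# Propensity score from a logit model.
ps_fit <- glm(d ~ x1 + x2, family = binomial(link = "logit"), data = dat)
ps     <- predict(ps_fit, type = "response")

# Outcome regression for untreated potential outcomes, fit on untreated observations
# and then predicted for everyone.
m0_fit <- lm(y ~ x1 + x2, data = dat[dat$d == 0, ])
m0     <- predict(m0_fit, newdata = dat)

# Doubly robust moment: propensity score weights applied to (Y - m0(X)).
p_hat  <- mean(dat$d)
w      <- dat$d / p_hat - ps * (1 - dat$d) / (p_hat * (1 - ps))
att_dr <- mean(w * (dat$y - m0))
att_dr
```

The estimate remains consistent if either the propensity score model or the outcome regression model is correctly specified, which is the sense in which the approach is doubly robust.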
Finally, calculate an estimate of \(ATT\) using machine learning, following the algorithm we discussed in class. In particular, I’d like for you to use the randomForest package to estimate both the propensity score and the outcome regression model. Compute standard errors using the bootstrap and compare your results to the previous estimates.
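The sketch below assumes that the algorithm in question is a cross-fitted, doubly robust estimator, which may not match what was covered in class. It uses simulated data with hypothetical variable names, and the number of folds, the `nodesize` setting, and the cap on the propensity score at 0.95 are all illustrative choices rather than recommendations.

```r
library(randomForest)

# Simulated stand-in data (hypothetical; the true ATT is 3 by construction).
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
d  <- rbinom(n, 1, plogis(-0.5 + 0.5 * x1 - 0.5 * x2))
y  <- 1 + 2 * x1 - 1 * x2 + 3 * d + rnorm(n)
dat <- data.frame(y, d, x1, x2)
dat$dfac <- factor(dat$d, levels = c(0, 1))  # factor response for classification

# Cross-fitting: fit both models on K - 1 folds and predict on the held-out fold.
K    <- 5
fold <- sample(rep(1:K, length.out = n))
ps   <- numeric(n)
m0   <- numeric(n)
for (k in 1:K) {
  train <- dat[fold != k, ]
  test  <- dat[fold == k, ]

  # Propensity score: classification forest, predicted probability of class "1".
  rf_ps <- randomForest(dfac ~ x1 + x2, data = train, nodesize = 25)
  ps[fold == k] <- predict(rf_ps, newdata = test, type = "prob")[, "1"]

  # Outcome regression: regression forest fit on untreated training observations.
  rf_m0 <- randomForest(y ~ x1 + x2, data = train[train$d == 0, ], nodesize = 25)
  m0[fold == k] <- predict(rf_m0, newdata = test)
}

# Cap the propensity score so the weights ps / (1 - ps) stay finite (assumption).
ps <- pmin(ps, 0.95)

# Plug the cross-fitted predictions into the doubly robust moment for the ATT.
p_hat  <- mean(dat$d)
w      <- dat$d / p_hat - ps * (1 - dat$d) / (p_hat * (1 - ps))
att_ml <- mean(w * (dat$y - m0))
att_ml
```

Predicting only on held-out folds keeps each observation's nuisance estimates from being fit on that same observation, which matters for flexible learners such as random forests that can overfit in sample.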