Due: At the start of class on Tuesday, April 18. Please turn in a hard copy.

Textbook Questions: Hansen 17.2

Additional Question 1: For this problem, we will be interested in computing an estimate of the \(ATT\) of a job training program. You can download the data here and download a description here. For this problem, the outcome of interest is re78, the treatment is train, and suppose that unconfoundedness holds after conditioning on age, educ, black, hisp, married, re75, and unem75.

Additional Question 2: This question uses the same data and setup as the job training problem in the previous question. We are again interested in computing an estimate of the \(ATT\) of a job training program. As before, the outcome of interest is re78, the treatment is train, and suppose that unconfoundedness holds after conditioning on age, educ, black, hisp, married, re75, and unem75.

For this problem, use the algorithm we discussed in class to estimate the \(ATT\) using machine learning. In particular, please use the randomForest package to estimate both the propensity score and the outcome regression model. How does this estimate compare to the one from the previous problem? [You do not need to report a standard error, just the point estimate.]
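
As a reminder of the kind of expression such an algorithm typically targets, here is one common doubly robust form of the \(ATT\) under unconfoundedness. It is a sketch rather than a restatement of the in-class algorithm, which may differ in its details (for example, in how cross-fitting is done). The notation is an assumption introduced here: \(Y\) is the outcome (re78), \(D\) is the treatment (train), \(X\) collects the covariates, \(p = \textrm{P}(D=1)\), \(p(X) = \textrm{P}(D=1|X)\) is the propensity score, and \(m_0(X) = \textrm{E}[Y|X,D=0]\) is the outcome regression for the untreated group.

\[\begin{aligned} ATT &= \textrm{E}\left[ \left( \frac{D}{p} - \frac{p(X)(1-D)}{p\,(1-p(X))} \right) \big( Y - m_0(X) \big) \right] \\ \widehat{ATT} &= \frac{1}{n} \sum_{i=1}^n \left( \frac{D_i}{\hat{p}} - \frac{\hat{p}(X_i)(1-D_i)}{\hat{p}\,(1-\hat{p}(X_i))} \right) \big( Y_i - \hat{m}_0(X_i) \big) \end{aligned}\]

Here \(\hat{p}\) is the sample fraction of treated units, and \(\hat{p}(X_i)\) and \(\hat{m}_0(X_i)\) are the machine learning predictions. With cross-fitting, each prediction for unit \(i\) comes from a model fit on a fold of the data that excludes unit \(i\).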

Additional Question 3: Suppose you are interested in estimating the \(ATT\). You also have access to two periods of panel data where no units were treated in the first period and some units became treated in the second period. Further, suppose you are willing to assume that \(\textrm{E}[\Delta Y_t(0) | D=1] = \textrm{E}[\Delta Y_t(0) | D=0]\). Your friend suggests running the following regression \[ Y_{it} = \theta_t + \eta_i + \alpha D_{it} + e_{it} \] and interpreting \(\alpha\) as the \(ATT\). This friend also suggests that this approach is robust to treatment effect heterogeneity. Are they correct? Explain.
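
One possible starting point, offered as a hint and not as a full answer, is to take first differences of the regression. The notation is an assumption introduced here: label the two periods \(t=1,2\) and let \(D_i\) indicate units treated in the second period, so that \(D_{i1} = 0\) for every unit and \(D_{i2} = D_i\). Differencing across periods removes \(\eta_i\):

\[\begin{aligned} \Delta Y_i = Y_{i2} - Y_{i1} &= (\theta_2 - \theta_1) + \alpha (D_{i2} - D_{i1}) + (e_{i2} - e_{i1}) \\ &= (\theta_2 - \theta_1) + \alpha D_i + \Delta e_i \end{aligned}\]

This is a regression of \(\Delta Y_i\) on an intercept and a single binary regressor, so its population slope coefficient is \(\alpha = \textrm{E}[\Delta Y | D=1] - \textrm{E}[\Delta Y | D=0]\). What remains is to relate this difference to the \(ATT\) using the assumption stated in the question.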