Homework 4

\[ \newcommand{\E}{\mathbb{E}} \renewcommand{\P}{\textrm{P}} \newcommand{\L}{\textrm{L}} \newcommand{\F}{\textrm{F}} \newcommand{\var}{\textrm{var}} \newcommand{\cov}{\textrm{cov}} \newcommand{\corr}{\textrm{corr}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Corr}{\mathrm{Corr}} \newcommand{\sd}{\mathrm{sd}} \newcommand{\se}{\mathrm{s.e.}} \newcommand{\T}{T} \newcommand{\indicator}[1]{\mathbb{1}\{#1\}} \newcommand{\independent}{\mathrel{\perp\!\!\!\perp}} \newcommand{\N}{\mathcal{N}} \newcommand{\ATT}{\text{ATT}} \newcommand{\ATE}{\text{ATE}} \]

Due: At the start of class on Thursday, April 2. Please turn in a hard copy.

Question 1: For this problem, we will make the unconfoundedness assumption

\[ \big(Y(0), Y(1)\big) \independent D | X \]

but we will target \(\ATE\) rather than \(\ATT\), which was the target parameter in class.
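
As a notation reminder, the two target parameters and the propensity score \(p(X)\) can be written as follows. These are the standard potential-outcomes definitions, stated here on the assumption that they match the notation used in class:

\[\begin{align*} Y &= D\,Y(1) + (1-D)\,Y(0) \\ \ATE &= \E[Y(1) - Y(0)] \\ \ATT &= \E[Y(1) - Y(0) \mid D=1] \\ p(X) &= \P(D=1 \mid X) \end{align*}\]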

  1. Explain what the overlap assumption is in this context.

  2. Show that, under the unconfoundedness assumption, \(\ATE\) is identified and provide an expression for it. Does your result use the overlap assumption? Explain.

  3. Derive an alternative expression for \(\ATE\) that leads to a propensity score re-weighting estimator for \(\ATE\).

  4. Derive an alternative expression for \(\ATE\) that leads to a doubly robust AIPW estimator for \(\ATE\). Prove that the estimator that you get from this expression is indeed doubly robust.

Question 2: For this problem, we will be interested in computing an estimate of the \(\ATT\) of a job training program. You can download the data here and download a description here. For this problem, the outcome of interest is re78, the treatment is train, and suppose that unconfoundedness holds after conditioning on age, educ, black, hisp, married, re75, and unem75.

  1. Given our discussion in class, we know that if we believe unconfoundedness and that untreated potential outcomes are linear in covariates, then we have that

    \[\begin{align*} \ATT = \E[Y|D=1] - \E[X'|D=1]\beta_0 \end{align*}\]

    where \(\beta_0\) can be estimated from the regression of \(Y\) on \(X\) using untreated observations. Estimate \(\ATT\) based on the above expression for it and report your result.

  2. Show that \(\sqrt{n}(\widehat{\ATT} - \ATT) \xrightarrow{d} \N(0,V)\) and provide an expression for \(V\). Based on this result, provide standard errors for your estimate of \(\ATT\).

  3. Use the bootstrap to compute standard errors for your estimate of \(\ATT\). How do these compare to the standard errors that you reported previously?

  4. Run a regression of \(Y\) on \(D\) and \(X\) (where \(X\) includes the same additional variables as above). Compare the coefficient on \(D\) to the estimate of \(\ATT\). How similar are they? What about their standard errors? Do you have any comment on these results?

  5. Calculate an estimate of \(\ATT\) using propensity score re-weighting (as we discussed in class). You can estimate the propensity score model using logit (it is ok to use the glm function for this). Compute standard errors using the bootstrap and compare your results to the previous estimates.

  6. Calculate an estimate of \(\ATT\) using the doubly robust AIPW estimator that we discussed in class. Again, you can estimate the propensity score using logit with the glm function. Compute standard errors using the bootstrap and compare your results to the previous estimates.

  7. Calculate an estimate of \(\ATT\) using machine learning, following the algorithm that we discussed in class. In particular, I’d like for you to use the ranger package to estimate both the propensity score and the outcome regression model. Compute standard errors using the bootstrap and compare your results to the previous estimates.

  8. The data in this problem comes from Lalonde (1986), which is one of the most influential papers in the causal inference literature. The data that we used in the previous parts was observational data, but Lalonde also had access to experimental data on the same job training program, where people were randomly assigned to receive the job training program or not. You can find the experimental data here. Explain how you can use the experimental data to estimate the \(\ATT\) of the job training program. Is this estimate more or less credible than the estimates that you got from the observational data? Is your estimate from the experimental data close to any of the estimates that you got from the observational data? What do you make of this?
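
The mechanics of parts 1 and 3 (regression adjustment using untreated observations, then bootstrap standard errors) can be sketched as follows. This is only an illustration under stated assumptions, not a solution: it is written in Python rather than R, it uses simulated data rather than the job training data, it has a single covariate, and every function name (`ols_line`, `att_regression_adjustment`, `bootstrap_se`, `simulate`) is hypothetical.

```python
import math
import random
import statistics


def ols_line(x, y):
    """Least-squares fit of y on a constant and one covariate x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope  # (intercept, slope)


def att_regression_adjustment(y, d, x):
    """mean(Y | D=1) minus the untreated-sample fit evaluated at mean(X | D=1)."""
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    x1 = [xi for xi, di in zip(x, d) if di == 1]
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    b0, b1 = ols_line(x0, y0)  # beta_0 is estimated from untreated observations only
    return statistics.fmean(y1) - (b0 + b1 * statistics.fmean(x1))


def bootstrap_se(y, d, x, n_boot=200, seed=0):
    """Nonparametric bootstrap: resample rows with replacement, re-estimate, take the sd."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(att_regression_adjustment(
            [y[i] for i in idx], [d[i] for i in idx], [x[i] for i in idx]))
    return statistics.stdev(draws)


def simulate(n=2000, att=2.0, seed=1):
    """Hypothetical data: treatment depends on x, untreated outcomes are linear in x."""
    rng = random.Random(seed)
    y, d, x = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        di = 1 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0
        y0 = 1.0 + 1.5 * xi + rng.gauss(0.0, 1.0)
        y.append(y0 + att * di)  # constant treatment effect, so ATT equals att
        d.append(di)
        x.append(xi)
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate()
    print("ATT estimate:", att_regression_adjustment(y, d, x))
    print("bootstrap s.e.:", bootstrap_se(y, d, x))
```

Note that the whole estimation procedure, including the first-step regression on untreated observations, is repeated inside each bootstrap draw. The same pattern carries over when the first step is a propensity score model instead.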

Question 3: Prove the following result from the course notes about interpreting \(\alpha\) in the regression \(Y = \alpha D + X'\beta + e\). You can use the decompositions that we discussed in class as a starting point. Prove the result separately for each of the two cases mentioned below, and also show the result about the weights in case (i).

Suppose that unconfoundedness and overlap both hold. In addition, suppose that either (i) \(p(X) = \L(D|X)\) or (ii) \(\E[Y|X,D=0] = \L_0(Y|X)\) holds. Then \[\begin{align*} \alpha &= \E\left[w(D,X) \, \text{CATE}(X) \right] \end{align*}\] where the weights \(w(D,X)\) are defined above and have mean 1. In addition, if condition (i) holds (that is, \(p(X) = \L(D|X)\)), then the weights are non-negative.
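
One possible starting point is the population version of the Frisch-Waugh-Lovell result. It is offered here as a standard fact, not necessarily the decomposition used in class, and it assumes that \(e\) is the linear projection error and that \(\E[(D - \L(D|X))^2] > 0\). The coefficient on \(D\) can then be written as

\[\begin{align*} \alpha &= \frac{\E\big[(D - \L(D|X))\,Y\big]}{\E\big[(D - \L(D|X))^2\big]} \end{align*}\]

where \(\L(D|X)\) denotes the linear projection of \(D\) on \(X\).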