8.4 Unconfoundedness

SW 6.8, SW Ch. 9

Unfortunately, we rarely have access to experimental data in applications or the ability to run experiments to evaluate causal effects programs/policies that we’d be interested in studying. For example, it’s hard to imagine convincing a large number of countries to randomly assign their interest rate, minimum wage, or immigration policies though these are all policies that economists would be interested in thinking about the causal effect of.

For the remainder of this chapter, we’ll discuss common approaches to causal inference when the treatment is not randomly assigned. We’ll start with what is probably the most common approach: unconfoundedness.

Unconfoundedness Assumption: \[\begin{align*} (Y(1),Y(0)) \perp D | X \end{align*}\] You can think of this as saying that, among individuals with the same covariates \(X\), they have the same distributions of potential outcomes regardless of whether or not they participate in the treatment. Note that the distribution of \(X\) is still allowed to be different between the treated and untreated groups. In other words, after you condition on covariates, there is nothing special (in terms of the distributions of potential outcomes) about the group that participates in the treatment relative to the group that doesn’t participate in the treatment.

This is potentially a strong assumption. In order to believe this assumption, you need to believe that untreated individuals with the same characteristics can deliver, on average, the outcome that individuals in the treated group would have experienced if they had not participated in the treatment. In math, you can write this as \[\begin{align*} \mathbb{E}[Y(0) | X, D=1] = \mathbb{E}[Y(0) | X, D=0] \end{align*}\]

If you are willing to believe this assumption, then you can recover the \(ATT\). There are a few different ways that you could implement this. Probably the most common (and convenient) way is to link this condition to a regression (just like we did in the previous section on experiments).

To do this, let’s continue to make the treatment effect homogeneity assumption as above. In addition, let’s assume a linear model for untreated potential outcomes \[\begin{align*} Y(0) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U \end{align*}\] and unconfoundedness implies that \(\mathbb{E}[U|X_1,X_2,X_3,D] = 0\) (the conditioning on \(D\) is the unconfoundedness part). Now, recalling the definition of the observed outcome, we can write \[\begin{align*} Y_i &= D_i Y_i(1) + (1-D_i) Y_i(0) \\ &= D_i (Y_i(1) - Y_i(0)) + Y_i(0) \\ &= D_i \alpha + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U_i \end{align*}\] which suggests running the regression of observed \(Y\) on \(X_1,X_2,X_3,\) and \(D\) and interpreting the estimate of \(\alpha\) as the causal effect of participating in the treatment. In practice, this will be very similar to what we have done before — so the process would not be hard, but convincing someone (or even yourself) that unconfoundedness holds will be the bigger issue here.

As a final comment, the assumption of treatment effect homogeneity is not quite so innocuous here. It turns out that you can show that, in the presence of treatment effect heterogeneity, \(\alpha\) will be equal to a weighted average of individual treatment effects, but the weights can sometimes be “strange”. There are methods that are robust to treatment effect heterogeneity (they are beyond the scope of the current class, but they are not “way” more difficult than what we are doing here). That said, in my experience, the regression estimators (under treatment effect homogeneity) tend to deliver similar estimates to alternative estimators that are robust to treatment effect heterogeneity at least in the setup considered in this section.