Solutions to Final Extra Questions

Ch. 18 Extra Question 2

Most researchers give up on estimating the individual-level effect of the treatment because it is too difficult, not because it is uninteresting. For each individual, we observe only one of their two potential outcomes, and somehow “figuring out” the other one is very difficult. Instead, we typically target an average treatment effect: rather than getting each unit's counterfactual outcome exactly right, we only need to be right on average, which is much easier.

As one additional comment, if we have random assignment we can recover either the \(ATE\) or \(ATT\), but even in this best-case scenario we still would not generally be able to recover the individual-level treatment effect.
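As a rough illustration of this point, here is a small simulation sketch. The data-generating process is entirely made up for illustration: each unit has its own treatment effect, treatment is assigned by a fair coin, and only one potential outcome per unit is observed.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000

# Hypothetical potential outcomes with heterogeneous individual effects
y0 = rng.normal(10, 2, n)
effect = rng.normal(1.0, 1.0, n)      # individual-level effects differ across units
y1 = y0 + effect

d = rng.random(n) < 0.5               # random assignment
y = np.where(d, y1, y0)               # only one potential outcome is ever observed

true_ate = effect.mean()              # known here only because the data are simulated
diff_in_means = y[d].mean() - y[~d].mean()

print("true ATE:", round(true_ate, 2))
print("difference in means:", round(diff_in_means, 2))
```

The difference in means lands close to the average effect, but nothing in the observed data (`y`, `d`) pins down `effect` for any particular unit.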

Ch. 18 Extra Question 6

The key assumption for difference-in-differences is the parallel trends assumption. In math: \(\mathbb{E}[\Delta Y(0) | G=1] = \mathbb{E}[\Delta Y(0) | G=0]\). In words this means that, if neither group had been treated, they would have experienced the same average change in outcomes over time.

This assumption can be preferable to unconfoundedness, especially when the researcher has access to panel data and there are unobserved variables (for example, time-invariant ones such as ability) that the researcher would like to control for.
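To see how parallel trends is used, here is a sketch of the standard two-period argument. The time subscripts \(t=1, 2\) and the no-anticipation condition (in the first period, the treated group's observed outcomes are their untreated potential outcomes) are assumptions that are not stated in the answer above.

\[
\begin{aligned}
ATT &= \mathbb{E}[Y_{t=2}(1) - Y_{t=2}(0) | G=1] \\
&= \mathbb{E}[Y_{t=2}(1) - Y_{t=1}(0) | G=1] - \mathbb{E}[Y_{t=2}(0) - Y_{t=1}(0) | G=1] \\
&= \mathbb{E}[\Delta Y | G=1] - \mathbb{E}[\Delta Y(0) | G=0] \\
&= \mathbb{E}[\Delta Y | G=1] - \mathbb{E}[\Delta Y | G=0]
\end{aligned}
\]

The second line adds and subtracts \(Y_{t=1}(0)\), the third line uses no anticipation for the first term and parallel trends for the second, and the last line holds because the untreated group's observed changes are changes in untreated potential outcomes.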

Ch. 18 Extra Question 7

Part (a)

The assumption mentioned here is that unconfoundedness holds after conditioning on years of education. In other words, comparing the earnings of unionized and non-unionized workers with the same years of education can tell us about the causal effect of unionization on earnings. This assumption is probably not very plausible, as there are likely many other differences between unionized and non-unionized workers that also affect earnings (e.g., different industries, different locations, different demographics, etc.).

Part (b)

To estimate the \(ATT\) of unionization on earnings under the assumption in part (a) using regression adjustment, we can do the following:

Step 1: Run a regression of earnings on years of education using only non-unionized workers. From this regression, get predicted untreated potential outcomes for all workers, which we will denote \(\hat{Y}_i(0)\).

Step 2: Estimate \(ATT\) by \(\widehat{ATT} = \frac{1}{n_1} \sum_{i=1}^n D_i Y_i - \frac{1}{n_1} \sum_{i=1}^n D_i \hat{Y}_i(0)\), where \(D_i\) is an indicator for being unionized and \(n_1\) is the number of unionized workers. This expression amounts to just calculating the average outcome experienced by union workers and subtracting the average predicted untreated potential outcome for those same workers.
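The two steps above can be sketched in code as follows. This is a minimal illustration, not a definitive implementation: the function name `regression_adjustment_att` and the simulated data (where the true effect of unionization is set to 2 for everyone) are assumptions made only for this example.

```python
import numpy as np

def regression_adjustment_att(y, d, x):
    """ATT by regression adjustment: fit on untreated units, predict for treated."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d).astype(bool)
    X = np.column_stack([np.ones(len(y)), x])        # intercept + covariate(s)
    # Step 1: regress the outcome on covariates using untreated units only
    beta, *_ = np.linalg.lstsq(X[~d], y[~d], rcond=None)
    y0_hat = X @ beta                                # predicted Y(0) for all units
    # Step 2: mean observed outcome minus mean predicted Y(0), among the treated
    return y[d].mean() - y0_hat[d].mean()

# Made-up data: more educated workers unionize more often; true effect is 2
rng = np.random.default_rng(0)
n = 5000
educ = rng.integers(8, 21, size=n).astype(float)
union = rng.random(n) < 0.2 + 0.02 * (educ - 8)
earnings = 10 + 1.5 * educ + 2.0 * union + rng.normal(0, 1, n)

att_hat = regression_adjustment_att(earnings, union, educ)
print("estimated ATT:", round(att_hat, 2))
```

Because the simulated data satisfy unconfoundedness given education by construction, the estimate should land near 2. With real data, nothing guarantees this.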

Part (c)

An alternative to regression adjustment is to just use a single regression. This requires the extra assumption of treatment effect homogeneity. In particular, the regression to run here is

\[Y_i = \beta_0 + \alpha D_i + \beta_1 \text{Educ}_i + U_i\]

where \(\alpha\), the coefficient on the treatment indicator, estimates the causal effect of unionization on earnings under the unconfoundedness assumption from part (a).
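A minimal sketch of this single regression, using least squares from NumPy, is below. The simulated data are hypothetical and build in treatment effect homogeneity (every worker's effect is 2), which is exactly the extra assumption this approach relies on.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
educ = rng.integers(8, 21, size=n).astype(float)
union = rng.random(n) < 0.2 + 0.02 * (educ - 8)     # selection depends on education only
earnings = 10 + 1.5 * educ + 2.0 * union + rng.normal(0, 1, n)

# Design matrix: intercept, treatment indicator D, years of education
X = np.column_stack([np.ones(n), union, educ])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)

alpha_hat = coef[1]   # coefficient on the treatment indicator
print("alpha_hat:", round(alpha_hat, 2))
```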

Part (d)

For the placebo test, you can use either regression adjustment or the single regression approach. In either case, the main difference from parts (b) and (c) is that you should use the outcome in the first period (before any workers are unionized) instead of the outcome in the second period. In particular, for regression adjustment,

Step 1: Run a regression of first period earnings on years of education using only workers who do not become unionized in the second period. From this regression, get predicted untreated potential outcomes for all workers, which we will denote \(\hat{Y}_{i, t=1}(0)\).

Step 2: Estimate \(ATT_{pre}\) by \(\widehat{ATT}_{pre} = \frac{1}{n_1} \sum_{i=1}^n G_i Y_{i, t=1} - \frac{1}{n_1} \sum_{i=1}^n G_i \hat{Y}_{i, t=1}(0)\), where \(G_i\) is an indicator for a worker becoming unionized in the second period. This expression amounts to just calculating the average first period outcome experienced by workers who later become unionized and subtracting the average predicted untreated potential outcome for those same workers.

If the estimate of \(ATT_{pre}\) is close to zero, that provides some evidence in favor of the unconfoundedness assumption from part (a). If the estimate is far from zero, that suggests that the assumption may not be valid.

For the single regression approach, you would run the regression

\[Y_{i, t=1} = \beta_0 + \alpha G_i + \beta_1 \text{Educ}_i + U_i\]

where \(\alpha\) estimates the placebo effect of unionization on earnings in the first period. If the unconfoundedness assumption from part (a) holds, we would expect this estimate to be close to zero.
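To illustrate what a failed placebo test can look like, here is a sketch on made-up data in which unconfoundedness given education does not hold: a hypothetical unobserved `ability` variable raises both earnings and the chance of later unionizing. No one is unionized in the first period, so any nonzero "effect" reflects selection.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
educ = rng.integers(8, 21, size=n).astype(float)
ability = rng.normal(0, 1, n)                           # unobserved by the researcher
g = (ability + rng.normal(0, 1, n) > 0.5).astype(float) # unionizes in period 2
y_pre = 10 + 1.5 * educ + 2.0 * ability + rng.normal(0, 1, n)  # period-1 earnings

# Placebo regression: period-1 earnings on G and education
X = np.column_stack([np.ones(n), g, educ])
coef, *_ = np.linalg.lstsq(X, y_pre, rcond=None)

placebo_alpha = coef[1]
print("placebo estimate:", round(placebo_alpha, 2))
```

Here the placebo estimate is well away from zero, which is the pattern that would cast doubt on the assumption from part (a).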

Part (e)

Lagged outcome unconfoundedness says that \(\big(Earnings_{t=2}(1), Earnings_{t=2}(0)\big) \perp \!\!\! \perp G | Education, Earnings_{t=1}\). In words, this means that, after controlling for first period earnings and education, the potential outcomes in the second period are independent of whether a worker becomes unionized in the second period. This assumption is likely to be more plausible than the assumption in part (a), as workers who have the same pre-treatment earnings and education are likely to be more similar to each other than workers who just have the same education. Still, there may be other differences between unionized and non-unionized workers that affect earnings (such as industry, location, and demographics), which could result in different potential outcomes for the two groups even after controlling for education and lagged earnings.

Part (f)

To estimate the \(ATT\) of unionization on earnings under the assumption in part (e) using regression adjustment, we can do the following:

Step 1: Run a regression of second period earnings on years of education and first period earnings using only non-unionized workers. From this regression, get predicted untreated potential outcomes for all workers, which we will denote \(\hat{Y}_{i, t=2}(0)\).

Step 2: Estimate \(ATT\) by \(\widehat{ATT} = \frac{1}{n_1} \sum_{i=1}^n G_i Y_{i, t=2} - \frac{1}{n_1} \sum_{i=1}^n G_i \hat{Y}_{i, t=2}(0)\). This expression amounts to just calculating the average second period outcome experienced by union workers and subtracting the average predicted untreated potential outcome for those same workers.

To estimate the \(ATT\) using the single regression approach, you would run the regression

\[Y_{i, t=2} = \beta_0 + \alpha G_i + \beta_1 \text{Educ}_i + \beta_2 Y_{i, t=1} + U_i\]

where \(\alpha\), the coefficient on the treatment indicator, estimates the causal effect of unionization on earnings under the lagged outcome unconfoundedness assumption from part (e).

Notice how mechanically similar this is to parts (b) and (c), with the only substantive difference being that we now included lagged earnings as an additional covariate.
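The sketch below compares the two sets of covariates on simulated data. Everything about the data is an assumption for illustration: workers with unusually high first period earnings for their education are more likely to unionize, and the true effect is 2. The helper name `reg_adj_att` is hypothetical.

```python
import numpy as np

def reg_adj_att(y, g, covariates):
    """Regression adjustment: fit on G=0, average (Y - predicted Y(0)) over G=1."""
    g = np.asarray(g).astype(bool)
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X[~g], y[~g], rcond=None)
    return y[g].mean() - (X[g] @ beta).mean()

rng = np.random.default_rng(3)
n = 5000
educ = rng.integers(8, 21, size=n).astype(float)
shock = rng.normal(0, 2, n)                      # period-1 earnings shock
y1 = 10 + 1.5 * educ + shock
g = shock + rng.normal(0, 2, n) > 0              # selection on period-1 earnings
y2 = 5 + 0.5 * educ + 0.8 * y1 + 2.0 * g + rng.normal(0, 1, n)

att_educ_only = reg_adj_att(y2, g, educ)                        # as in part (b)
att_lagged = reg_adj_att(y2, g, np.column_stack([educ, y1]))    # adds lagged earnings

print("education only:", round(att_educ_only, 2))
print("education + lagged earnings:", round(att_lagged, 2))
```

In this design, only the version that conditions on lagged earnings lands near the true effect. That is a feature of how the data were simulated, not a general result.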

Part (g)

Parallel trends says that, in the absence of treatment, for workers with the same education, the average change in earnings from period 1 to period 2 would be the same for workers who become unionized and those who do not. In math: \(\mathbb{E}[\Delta Earnings_{t=2}(0) | Education, G=1] = \mathbb{E}[\Delta Earnings_{t=2}(0) | Education, G=0]\). This assumption is likely more plausible than the unconfoundedness assumption from part (a). It is unclear whether it is more plausible than the lagged outcome unconfoundedness assumption from part (e). It would be attractive if we think that unionized and non-unionized workers differ in ways that are hard for us to observe (e.g., differences in ability, motivation, etc.) and that (i) these unobserved variables do not vary over time and (ii) the effect of these unobserved variables on earnings is constant over time.

Part (h)

To estimate the \(ATT\) of unionization on earnings under the parallel trends assumption using a regression adjustment version of difference-in-differences, we can do the following:

Step 1: Run a regression of the change in earnings from period 1 to period 2 on years of education using only non-unionized workers. From this regression, get predicted changes in untreated potential outcomes for all workers, which we will denote \(\widehat{\Delta Y_i(0)}\).

Step 2: Estimate \(ATT\) by \(\widehat{ATT} = \frac{1}{n_1} \sum_{i=1}^n G_i \Delta Y_i - \frac{1}{n_1} \sum_{i=1}^n G_i \widehat{\Delta Y_i(0)}\). This expression amounts to just calculating the average change in outcomes experienced by union workers and subtracting the average predicted change in untreated potential outcomes for those same workers.

To estimate the \(ATT\) using the single regression approach, you would run the regression

\[\Delta Y_i = \beta_0 + \alpha G_i + \beta_1 \text{Educ}_i + U_i\]

where \(\alpha\), the coefficient on the treatment indicator, estimates the causal effect of unionization on earnings under the parallel trends assumption.

Notice again that, mechanically, this is very similar to parts (b) and (c), with the main difference being that we are now working with changes in outcomes instead of levels of outcomes.
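Both difference-in-differences versions can be sketched as follows. The simulated data are an assumption for illustration: a hypothetical time-invariant `ability` variable affects earnings levels and selection into unions, but it drops out of the change in earnings, so parallel trends holds given education. The true effect is set to 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
educ = rng.integers(8, 21, size=n).astype(float)
ability = rng.normal(0, 1, n)                        # unobserved, time-invariant
g = ability + rng.normal(0, 1, n) > 0                # selection on ability
y1 = 10 + 1.5 * educ + 2.0 * ability + rng.normal(0, 1, n)
y2 = 11 + 1.7 * educ + 2.0 * ability + 2.0 * g + rng.normal(0, 1, n)
dy = y2 - y1                                         # ability differences out

# Regression adjustment: fit the change in earnings on education for G=0,
# then compare actual and predicted changes among G=1
X = np.column_stack([np.ones(n), educ])
beta, *_ = np.linalg.lstsq(X[~g], dy[~g], rcond=None)
att_ra = dy[g].mean() - (X[g] @ beta).mean()

# Single regression: change in earnings on G and education
W = np.column_stack([np.ones(n), g, educ])
coef, *_ = np.linalg.lstsq(W, dy, rcond=None)
att_single = coef[1]

print("regression adjustment:", round(att_ra, 2))
print("single regression:", round(att_single, 2))
```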

Part (i)

With only two periods of panel data, it is not possible to use a placebo test for lagged outcome unconfoundedness or parallel trends, because we already use both periods of data to estimate the \(ATT\). However, if we have access to a third period (for simplicity, let’s call it \(t=0\)), we can implement either the lagged outcome unconfoundedness or difference-in-differences approaches discussed above but with data from periods 0 and 1 (before the treatment) rather than periods 1 and 2. This gives a placebo estimate of the effect of unionization on earnings in period 1, which should be close to zero if the corresponding assumption holds.

Ch. 18 Extra Question 8

Part (a)

In order to interpret the difference in average earnings for college students relative to non-college students as the \(ATT\) of college attendance on earnings, we would need the following condition to hold:

\[ \big( Y(1), Y(0) \big) \perp \!\!\! \perp D \]

where \(D\) is an indicator for attending college, \(Y(1)\) is a person’s potential earnings if they were to attend college, and \(Y(0)\) is a person’s potential earnings if they were to not attend college. This assumption is a version of unconfoundedness that holds without conditioning on any covariates. Also, notice that this is exactly the same condition that would arise if college were randomly assigned. Here, college is not randomly assigned, but the condition would mean that selection into college is “as good as random” in the sense that potential outcomes are independent of college attendance.

This assumption is very implausible, as there are likely many differences between college students and non-college students that also affect earnings (e.g., different demographics, different abilities, different motivations, different family backgrounds, etc.).

Part (b)

Assumption 1: Relevance - The relevance assumption says that \(\mathrm{P}(D(1) = 1) > \mathrm{P}(D(0) = 1)\), in other words, that the instrument changes the probability of being treated. In the context of our application, trying to use an after-the-fact coin flip as an instrument for college attendance would not satisfy this assumption as the coin flip has no effect on whether or not a person attends college.

Assumption 2: Independence - For independence, we need \(\big( Y(1), Y(0), D(1), D(0) \big) \perp \!\!\! \perp Z\). In words, this means that the instrument is independent of the potential outcomes and potential treatments. In the context of our application, using an after-the-fact coin flip as an instrument for college attendance would satisfy this assumption as the coin flip is independent of potential outcomes and potential treatments.

Assumption 3: Exclusion Restriction - The exclusion restriction says that \(Y(D_i(z), z) = Y(D_i(z))\) for all \(z\). In words, this means that the instrument only affects the outcome through its effect on the treatment. In the context of our application, using an after-the-fact coin flip as an instrument for college attendance would satisfy this assumption as the coin flip does not directly affect earnings.

Assumption 4: Monotonicity - Monotonicity says that \(D_i(1) \geq D_i(0)\), which rules out the existence of defiers. In the context of our application, using an after-the-fact coin flip as an instrument for college attendance would satisfy this assumption as the coin flip does not affect college attendance for anyone, i.e., \(D_i(1) = D_i(0)\) for all \(i\).

Thus, while an after-the-fact coin flip does satisfy independence, the exclusion restriction, and monotonicity, it does not satisfy relevance, so it is not a valid instrument for college attendance.
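As a rough check of the relevance failure, the simulation sketch below computes the first stage (the difference in college attendance rates by coin flip outcome). All numbers in the data-generating process are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
ability = rng.normal(0, 1, n)
college = (ability + rng.normal(0, 1, n) > 0).astype(float)
earnings = 30 + 10 * college + 5 * ability + rng.normal(0, 5, n)

coin = rng.integers(0, 2, size=n)   # flipped after college decisions were made

# First stage: does the instrument move the probability of treatment?
first_stage = college[coin == 1].mean() - college[coin == 0].mean()
# Reduced form: difference in mean earnings by coin flip outcome
reduced_form = earnings[coin == 1].mean() - earnings[coin == 0].mean()

print("first stage:", round(first_stage, 3))
print("reduced form:", round(reduced_form, 3))
```

The first stage is essentially zero, so an instrumental variables estimate, which divides the reduced form by the first stage, would amount to dividing noise by noise.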

Part (c)

Assumption 1: Relevance - It seems likely that distance to the nearest college would affect the probability of attending college (particularly for community colleges, etc.), so this assumption is likely satisfied.

Assumption 2: Independence - This assumption is more questionable. While distance to the nearest college is arguably “as good as random” in some contexts, it is likely correlated with other factors that affect earnings such as location (e.g., urban vs. rural), local economic conditions, demographics, etc.

Assumption 3: Exclusion Restriction - This assumption is also questionable. While distance to the nearest college likely affects earnings through its effect on college attendance, it may also affect earnings through other channels. For example, people who live closer to colleges may have access to better job opportunities, networking opportunities, etc. that could affect earnings even if they do not attend college.

Assumption 4: Monotonicity - This assumption seems plausible. It is unlikely that there are people who would attend college if they lived farther away from a college but would not attend if they lived closer.

Distance to the nearest college is actually a very famous instrument for college attendance, but, as you can see from the discussion above, it is debatable whether it satisfies all of the assumptions needed to be a valid instrument.