# Modern Approaches to Difference-in-Differences

Session 2: Relaxing Parallel Trends

Brantly Callaway

University of Georgia

## Outline

1. Introduction to Difference-in-Differences

2. Relaxing the Parallel Trends Assumption

1. Including Covariates in the Parallel Trends Assumption

2. Dealing with Violations of Parallel Trends

3. Dealing with More Complicated Treatment Regimes

4. Alternative Identification Strategies

## Notation

Only need a little bit of new notation here:

• $$X_{i,t=2}$$ and $$X_{i,t=1}$$ — time-varying covariates

• $$Z_i$$ — time-invariant covariates

# Limitations of TWFE Regressions

## Limitations of TWFE Regressions

In this setting, it is common to run the following TWFE regression:

$Y_{i,t} = \theta_t + \eta_i + \alpha D_{i,t} + X_{i,t}'\beta + e_{i,t}$

However, there are a number of issues:

Issue 1: Issues related to multiple periods and variation in treatment timing still arise

Issue 2: Hard to allow parallel trends to depend on time-invariant covariates

Issue 3: Hard to allow for covariates that could be affected by the treatment

## Limitations of TWFE Regressions

In this setting, it is common to run the following TWFE regression:

$Y_{i,t} = \theta_t + \eta_i + \alpha D_{i,t} + X_{i,t}'\beta + e_{i,t}$

However, there are a number of issues:

Issue 4: Linearity results in mixing identification and estimation…e.g., with 2 periods \begin{align*} \Delta Y_{i,t} = \Delta \theta_t + \alpha D_{i,t} + \Delta X_{i,t}'\beta + \Delta e_{i,t} \end{align*} $$\implies$$ differencing out unit fixed effects can have implications about what researcher controls for

• This doesn’t matter if model for untreated potential outcomes is truly linear

• However, if we think of linear model as an approximation, this may have meaningful implications.

## Limitations of TWFE Regressions

Even if none of the previous 4 issues apply, $$\alpha$$ will still be equal to a weighted average of underlying (conditional-on-covariates) treatment effect parameters.

• The weights can be negative, and suffer from “weight reversal” (similar to the issue discussed in Słoczyński (2022) in the context of cross-sectional regressions with covaraites)

• In other words, weights $$\alpha$$ is a weighted average of $$ATT(X)$$ where (relative to a baseline of weighting based on the distribution of $$X$$ for the treated group), the weights put larger weight on $$ATT(X)$$ for values of the covariates that are uncommon for the treated group relative to the untreated group and smaller weight on $$ATT(X)$$ for values of the covariates that are common for the treated group relative to the untreated group

See Caetano and Callaway (2023) for more details

The challenging term to deal with in the previous expression for $$ATT$$ is

$\E\Big[ \underbrace{\E[\Delta Y_i(0) | X_i, D_i=0]}_{=:m_0(X_i)} \Big| D_i=1\Big]$

The most direct way to proceed is by proposing a model for $$m_0(X_i)$$.

This expression suggests a regression adjustment estimator. For example, if we assume that $$m_0(X_i) = X_i'\beta_0$$, then we have that

$ATT = \E[\Delta Y_i | D_i=1] - \E[X_i'\beta_0|D_i=1]$

and we can estimate the $$ATT$$ by

• Step 1: Estimate $$\beta_0$$ using untreated group

• Step 2: Compute predicted values for treated units: $$X_i'\hat{\beta}_0$$

• Step 3: Compute $$\widehat{ATT}$$ by taking the difference between the difference between the average change in outcomes for treated units and the average predicted outcomes for treated units

## Covariate Balancing

An alternative approach is to try to balance the distribution of covariates between the treated and untreated groups.

Momentarily, suppose that the distribution of $$X_i$$ was the same for both groups, then \begin{align*} \E\Big[ \E[\Delta Y_i(0) | X_i, D_i=0 ] \Big| D_i=1\Big] &= \E\Big[ \E[\Delta Y_i(0) | X_i, D_i=0 ] \Big| D_i=0\Big] \\ &= \E[\Delta Y_i(0) | D_i=0] \end{align*} $$\implies$$ (even under conditional parallel trends) we can estimate $$ATT$$ by just directly comparing paths of outcomes for treated and untreated groups.

## Covariate Balancing

In general, we should not expect the distribution of covariates to be the same across groups.

However the idea of covariate balancing is to come up with balancing weights $$\nu_0(X_i)$$ such that the distribution of $$X_i$$ is the same in the untreated group as it is in the treated group after applying the balancing weights. Then we would have that (from the second term above) \begin{align*} \E\Big[ \E[\Delta Y_i(0) | X_i, D_i=0 ] \Big| D_i=1\Big] &= \E\Big[ \nu_0(X_i) \E[\Delta Y_i(0) | X_i, D_i=0 ] \Big| D_i=0\Big] \\ &= \E[\nu_0(X_i) \Delta Y_i(0) | D_i=0] \end{align*} where the first equality is due to balancing weights and the second by the law of iterated expectations.

## Propensity Score Re-Weighting

The most common way to re-weight is based on the propensity score, you can show: \begin{align*} \nu_0(x) = \frac{p(x)(1-p)}{(1-p(x))p} \end{align*} where $$p(x) = \P(D_i=1|X_i=x)$$ and $$p=\P(D_i=1)$$.

• This is the approach suggested in Abadie (2005). In practice, you need to estimate the propensity score. The most common choices are probit or logit.

For estimation:

• Step 1: Estimate the propensity score (typically logit or probit)

• Step 2: Compute the weights, using the estimated propensity score

• Step 3: Compute $$\widehat{ATT}$$ by taking the difference between the average change in outcomes for treated units and the weighted average of the change in outcomes over time for the untreated group

## Doubly Robust

Alternatively, you can show

$ATT = \E\left[ \Delta Y_{i,t} - \E[\Delta Y_{i,t} | X_i, D_i=0] \big| D=1\right] - \E\left[ \frac{p(X_i)(1-p)}{(1-p(X_i))p} \big(\Delta Y_{i,t} - \E[\Delta Y_{i,t} | X_i, D_i=0]\big) \Big| D_i = 0\right]$

This requires estimating both $$p(X_i)$$ and $$\E[\Delta Y_i|X_i,D_i=0]$$.

• This expression for $$ATT$$ is doubly robust. This means that, it will deliver consistent estimates of $$ATT$$ if either the model for $$p(X_i)$$ or for $$\E[\Delta Y_i|X_i,D_i=0]$$.
• In my experience, doubly robust estimators perform much better than either the regression or propensity score weighting estimators
• This also provides a connection to estimating $$ATT$$ under conditional parallel trends using machine learning for $$p(X_i)$$ and $$\E[\Delta Y_i|X_i,D_i=0]$$ (see: Chang (2020))

## Multiple Time Periods and Variation in Treatment Timing

Conditional Parallel Trends with Multiple Periods

For all groups $$g \in \bar{\mathcal{G}}$$ (all groups except the never-treated group) and for all time periods $$t=2, \ldots, T$$,

$\E[\Delta Y_{i,t}(0) | \mathbf{X}_i, Z_i, G_i=g] = \E[\Delta Y_{i,t}(0) | \mathbf{X}_i, Z_i, U_i=1]$

where $$\mathbf{X}_i := (X_{i,1},X_{i,2},\ldots,X_{i,T})$$.

Under this assumption, using similar arguments to the ones above, one can show that

$ATT(g,t) = \E\left[ \left( \frac{\indicator{G_i=g}}{p_g} - \frac{p_g(\mathbf{X}_i,Z_i)U_i}{(1-p_g(\mathbf{X}_i,Z_i))p_g}\right)\Big(Y_{i,t} - Y_{i,g-1} - m_{gt}^0(\mathbf{X}_i,Z_i)\Big) \right]$

where $$p_g(\mathbf{X}_i,Z_i) := \P(G_i=g|\mathbf{X}_i,Z_i,\indicator{G_i=g}+U_i=1)$$ and $$m_{gt}^0(\mathbf{X}_i,Z_i) := \E[Y_{i,t}-Y_{i,g-1}|\mathbf{X}_i,Z_i,U_i=1]$$.

## Practical Considerations

Because $$\mathbf{X}_i$$ contains $$X_{i,t}$$ for all time periods, terms like $$m_{gt}^0(\mathbf{X}_i,Z_i)$$ can be quite high-dimensional (and hard to estimate) in many applications. In many cases, it may be reasonable to replace with lower dimensional function $$\mathbf{X}_i$$:

• $$\bar{X}_i$$ — the average of $$X_{i,t}$$ across time periods
• $$X_{i,t}, X_{i,g-1}$$ — the covariates in the current period and base period (this is possible in the pte package currently and may be added to did soon).
• $$X_{i,g-1}$$ — the covariates in the base period (this is the default in did)
• $$(X_{i,t}-X_{i,g-1})$$ — the change in covariates over time

# Empirical Example

## Back to Minimum Wage Example

We’ll allow for path of outcomes to depend on region of the country

# run TWFE regression
twfe_x <- fixest::feols(lemp ~ post | id + region^year,
data=data2)
modelsummary(twfe_x, gof_omit=".*")
(1)
post 0.001
(0.008)

Relative to previous results, this is much smaller and statistically insignificant and is similar to the result in Dube, Lester, and Reich (2010).

## Callaway and Sant’Anna (2021) Results

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region,
control_group="nevertreated",
base_period="universal",
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)

## Callaway and Sant’Anna (2021) Results


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-0.0273        0.0092    -0.0453     -0.0093 *

Group Effects:
Group Estimate Std. Error [95% Simult.  Conf. Band]
2004  -0.0436     0.0192       -0.0863     -0.0010 *
2006  -0.0199     0.0072       -0.0359     -0.0039 *
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

## Callaway and Sant’Anna (2021) Results

Even more than in the previous case, the results in this case are notably different depending on the estimation strategy.

# run TWFE regression
twfe_x <- fixest::feols(lemp ~ post + lpop + lavg_pay | id + region^year,
data=data2)
modelsummary(twfe_x, gof_omit=".*")
(1)
post −0.022
(0.008)
lpop 1.057
(0.137)
lavg_pay 0.074
(0.079)

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region + lpop + lavg_pay,
control_group="nevertreated",
base_period="universal",
est_method="reg",
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-0.0321        0.0083    -0.0485     -0.0158 *

Group Effects:
Group Estimate Std. Error [95% Simult.  Conf. Band]
2004  -0.0596     0.0200       -0.1030     -0.0161 *
2006  -0.0197     0.0082       -0.0375     -0.0020 *
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Outcome Regression

## Inverse Propensity Score Weighting

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region + lpop + lavg_pay,
control_group="nevertreated",
base_period="universal",
est_method="ipw",
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)

## Inverse Propensity Score Weighting


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-0.0313        0.0077    -0.0464     -0.0162 *

Group Effects:
Group Estimate Std. Error [95% Simult.  Conf. Band]
2004  -0.0514     0.0191       -0.0915     -0.0114 *
2006  -0.0222     0.0078       -0.0387     -0.0057 *
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Inverse Probability Weighting

## Doubly Robust

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region + lpop + lavg_pay,
control_group="nevertreated",
base_period="universal",
est_method="dr",
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)

## Doubly Robust


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-0.0317        0.0077    -0.0467     -0.0166 *

Group Effects:
Group Estimate Std. Error [95% Simult.  Conf. Band]
2004  -0.0509     0.0201       -0.0921     -0.0096 *
2006  -0.0230     0.0072       -0.0378     -0.0083 *
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

## Side Results

Show some other things that you can do that may be relevant in applications (though not related to including covariates)

1. Change the type of base period - “varying” vs. “universal”

2. Change the comparison group - consider larger comparison groups such as not-yet-treated

3. Allow for anticipation

## Change base period

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region + lpop + lavg_pay,
control_group="nevertreated",
base_period="varying",
est_method="dr",
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)

## Change base period


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-0.0317         0.008    -0.0473      -0.016 *

Group Effects:
Group Estimate Std. Error [95% Simult.  Conf. Band]
2004  -0.0509     0.0197       -0.0930     -0.0087 *
2006  -0.0230     0.0073       -0.0387     -0.0074 *
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

## Change comparison group

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region + lpop + lavg_pay,
control_group="notyettreated",
base_period="universal",
est_method="dr",
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)

## Change comparison group


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-0.0312        0.0075    -0.0459     -0.0165 *

Group Effects:
Group Estimate Std. Error [95% Simult.  Conf. Band]
2004  -0.0493     0.0195       -0.0886     -0.0100 *
2006  -0.0230     0.0074       -0.0379     -0.0082 *
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Not Yet Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

## Allow for anticipation

# callaway and sant'anna including covariates
cs_x <- att_gt(yname="lemp",
tname="year",
idname="id",
gname="G",
xformla=~region + lpop + lavg_pay,
control_group="nevertreated",
base_period="universal",
est_method="dr",
anticipation=1,
data=data2)
cs_x_res <- aggte(cs_x, type="group")
summary(cs_x_res)
cs_x_dyn <- aggte(cs_x, type="dynamic")
ggdid(cs_x_dyn)

## Allow for anticipation


Call:
aggte(MP = cs_x, type = "group")

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>

Overall summary of ATT's based on group/cohort aggregation:
ATT    Std. Error     [ 95%  Conf. Int.]
-4e-04        0.0093    -0.0186      0.0177

Group Effects:
Group Estimate Std. Error [95% Pointwise  Conf. Band]
2006   -4e-04     0.0095          -0.019      0.0181
---
Signif. codes: *' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  1
Estimation Method:  Doubly Robust

## Covariates Affected by the Treatment

In some applications, we may want to control for covariates that themselves could be affected by the treatment

• Classical examples in labor economics: A person’s industry, occupation, or union status

• These are often referred to as “bad controls”

You can see a tension here:

• We would like to compare units who, absent being treated, would have had the same (say) union status

• But union status could be affected by the treatment

Most common practice is to just completely drop these covariates from the analysis, but there are alternatives     [More Details]

• Condition on pre-treatment value of bad control

• Treat bad control as an outcome (i.e., use some identification strategy), then feed this into the main analysis as a covariate

## Partial Identification / Sensitivity Analysis

References: Manski and Pepper (2018), Rambachan and Roth (2023)

Two versions of sensitivity analysis in RR:

1. Violations of parallel trends are “not too different” in post-treatment periods from the violations in pre-treatment periods
• Allow for violations of parallel trends up to $$\bar{M}$$ times as large as were observed in any pre-treatment period.
• And we’ll vary $$\bar{M}$$.
2. Violations of parallel trends evolve smoothly
• Allow for linear trend up to $$M$$ times as large as linear trend difference in pre-treatment periods.
• We’ll vary $$M$$

Next slides: show these results in minimum wage application for the “on impact” effect of the treatment

# Appendix

## Understanding Double Robustness

To understand double robustness, we can rewrite the expression for $$ATT$$ as \begin{align*} ATT = \E\left[ \frac{D_i}{p} \Big(\Delta Y_i - m_0(X_i)\Big) \right] - \E\left[ \frac{p(X_i)(1-D_i)}{(1-p(X_i))p} \Big(\Delta Y_i - m_0(X_i)\Big)\right] \end{align*}

The first term is exactly the same as what comes from regression adjustment

• If we correctly specify a model for $$m_0(X_i)$$, it will be equal to $$ATT$$.

• If $$m_0(X_i)$$ not correctly specified, then, by itself, this term will be biased for $$ATT$$

The second term can be thought of as a de-biasing term

• If $$m_0(X_i)$$ is correctly specified, it is equal to 0

• If $$p(X_i)$$ is correctly specified, it reduces to $$\E[\Delta Y_{i,t}(0) | D_i=1] - \E[m_0(X_i)|D_i=1]$$ which both delivers counterfactual untreated potential outcomes and removes the (possibly misspecified) second term from the first equation

Back

# Covariates Affected by the Treatment

## Motivation

So far, our discussion has been for the case where the time-varying covariates evolve exogenously.

• Many (probably most) covariates fit into this category: in the minimum wage example, a county’s population probably fits here.

But in other cases, there can exist covariates that we would like to include in the parallel trends assumption that could be affected by the treatment (this type of covariate is often referred to as a bad control.

• Example: Job displacement
• Path of earnings (in the absence of job displacement) likely depends on occupation, industry, and/or union status, which could all be affected by job displacement

The traditional approach in empirical work is to completely drop covariates that could have been affected by the treatment.

• Not clear if this is the right idea though…

To wrap our heads around this, let’s go back to the case with two time periods.

Define treated and untreated potential covariates: $$X_{i,t}(1)$$ and $$X_{i,t}(0)$$. Notice that in the “textbook” two period setting, we observe $X_{i,t=2} = D_i X_{i,t=2}(1) + (1-D_i) X_{i,t=2}(0) \qquad \textrm{and} \qquad X_{i,t=1} = X_{i,t=1}(0)$

If the covariates are literally not affected by the treatment at all, then we can write the version of conditional parallel trends that we have been using above in terms of potential outcomes:

Conditional Parallel Trends using Untreated Potential Covariates

$\E[\Delta Y_i(0) | X_{i,t=2}(0), X_{i,t=1}(0), Z_i, D_i=1] = \E[\Delta Y_i(0) | X_{i,t=2}(0), X_{i,t=1}(0), Z_i, D_i=0]$

## Option 1: Ignore

One idea is to just ignore that the covariates may have been affected by the treatment:

Alternative Conditional Parallel Trends 1

$\E[\Delta Y_i(0) | { \color{red} X_{\color{red}{i,t=2} } }, X_{i,t=1}(0), Z_i, D_i=1] = \E[\Delta Y_i(0) | { \color{red} X_{\color{red}{i,t=2}} }, X_{i,t=1}(0), Z_i, D_i=0]$

The limitations of this approach are well known (even discussed in MHE), and this is not typically the approach taken in empirical work

Job Displacement Example: You would compare paths of outcomes for workers who left union because they were displaced to paths of outcomes for non-displaced workers who also left union (e.g., because of better non-unionized job opportunity)

## Option 2: Drop

It is more common in empirical work to drop $$X_{i,t}(0)$$ entirely from the parallel trends assumption

Alternative Conditional Parallel Trends 2

$\E[\Delta Y_i(0) | Z_i, D_i=1] = \E[\Delta Y_i(0) | Z_i, D_i=0]$

In my view, this is not attractive either though. If we believe this assumption, then we have basically solved the bad control problem by assuming that it does not exist.

Job Displacement Example: We have now just assumed that path of earnings (absent job displacement) doesn’t depend on union status

## Option 3: Tweak

Perhaps a better alternative identifying assumption is the following one

Alternative Conditional Parallel Trends 3

$\E[\Delta Y_i(0) | X_{i,t=1}(0), Z_i, D_i=1] = \E[\Delta Y_i(0) | X_{i,t=1}(0), Z_i, D_i=0]$

Intuition: Conditional parallel trends holds after conditioning on pre-treatment time-varying covariates that could have been affected by treatment

Job Displacement Example: Path of earnings (absent job displacement) depends on pre-treatment union status, but not untreated potential union status in the second period

What to do: Since $$X_{i,t=1}(0)$$ is observed for all units, we can immediately operationalize this assumption use our arguments from earlier (i.e., the ones without bad controls)

• This is difficult to operationalize with a TWFE regression

• In practice, you can just include the bad control among other covariates in did`

## Option 4: Extra Assumptions

Going back to the original conditional parallel trends assumption (that include both $$X_{i,t=2}(0)$$ and $$X_{i,t=1}(0)$$)…Using the same sort of arguments as for regression adjustment earlier, it follows that

$ATT = \E[\Delta Y_i | D_i=1] - \E\Big[ \E[\Delta Y_i(0) | X_{i,t=2}(0), X_{i,t=1}(0), Z_i, D_i=0] \Big| D_i=1\Big]$

The second term is the tricky one. Notice that:

• The inside conditional expectation is identified — we see untreated potential outcomes and covariates for the untreated group

• But the outside expectation is infeasible $$\implies$$ we are stuck (Option 4a)

## Option 4: Dealing with $$X_{i,t=2}(0)$$

Covariate Unconfoundedness Assumption

$X_{i,t=2}(0) \independent D_i | X_{i,t=1}(0), Z_i$

Intuition: For the treated group, the time-varying covariate would have evolved in the same way over time as it actually did for the untreated group, conditional on $$X_{i,t=1}$$ and $$Z_i$$.

• Notice that this assumption only concerns untreated potential covariates $$\implies$$ it allows for $$X_{i,t=2}$$ to be affected by the treatment

• Making an assumption like this indicates that $$X_{i,t=2}(0)$$ is playing a dual role: (i) start by treating it as if it’s an outcome, (ii) have it continue to play a role as a covariate

Under this assumption, can show that we can recover the $$ATT$$:

$ATT = \E[\Delta Y_i | D_i=1] - \E\left[ \E[\Delta Y_i | X_{i,t=1}, Z_i, D_i=0] \Big| D_i=1 \right]$

This is the same expression as in Option 3

In some cases, it may make sense to condition on other additional variables (e.g., the lagged outcome $$Y_{t=1}$$) in the covariate unconfoundedness assumption. In this case, it is still possible to identify $$ATT$$, but it is more complicated

It could also be possible to use alternative identifying assumptions besides covariate unconfoundedness — at a high-level, we somehow need to recover the distribution of $$X_{i,t=2}(0)$$

• e.g., Brown, Butts, and Westerlund (2023)

[Back]

# References

Abadie, Alberto. 2005. “Semiparametric Difference-in-Differences Estimators.” The Review of Economic Studies 72 (1): 1–19.
Brown, Nicholas, Kyle Butts, and Joakim Westerlund. 2023. “Simple Difference-in-Differences Estimation in Fixed-t Panels.”
Caetano, Carolina, and Brantly Callaway. 2023. “Difference-in-Differences When Parallel Trends Holds Conditional on Covariates.”
Caetano, Carolina, Brantly Callaway, Robert Payne, and Hugo Rodrigues. 2023. “Difference-in-Differences with Bad Controls.”
Chang, Neng-Chieh. 2020. “Double/Debiased Machine Learning for Difference-in-Differences Models.” The Econometrics Journal 23 (2): 177–91.
Dube, Arindrajit, T William Lester, and Michael Reich. 2010. “Minimum Wage Effects Across State Borders: Estimates Using Contiguous Counties.” The Review of Economics and Statistics 92 (4): 945–64.
Manski, Charles F, and John V Pepper. 2018. “How Do Right-to-Carry Laws Affect Crime Rates? Coping with Ambiguity Using Bounded-Variation Assumptions.” Review of Economics and Statistics 100 (2): 232–44.
Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” Review of Economic Studies, rdad018.
Roth, Jonathan. 2022. “Pretest with Caution: Event-Study Estimates After Testing for Parallel Trends.” American Economic Review: Insights 4 (3): 305–22.
Słoczyński, Tymon. 2022. “Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights.” The Review of Economics and Statistics 104 (3): 501--509.