Modern Approaches to Difference-in-Differences

Session 5: Alternative Identification Strategies

Brantly Callaway

University of Georgia

Outline

  1. Introduction to Difference-in-Differences

  2. Including Covariates in the Parallel Trends Assumption

  3. Common Extensions for Empirical Work

  4. Dealing with More Complicated Treatment Regimes

  5. Alternative Identification Strategies

Plan

\(\newcommand{\E}{\mathbb{E}} \newcommand{\var}{\mathrm{var}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\Var}{\mathrm{var}} \newcommand{\Cov}{\mathrm{cov}} \newcommand{\Corr}{\mathrm{corr}} \newcommand{\corr}{\mathrm{corr}} \newcommand{\L}{\mathrm{L}} \renewcommand{\P}{\mathrm{P}} \newcommand{\independent}{{\perp\!\!\!\perp}} \newcommand{\indicator}[1]{ \mathbf{1}\{#1\} }\)

We have been following the high-level strategy of (1) targeting disaggregated parameters and then (2) combining them.

  • This session: go back to staggered treatment setting, but use alternative identification strategies

Examples in this part

  1. Lagged Outcome Unconfoundedness

  2. Change-in-Changes

  3. Interactive Fixed Effects

Main high-level takeaway: Many of the insights that we have had previously continue to apply using other identification strategies

Lagged Outcome Unconfoundedness

Introduction to Lagged Outcome Unconfoundedness

Intuition for Lagged Outcome identification strategies is to compare:

  • Observed outcomes for treated units to observed outcomes for untreated units conditional on having the same pre-treatment outcome(s)

Rough explanation: This is a version of unconfoundedness where the most important variable(s) to consider are lagged outcome(s)

Lagged Outcome Unconfoundedness with Two Periods


Lagged Outcome Unconfoundedness Assumption

\[\E[Y_{t=2}(0) | Y_{t=1}(0), D=1] = \E[Y_{t=2}(0) | Y_{t=1}(0), D=0]\]

Explanation: On average, untreated potential outcomes in the 2nd period are the same for the treated group as for the untreated group conditional on having the same pre-treatment outcome

Identification

Under LOU (plus an overlap condition), we can identify \(ATT\): \[ \begin{aligned} ATT &= \E[Y_{t=2} | D=1] - \E[Y_{t=2}(0) | D=1] \\ &= \E[Y_{t=2} | D=1] - \E\Big[\E[ Y_{t=2}(0) | Y_{t=1}(0), D=1] \big| D=1\Big]\\ &=\E[Y_{t=2} | D=1] - \E\Big[\E[ Y_{t=2}(0) | Y_{t=1}(0), D=0] \big| D=1\Big] \end{aligned} \]

The previous equation is what we will use in estimation, but (for intuition) it is helpful to apply the law of iterated expectations (L.I.E.) to the first term so that:

\[ ATT = \E\Big[\E[Y_{t=2} | Y_{t=1}(0), D=1] - \E[ Y_{t=2}(0) | Y_{t=1}(0), D=0] \big| D=1\Big]\]

\(\implies ATT\) is identified and can be recovered by comparing the average outcome for the treated group with the average outcome, conditional on the lagged outcome, for the untreated group (this is averaged over the distribution of pre-treatment outcomes for the treated group)

LO Unconfoundedness Estimation

Recall under LO unconfoundedness assumption: \[ATT=\E[Y_{t=2} | D=1] - \E\Big[\underbrace{\E[ Y_{t=2}(0) | Y_{t=1}(0), D=0]}_{\textrm{challenging to estimate}} | D=1\Big]\]

Most straightforward approach (regression adjustment): assume the linear model \(Y_{it=2}(0) = \beta_0 + \beta_1 Y_{it=1}(0) + e_{it}\). Estimate \(\beta_0\) and \(\beta_1\) using the set of untreated observations. Then, with \(n_1 = \sum_{i=1}^n D_i\) denoting the number of treated units, \[\begin{align*} \widehat{ATT} = \frac{1}{n_1} \sum_{i=1}^n D_i Y_{it=2} - \frac{1}{n_1} \sum_{i=1}^n D_i(\hat{\beta}_0 + \hat{\beta}_1 Y_{it=1}) \end{align*}\]

But also everything else we learned for DiD with covariates applies here: IPW, AIPW, etc.

  • Just replace the covariates with lagged outcomes
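As a rough illustration of the regression adjustment estimator above, here is a minimal sketch on simulated two-period data. The data generating process, the sample size, and every variable name (`y1`, `y2`, `d`, `true_att`) are assumptions made up for this example; none of them come from the minimum wage application.

```r
# Simulated two-period data (all values are hypothetical)
set.seed(1)
n    <- 5000
y1   <- rnorm(n)                           # pre-treatment outcome Y_{t=1}
d    <- rbinom(n, 1, plogis(0.5 * y1))     # treatment take-up depends on the lagged outcome
y2_0 <- 1 + 0.8 * y1 + rnorm(n)            # untreated potential outcome in period 2
true_att <- 0.5                            # constant treatment effect (assumption)
y2   <- y2_0 + true_att * d                # observed period-2 outcome
dat  <- data.frame(y1 = y1, y2 = y2, d = d)

# Step 1: estimate beta_0 and beta_1 using untreated observations only
fit0 <- lm(y2 ~ y1, data = dat[dat$d == 0, ])

# Step 2: impute untreated outcomes for treated units and average the difference
treated <- dat[dat$d == 1, ]
att_hat <- mean(treated$y2) - mean(predict(fit0, newdata = treated))
att_hat

# For comparison: the unadjusted difference in means ignores selection on y1
naive_diff <- mean(dat$y2[dat$d == 1]) - mean(dat$y2[dat$d == 0])
naive_diff
```

Because treated units tend to have higher lagged outcomes in this simulation, the unadjusted difference in means should overstate the effect, while the regression adjustment estimate should be close to `true_att`.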

Multiple Period Version of LO Unconfoundedness Assumption

Multi-period LOU Assumption

For all groups \(g \in \mathcal{G}\) and for all time periods \(t=2,\ldots,T\), \[ Y_{t}(0) \independent G | Y_{t-1}(0) \]


Applying a similar argument as before recursively

\[ \begin{aligned} ATT(g,t) &= \E[Y_{t}|G=g] - \E\Big[\E[Y_{t} | Y_{g-1}, U=1] \Big| G=g\Big]\\ &= \E\Big[\E[Y_t|Y_{g-1}, G=g] - \E[Y_{t} | Y_{g-1}, U=1] \Big| G=g\Big] \end{aligned} \]

i.e., it’s the same as two period lagged outcome unconfoundedness, except that the base period is now \(g-1\).

A longer explanation is given in the Appendix (LOU Identification Explanation).

Minimum Wage Example

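# install the pte package from GitHub (only needed once)
devtools::install_github("bcallaway11/pte")
library(pte)
# lagged outcomes identification strategy:
# d_outcome=FALSE keeps the outcome in levels (no differencing) and
# lagged_outcome_cov=TRUE conditions on the lagged outcome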
lo_res <- pte::pte_default(yname="lemp",
                           tname="year",
                           idname="id",
                           gname="G",
                           data=data2,
                           d_outcome=FALSE,
                           lagged_outcome_cov=TRUE)
summary(lo_res)
did::ggdid(lo_res$att_gt, ylim=c(-.2,0.05))
ggpte(lo_res)

LO Unconfoundedness \(ATT(g,t)\)

LO Unconfoundedness \(ATT^o\)


Overall ATT:  
    ATT    Std. Error     [ 95%  Conf. Int.]  
 -0.061        0.0082    -0.0772     -0.0449 *


Dynamic Effects:
 Event Time Estimate Std. Error   [95%  Conf. Band]  
         -2   0.0140     0.0090 -0.0067      0.0346  
         -1   0.0103     0.0071 -0.0060      0.0266  
          0  -0.0242     0.0077 -0.0419     -0.0064 *
          1  -0.0739     0.0091 -0.0948     -0.0530 *
          2  -0.1290     0.0211 -0.1777     -0.0803 *
          3  -0.1403     0.0249 -0.1976     -0.0830 *
---
Signif. codes: `*' confidence band does not cover 0

LO Unconfoundedness Event Study

Additional References for LOU

  • Ding and Li (2019)

  • Powell, Griffin, and Wolfson (2023)

Change-in-Changes

Introduction to Change-in-Changes

The idea of change-in-changes comes from Athey and Imbens (2006) and builds on work on estimating non-separable production function models. They consider the case where \[\begin{align*} Y_{it}(0) = h_t(U_{it}) \end{align*}\] where \(h_t\) is a nonparametric, time-varying function. To me, it is helpful to think of \(U_{it} = \eta_i + e_{it}\). This model (for the moment) generalizes the model that we used to rationalize parallel trends: \(Y_{it}(0) = \theta_t + \eta_i + e_{it}\).

Additional Conditions:

  1. \(U_{t} \overset{d}{=} U_{t'} | G\). In words: the distribution of \(U_{t}\) does not change over time given a particular group. However, the distribution of \(U_{t}\) can vary across groups.

  2. \(U_{t}\) is scalar

  3. \(h_t\) is strictly monotonically increasing \(\implies\) we can invert it.

  4. Support condition: \(\mathcal{U}_g \subseteq \mathcal{U}_0\) (support of \(U_{t}\) for the treated group is a subset of the support of \(U_{t}\) for the untreated group)

Change-in-Changes Identification

Under the conditions described above, you can show that

\[\begin{align*} ATT(g,t) = \E[Y_{t} | G=g] - \E\Big[Q_{Y_{t}(0)|U=1}\big(F_{Y_{g-1}(0)|U=1}(Y_{g-1}(0))\big) \Big| G=g \Big] \end{align*}\] where \(Q_{Y_{t}(0)|U=1}(\tau)\) is the \(\tau\)-th quantile of \(Y_{t}(0)\) for the never-treated group (e.g., if \(\tau=0.5\), it is the median of \(Y_{t}(0)\) for the never-treated group).

  • (As an interesting side-comment, this is derived in Athey and Imbens 2006, way before recent work on group-time average treatment effects, and it is pretty much exactly analogous to the “first step” that we have been emphasizing)
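To make the identification formula more concrete, here is a minimal sketch of the change-in-changes counterfactual using empirical CDFs and quantiles. The helper `cic_counterfactual` is hypothetical (it is not a function from the `qte` package), and the simulated model, with \(h_{g-1}(u) = u\) and \(h_t(u) = \exp(0.5u)\), is an assumption chosen only so that the true \(ATT\) is known.

```r
# Hypothetical helper (not part of the qte package): take each treated unit's
# pre-treatment outcome, find its rank in the untreated pre-treatment
# distribution, and read off the same rank in the untreated post-treatment
# distribution.
cic_counterfactual <- function(y_pre_treated, y_pre_untreated, y_post_untreated) {
  ranks <- ecdf(y_pre_untreated)(y_pre_treated)   # F_{Y_{g-1}|U=1}(Y_{g-1})
  quantile(y_post_untreated, probs = ranks, type = 1, names = FALSE)
}

# Simulated data: Y_t(0) = h_t(U_t), with the distribution of U_t fixed over
# time within a group but different across groups (all values hypothetical)
set.seed(42)
n      <- 4000
h_pre  <- function(u) u
h_post <- function(u) exp(0.5 * u)               # strictly increasing
u_untr_pre <- rnorm(n)
u_untr_post <- rnorm(n)
u_tr_pre <- rnorm(n, mean = 0.3)
u_tr_post <- rnorm(n, mean = 0.3)
true_att <- 0.3
y_post_treated <- h_post(u_tr_post) + true_att

cf <- cic_counterfactual(h_pre(u_tr_pre), h_pre(u_untr_pre), h_post(u_untr_post))
att_cic <- mean(y_post_treated) - mean(cf)
att_cic
```

In this sketch, treated units whose pre-treatment outcome lies above the untreated maximum are simply mapped to the untreated maximum; in real applications this is where the support condition starts to matter.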

Intuition for Change-in-Changes

Intuition: Notice that, under parallel trends, we can re-write \[\begin{align*} ATT(g,t) = \E[Y_{t}|G=g] - \E\left[ \Big(\E[Y_{t} | U=1] - \E[Y_{g-1} | U=1]\Big) + Y_{g-1} | G=g \right] \end{align*}\] which we can think of as: compare observed outcomes to (an average of) observed outcomes in the pre-treatment period, adjusted for how outcomes change over time in the untreated group across the same periods

For CIC, the intuition is the same, except the way that we “account for” how outcomes change over time during the same periods for the untreated group is different.

Because these are different transformations, DID and CIC are non-nested approaches.

Comments

CIC is a nice approach in many applications

  • In addition to recovering \(ATT(g,t)\), it is also possible to recover quantile treatment effect parameters in this setting (these can allow you to more effectively study treatment effect heterogeneity and are closely related to social welfare calculations/comparisons)

Though it is less commonly used in empirical work than DID.

  • Need to estimate quantiles

  • Harder to include covariates (due to needing to estimate quantiles). I think (though I am not 100% sure) that it is not possible, or at least not obvious how, to construct a doubly robust version of CIC.

  • Support conditions can have real bite in some applications

  • Not as much software support

Minimum Wage Application

devtools::install_github("bcallaway11/qte")
library(qte)
# change-in-changes
cic_res <- qte::cic2(yname="lemp",
                     gname="G",
                     tname="year",
                     idname="id",
                     data=data2,
                     boot_type="empirical",
                     cl=4)
summary(cic_res)
ggpte(cic_res)

Minimum Wage Application


Overall ATT:  
     ATT    Std. Error     [ 95%  Conf. Int.]  
 -0.0591        0.0074    -0.0735     -0.0446 *


Dynamic Effects:
 Event Time Estimate Std. Error [95% Pointwise  Conf. Band]  
         -2   0.0156     0.0088         -0.0016      0.0329  
         -1   0.0092     0.0084         -0.0071      0.0256  
          0  -0.0206     0.0074         -0.0351     -0.0060 *
          1  -0.0734     0.0083         -0.0897     -0.0572 *
          2  -0.1277     0.0213         -0.1694     -0.0859 *
          3  -0.1405     0.0252         -0.1900     -0.0910 *
---
Signif. codes: `*' confidence band does not cover 0

Side-Discussion: Quantile Treatment Effects

So far, our discussion has focused on average treatment effect parameters.

In some applications, it may be useful to target quantile treatment effect parameters

  • Examples of quantiles: median, 10th percentile, 90th percentile

  • These can be useful for studying how treatments affect different parts of the outcome distribution

    • For example, in labor, we might be particularly interested in how some policy affects the lower part of the income distribution
    • Most social welfare calculations depend on the entire distribution of outcomes \(\implies\) quantile treatment effects can be used to rank policies (Sen (1997); Carneiro, Hansen, and Heckman (2003))

Side-Discussion: Quantile Treatment Effects

Recovering quantile treatment effects using DiD-type identification arguments is not straightforward though:

\[Y_{it}(0) = \theta_t + \eta_i + e_{it}\]

Our arguments have involved taking expectations and differences, and these “play nicely” together:

\[\E[Y_t(0) | D=1] = \E[\Delta Y_{t}(0) | D=1] + \E[Y_{t-1}(0) | D=1]\]

However, for quantiles, this decomposition doesn’t generally work:

\[Q_{Y_t(0)|D=1}(\tau) \neq Q_{\Delta Y_{t}(0)|D=1}(\tau) + Q_{Y_{t-1}(0)|D=1}(\tau)\]
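A tiny numerical check may help here. The three-unit data set below is entirely made up; it only shows that the mean of a level equals the mean of the change plus the mean of the lagged level, while the same additive rule can fail for the median.

```r
# Three hypothetical units: lagged outcome and change in the outcome
y_prev <- c(0, 1, 10)
dy     <- c(5, 0, 1)
y      <- y_prev + dy            # current outcome: 5, 1, 11

# Means are additive: E[Y_t] = E[Delta Y_t] + E[Y_{t-1}]
mean_lhs <- mean(y)
mean_rhs <- mean(dy) + mean(y_prev)

# Medians are not additive in general
median_lhs <- median(y)
median_rhs <- median(dy) + median(y_prev)

c(mean_lhs = mean_lhs, mean_rhs = mean_rhs,
  median_lhs = median_lhs, median_rhs = median_rhs)
```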

Side-Discussion: Quantile Treatment Effects

But you can recover \(QTT\) using change-in-changes. In particular,

\[ QTT(g,t)(\tau) = Q_{Y_t|G=g}(\tau) - Q_{Y_{t}(0)|U=1}\Big(F_{Y_{g-1}(0)|U=1}\big(Q_{Y_{g-1}(0)|G=g}(\tau)\big)\Big) \]

And (being careful), you can aggregate this to recover, e.g., a quantile-version of an event study.
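The \(QTT\) expression above can be sketched directly with empirical quantiles and an empirical CDF. The helper `cic_qtt` and the toy numbers are hypothetical and only meant to trace the three steps in the formula; this is not how `qte::cic2` is implemented internally.

```r
# Hypothetical helper: change-in-changes QTT at quantile tau
cic_qtt <- function(tau, y_post_treated, y_pre_treated,
                    y_pre_untreated, y_post_untreated) {
  # (1) tau-th quantile of the treated group's pre-treatment outcomes
  q_pre_treated <- quantile(y_pre_treated, probs = tau, type = 1, names = FALSE)
  # (2) its rank in the untreated group's pre-treatment distribution
  rank_untreated <- ecdf(y_pre_untreated)(q_pre_treated)
  # (3) the same rank in the untreated group's post-treatment distribution
  q_counterfactual <- quantile(y_post_untreated, probs = rank_untreated,
                               type = 1, names = FALSE)
  quantile(y_post_treated, probs = tau, type = 1, names = FALSE) - q_counterfactual
}

# Toy data (made up)
qtt_median <- cic_qtt(
  tau = 0.5,
  y_post_treated   = c(25, 26, 50),
  y_pre_treated    = c(2, 2, 4),
  y_pre_untreated  = c(1, 2, 3, 4),
  y_post_untreated = c(10, 20, 30, 40)
)
qtt_median
```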

Minimum Wage Application

cic_qte10 <- qte::cic2(
  yname = "emp_rate",
  gname = "G2",
  tname = "year",
  idname = "id",
  data = data2,
  boot_type = "empirical",
  cl = 4,
  biters = 100,
  gt_type = "qtt",
  ret_quantile = 0.1
)
ggpte(cic_qte10)

Minimum Wage Application

\(\tau = 0.1\)

Minimum Wage Application

\(\tau = 0.5\)

Minimum Wage Application

\(\tau = 0.9\)

Interactive Fixed Effects

Interactive Fixed Effects

Earlier we discussed that the model that rationalizes parallel trends

\[Y_{it}(0) = \theta_t + \eta_i + e_{it}\]

may be too restrictive. It may not be possible to fully generalize this model (in the sense of \(Y_{it}(0) = h_t(\eta_i) + e_{it}\)), but we can still perhaps relax it to some extent.

An intermediate case is an interactive fixed effects model for untreated potential outcomes: \[\begin{align*} Y_{it}(0) = \theta_t + \eta_i + \lambda_i F_t + e_{it} \end{align*}\]

  • \(\lambda_i\) is often referred to as “factor loading” (notation above implies that this is a scalar, but you can allow for higher dimension)

  • \(F_t\) is often referred to as a “factor”

  • \(e_{it}\) is idiosyncratic in the sense that \(\E[e_{t} | G=g] = 0\) for all groups

In our context, though, it makes sense to interpret these as

  • \(\lambda_i\) unobserved heterogeneity (e.g., individual’s unobserved skill)

  • \(F_t\) the time-varying “return” to unobserved heterogeneity (e.g., return to skill)

Interactive Fixed Effects

Interactive fixed effects models for untreated potential outcomes generalize some other important cases:

Example 1: Suppose we observe \(\lambda_i\), then this amounts to the regression adjustment version of DID with a time-invariant covariate considered earlier

Example 2: Suppose you know that \(F_t = t\), then this leads to a unit-specific linear trend model: \[\begin{align*} Y_{it}(0) = \theta_t + \eta_i + \lambda_i t + e_{it} \end{align*}\]

To allow for \(F_t\) to change arbitrarily over time is harder…

Example 3: Interactive fixed effects models also provide a connection to “large-T” approaches such as synthetic control and synthetic DID (Abadie, Diamond, and Hainmueller (2010), Arkhangelsky et al. (2021))

  • e.g., one of the motivations of the SCM in ADH-2010 is that (given large-T) constructing a synthetic control can balance the factor loadings in an interactive fixed effects model for untreated potential outcomes

Interactive Fixed Effects

Interactive fixed effects models allow for violations of parallel trends:

\[\begin{align*} \E[\Delta Y_{t}(0) | G=g] = \Delta \theta_t + \E[\lambda|G=g]\Delta F_t \end{align*}\] which can vary across groups.

Example: If \(\lambda_i\) is “ability” and \(F_t\) is increasing over time, then (even in the absence of the treatment) groups with higher mean “ability” will tend to increase outcomes more over time than less skilled groups
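The expression for \(\E[\Delta Y_{t}(0) | G=g]\) can be checked with a few lines of arithmetic. All of the numbers below (time effects, factor values, and group means of \(\lambda\)) are hypothetical; the point is only that the implied trends differ across groups whenever mean loadings differ and the factor changes over time.

```r
# All numbers below are hypothetical and chosen only for illustration
theta       <- c(0, 0.1, 0.2, 0.3)               # time effects theta_1, ..., theta_4
F_t         <- c(1, 1.5, 2.5, 3)                 # factor F_1, ..., F_4 (increasing)
lambda_mean <- c(g3 = 1, g4 = 0.5, never = 0)    # E[lambda | G = g]

# E[Delta Y_t(0) | G = g] = Delta theta_t + E[lambda | G = g] * Delta F_t
trend <- outer(lambda_mean, diff(F_t)) +
  matrix(diff(theta), nrow = length(lambda_mean), ncol = 3, byrow = TRUE)
dimnames(trend) <- list(names(lambda_mean), paste0("t=", 2:4))
trend

# Gap in untreated trends relative to the never-treated group
# (this gap would be zero in every period under parallel trends)
pt_gap <- sweep(trend, 2, trend["never", ])
pt_gap
```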

How can you recover \(ATT(g,t)\) here?

There are a lot of ideas. Probably the most prominent idea is to directly estimate the model for untreated potential outcomes and impute

  • See Xu (2017), Gobillon and Magnac (2016), and Hsiao and Zhou (2019) for substantial detail on this front

  • For example, Xu (2017) uses the principal components approach of Bai (2009) to estimate the model. This is a bit different in spirit from what we have been doing before as this argument requires the number of time periods to be “large”

Alternative Approaches with Fixed-T

Very Simple Case:

  • \(T=4\)

  • 3 groups: 3, 4, \(\infty\)

  • We will target \(ATT(3,3) = \E[\Delta Y_3 | G=3] - \underbrace{\E[\Delta Y_3(0) | G=3]}_{\textrm{have to figure out}}\)

In this case, given the IFE model for untreated potential outcomes, we have: \[\begin{align*} \Delta Y_{i3}(0) &= \Delta \theta_3 + \lambda_i \Delta F_3 + \Delta e_{i3} \\ \Delta Y_{i2}(0) &= \Delta \theta_2 + \lambda_i \Delta F_2 + \Delta e_{i2} \end{align*}\]

The last equation implies that \[\begin{align*} \lambda_i = \Delta F_2^{-1}\Big( \Delta Y_{i2}(0) - \Delta \theta_2 - \Delta e_{i2} \Big) \end{align*}\] Plugging this back into the first equation (and combining terms), we have \(\rightarrow\)

Alternative Approaches with Fixed-T

From last slide, combining terms we have that

\[\begin{align*} \Delta Y_{i3}(0) = \underbrace{\Big(\Delta \theta_3 - \frac{\Delta F_3}{\Delta F_2} \Delta \theta_2 \Big)}_{=: \theta_3^*} + \underbrace{\frac{\Delta F_3}{\Delta F_2}}_{=: F_3^*} \Delta Y_{i2}(0) + \underbrace{\Delta e_{i3} - \frac{\Delta F_3}{\Delta F_2} \Delta e_{i2}}_{=: v_{i3}} \end{align*}\]

Now (momentarily) suppose that we (somehow) know \(\theta_3^*\) and \(F_3^*\). Then,

\[\begin{align*} \E[\Delta Y_3(0) | G=3] = \theta_3^* + F_3^* \underbrace{\E[\Delta Y_2(0) | G = 3]}_{\textrm{identified}} + \underbrace{\E[v_3|G=3]}_{=0} \end{align*}\]

\(\implies\) this term is identified; hence, we can recover \(ATT(3,3)\).

How can we recover \(\theta_3^*\) and \(F_3^*\)?

Notice: this involves untreated potential outcomes through period 3, and we have groups 4 and \(\infty\) for which we observe these untreated potential outcomes. This suggests using those groups.

  • However, this is not so simple because \(\Delta Y_{i2}(0)\) is correlated with \(v_{i3}\) by construction (note: both contain \(\Delta e_{i2}\))

  • We need some exogenous variation (IV) to recover the parameters \(\rightarrow\)

Alternative Approaches with Fixed-T

There are a number of different ideas here:

  • Make additional assumptions ruling out serial correlation in \(e_{it}\) \(\implies\) can use lags of outcomes as instruments (Imbens, Kallus, and Mao (2021)):

    • But this is seen as a strong assumption in many applications (Bertrand, Duflo, and Mullainathan (2004))
  • Alternatively can introduce covariates and make auxiliary assumptions about them (Callaway and Karami (2023), Brown and Butts (2023), Brown, Butts, and Westerlund (2023))
  • However, it turns out that, with staggered treatment adoption, you can recover \(ATT(3,3)\) essentially for free (Callaway and Tsyawo (2023)).

Alternative Approaches with Fixed-T

In particular, notice that, given that we have two distinct untreated groups in period 3: group 4 and group \(\infty\), then we have two moment conditions:

\[\begin{align*} \E[\Delta Y_3(0) | G=4] &= \theta_3^* + F_3^* \E[\Delta Y_2(0) | G=4] \\ \E[\Delta Y_3(0) | G=\infty] &= \theta_3^* + F_3^* \E[\Delta Y_2(0) | G=\infty] \\ \end{align*}\] We can solve these for \(\theta_3^*\) and \(F_3^*\), then use these to recover \(ATT(3,3)\).

  • The main requirement is that \(\E[\lambda | G=4] \neq \E[\lambda|G=\infty]\) (relevance condition)

  • Can scale this argument up for more periods, groups, and IFEs

  • Relative to other approaches, the main drawback is that we can’t recover as many \(ATT(g,t)\)’s; e.g., in this example, we can’t recover \(ATT(3,4)\) or \(ATT(4,4)\) which might be recoverable in other settings
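The two moment conditions can be solved by hand in a simulation. The sketch below is a simplified illustration under assumed values (group means of \(\lambda\), the factor path, the noise level, and a constant effect `true_att`); it uses sample means in place of expectations and is not the estimator implemented in the `ife` package.

```r
# Simulated IFE data with T = 3 observed periods and groups 3, 4, never
# (every number here is hypothetical)
set.seed(123)
n_g   <- 20000
g     <- rep(c("g3", "g4", "never"), each = n_g)
lambda_mean <- c(g3 = 2, g4 = 1, never = 0)       # E[lambda | G]; differs by group
lambda <- rnorm(3 * n_g, mean = lambda_mean[g])
eta    <- lambda + rnorm(3 * n_g)                 # unit fixed effect
theta  <- c(0, 0.2, 0.5)                          # time effects
F_t    <- c(1, 2, 4)                              # Delta F_2 = 1, Delta F_3 = 2
e      <- matrix(rnorm(3 * n_g * 3, sd = 0.5), ncol = 3)
Y0     <- sapply(1:3, function(t) theta[t] + eta + lambda * F_t[t] + e[, t])

true_att <- 1
Y <- Y0
Y[g == "g3", 3] <- Y[g == "g3", 3] + true_att     # group 3 is treated in period 3

dY2 <- Y[, 2] - Y[, 1]
dY3 <- Y[, 3] - Y[, 2]
m2  <- tapply(dY2, g, mean)                       # E[Delta Y_2 | G]
m3  <- tapply(dY3, g, mean)                       # E[Delta Y_3 | G]

# Solve the two moment conditions (groups 4 and never) for theta_3^*, F_3^*
F3_star     <- unname((m3["g4"] - m3["never"]) / (m2["g4"] - m2["never"]))
theta3_star <- unname(m3["never"] - F3_star * m2["never"])

# Impute the untreated trend for group 3 and recover ATT(3,3)
att_33 <- unname(m3["g3"] - (theta3_star + F3_star * m2["g3"]))

# For comparison: DiD relative to the never-treated group
did_33 <- unname(m3["g3"] - m3["never"])
c(F3_star = F3_star, theta3_star = theta3_star, att_33 = att_33, did_33 = did_33)
```

In this design the population values are \(F_3^* = 2\) and \(\theta_3^* = -0.1\), and the DiD comparison picks up the differential trend \(\big(\E[\lambda|G=3] - \E[\lambda|G=\infty]\big)\Delta F_3 = 4\) in addition to the treatment effect.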

Minimum Wage Application

For interactive fixed effects, we need more periods and groups

  • Add years back to 1998

  • Add 2002 group

  • Expanded data and code for this is on website

Minimum Wage Application

Start with Callaway and Sant’Anna (2021)

  • All groups relative to never-treated group

Minimum Wage Application

Event Study

Minimum Wage Application

library(ife)
set.seed(09192024)
ife1 <- staggered_ife2(
  yname = "lemp",
  gname = "cohort",
  tname = "year",
  idname = "id",
  data = data3,
  nife = 1,
  weighting_matrix = "2sls",
  cband = FALSE,
  boot_type = "empirical",
  biters = 100,
  cl = 10,
  anticipation = 0
)

Minimum Wage Application

Summary

This section has emphasized alternative approaches to DID to recover disaggregated treatment effect parameters:

  • Lagged outcome unconfoundedness

  • Change-in-changes

  • Interactive fixed effects models

We have targeted \(ATT(g,t)\). Moving to more aggregated treatment effect parameters such as \(ATT^{es}(e)\) or \(ATT^o\) is the same as before.
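As a reminder of what that aggregation step looks like, here is a minimal sketch that combines a small table of \(ATT(g,t)\)'s into an event study and an overall \(ATT\), using group-size weights. The estimates and group sizes are made up, and the weighting scheme is written out by hand as an assumption about the aggregation used earlier rather than taken from any package.

```r
# Hypothetical ATT(g,t) estimates and group sizes
att_gt <- data.frame(
  g   = c(3, 3, 4),
  t   = c(3, 4, 4),
  att = c(1, 2, 0.5),
  n_g = c(100, 100, 300)
)
att_gt$e <- att_gt$t - att_gt$g                  # event time

# Event study: for each event time, average ATT(g, g+e) over the groups
# observed at that event time, weighting by group size
att_es <- sapply(split(att_gt, att_gt$e),
                 function(d) weighted.mean(d$att, d$n_g))
att_es

# Overall ATT: average effect within each group across its post-treatment
# periods, then average across groups weighting by group size
by_group <- aggregate(att ~ g + n_g, data = att_gt, FUN = mean)
att_o <- weighted.mean(by_group$att, by_group$n_g)
att_o
```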

Summary

I want to emphasize the high-level thought process one last time for using/inventing heterogeneity robust causal inference procedures with panel data:

  • Step 1: target disaggregated parameters directly using whatever approach you think would work well for recovering the \(ATT\) for a fixed “group” and “time”

  • Step 2: if desired, combine those disaggregated parameters into a lower-dimensional parameter that you may be able to estimate better and report more easily; hopefully, you can provide some motivation for this aggregated parameter

Conclusion

Thank you very much for having me!


Contact Information: brantly.callaway@uga.edu

Code and Slides: Available here

Papers:

  • Callaway (2023), Handbook of Labor, Human Resources and Population Economics, [published version] [draft version]; the draft version is ungated and very similar to the published version.

  • Today is also based on the not-yet-publicly-available manuscript by Baker, Callaway, Cunningham, Goodman-Bacon, and Sant’Anna (be on the lookout for it very soon)

Appendix

LOU Identification Explanation

Simplest possible non-trivial example: \(ATT(g=2,t=3)\).

Auxiliary condition: for any group \(g\), \(\E[Y_{t}(0) | Y_{t-1}(0), \ldots, Y_1(0), G=g] = \E[Y_{t}(0) | Y_{t-1}(0), G=g]\) (intuition: the right number of lags are included in the model). Then,

\[\begin{align*} \E[Y_3(0) | Y_1(0), G=2] &= \E\Big[ \E[Y_3(0) | Y_2(0), Y_1(0), G=2] \Big| Y_1(0), G=2 \Big] \\ &= \E\Big[ \E[Y_3(0) | Y_2(0), G=2] \Big| Y_1(0), G=2 \Big] \\ &= \E\Big[ \E[Y_3(0) | Y_2(0), U=1] \Big| Y_1(0), G=2 \Big] \\ &= \E\Big[ h(Y_2(0)) \Big| Y_1(0), G=2 \Big] \\ &= \E\Big[ h(Y_2(0)) \Big| Y_1(0), U=1 \Big] \\ &= \E\Big[ \E[Y_3(0) | Y_2(0), U=1] \Big| Y_1(0), U=1 \Big] \\ &= \E\Big[ \E[Y_3(0) | Y_2(0), Y_1(0), U=1] \Big| Y_1(0), U=1 \Big] \\ &= \E[Y_3(0) | Y_1(0), U=1] \end{align*}\] where \(h(y) := \E[Y_3(0) | Y_2(0)=y, U=1]\). The first and last equalities use the law of iterated expectations, the second and seventh use the auxiliary condition, and the third and fifth use the multi-period LOU assumption.

LOU Identification Explanation (cont’d)

Thus, we have that \[\begin{align*} ATT(g=2,t=3) &= \E[Y_3|G=2] - \E[Y_3(0) | G=2] \\ &= \E[Y_3|G=2] - \E\Big[ \E[Y_3(0) | Y_1(0), G=2] \Big| G=2\Big] \\ &= \E[Y_3|G=2] - \E\Big[ \E[Y_3(0) | Y_1(0), U=1] \Big| G=2\Big] \end{align*}\] which completes the argument.


References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105 (490): 493–505.
Arkhangelsky, Dmitry, Susan Athey, David A Hirshberg, Guido W Imbens, and Stefan Wager. 2021. “Synthetic Difference-in-Differences.” American Economic Review 111 (12): 4088–118.
Athey, Susan, and Guido Imbens. 2006. “Identification and Inference in Nonlinear Difference-in-Differences Models.” Econometrica 74 (2): 431–97.
Bai, Jushan. 2009. “Panel Data Models with Interactive Fixed Effects.” Econometrica 77 (4): 1229–79.
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. “How Much Should We Trust Differences-in-Differences Estimates?” The Quarterly Journal of Economics 119 (1): 249–75.
Brown, Nicholas, and Kyle Butts. 2023. “Dynamic Treatment Effect Estimation with Interactive Fixed Effects and Short Panels.”
Brown, Nicholas, Kyle Butts, and Joakim Westerlund. 2023. “Simple Difference-in-Differences Estimation in Fixed-T Panels.”
Callaway, Brantly. 2023. “Difference-in-Differences for Policy Evaluation.” In Handbook of Labor, Human Resources and Population Economics, edited by Klaus F. Zimmermann, 1–61. Springer International Publishing.
Callaway, Brantly, and Sonia Karami. 2023. “Treatment Effects in Interactive Fixed Effects Models with a Small Number of Time Periods.” Journal of Econometrics 233 (1): 184–208.
Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.
Callaway, Brantly, and Emmanuel Selorm Tsyawo. 2023. “Treatment Effects in Staggered Adoption Designs with Non-Parallel Trends.”
Carneiro, Pedro, Karsten Hansen, and James Heckman. 2003. “Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice.” International Economic Review 44 (2): 361–422.
Ding, Peng, and Fan Li. 2019. “A Bracketing Relationship Between Difference-in-Differences and Lagged-Dependent-Variable Adjustment.” Political Analysis 27 (4): 605–15.
Gobillon, Laurent, and Thierry Magnac. 2016. “Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls.” Review of Economics and Statistics 98 (3): 535–51.
Hsiao, Cheng, and Qiankun Zhou. 2019. “Panel Parametric, Semiparametric, and Nonparametric Construction of Counterfactuals.” Journal of Applied Econometrics 34 (4): 463–81.
Imbens, Guido, Nathan Kallus, and Xiaojie Mao. 2021. “Controlling for Unmeasured Confounding in Panel Data Using Minimal Bridge Functions: From Two-Way Fixed Effects to Factor Models.”
Powell, David, Beth Ann Griffin, and Tal Wolfson. 2023. “Estimating Policy Effects Using Lagged Outcome Values to Impute Counterfactuals.”
Sen, Amartya. 1997. On Economic Inequality. Clarendon Press.
Xu, Yiqing. 2017. “Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models.” Political Analysis 25 (1): 57–76.