Difference-in-Differences with a Continuous Treatment

Brantly Callaway

brantly.callaway@uga.edu

University of Georgia

Andrew Goodman-Bacon

andrew@goodman-bacon.com

Federal Reserve Bank of Minneapolis

Pedro Sant’Anna

pedro.santanna@emory.edu

Emory University

June 14, 2024

What’s Been Happening in the DID Literature?

\(\newcommand{\E}{\mathbb{E}} \newcommand{\var}{\mathrm{var}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\Var}{\mathrm{var}} \newcommand{\Cov}{\mathrm{cov}} \newcommand{\Corr}{\mathrm{corr}} \newcommand{\corr}{\mathrm{corr}} \newcommand{\L}{\mathrm{L}} \renewcommand{\P}{\mathrm{P}} \newcommand{\independent}{{\perp\!\!\!\perp}} \newcommand{\indicator}[1]{ \mathbf{1}\{#1\} }\)

There have been a number of recent advances in the difference-in-differences (DID) literature. Two broad contributions:

Contribution 1: Diagnose issues with the two-way fixed effects (TWFE) regressions commonly used to implement DID identification strategies \[Y_{i,t} = \theta_t + \eta_i + \beta^{twfe} D_{i,t} + e_{i,t}\]
- Roughly: TWFE regression can deliver poor estimates of causal effect parameters in the presence of treatment effect heterogeneity
Contribution 2: Propose alternative estimation strategies that “work” when the identification strategy works (and are robust to treatment effect heterogeneity)
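To make the TWFE specification above concrete, here is a minimal sketch (not taken from the talk) on simulated two-period data. It assumes no one is treated in the first period, a constant effect of 2.0 per unit of dose, and a balanced panel. The function names and the data-generating process are hypothetical. In this simple setting, \(\beta^{twfe}\) coincides with the slope from regressing the outcome change on the dose.

```python
import numpy as np

def simulate_panel(n=2000, seed=0):
    """Hypothetical two-period panel: nobody is treated in period 1."""
    rng = np.random.default_rng(seed)
    treated = rng.random(n) < 0.7
    dose = np.where(treated, rng.uniform(0.1, 1.0, n), 0.0)
    eta = rng.normal(size=n)                          # unit fixed effect
    y1 = eta + rng.normal(size=n)                     # period 1, untreated
    y2 = eta + 0.5 + 2.0 * dose + rng.normal(size=n)  # assumed effect: 2.0 per unit of dose
    return dose, y1, y2

def twfe_beta(dose, y1, y2):
    """Coefficient on D_{i,t} after removing unit and time fixed effects."""
    n = len(dose)
    y = np.concatenate([y1, y2])
    d = np.concatenate([np.zeros(n), dose])           # D_{i,1} = 0, D_{i,2} = D_i
    unit = np.tile(np.arange(n), 2)
    period = np.repeat([0, 1], n)

    def demean(v):
        # Two-way within transformation (exact for a balanced panel).
        unit_mean = np.bincount(unit, weights=v) / 2
        time_mean = np.bincount(period, weights=v) / n
        return v - unit_mean[unit] - time_mean[period] + v.mean()

    y_t, d_t = demean(y), demean(d)
    return (d_t @ y_t) / (d_t @ d_t)

def first_difference_beta(dose, y1, y2):
    """Slope from regressing Y_{i,2} - Y_{i,1} on D_i with an intercept."""
    dy = y2 - y1
    d_c = dose - dose.mean()
    return (d_c @ dy) / (d_c @ d_c)

dose, y1, y2 = simulate_panel()
print(twfe_beta(dose, y1, y2), first_difference_beta(dose, y1, y2))
```

Because the simulated effect is homogeneous and linear in the dose, this sketch does not exhibit the problems described in Contribution 1; it only shows what the regression computes.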

This Paper

These papers have (largely) focused on the case with a binary, staggered treatment

Current paper: Move from the binary treatment case to a setting with a continuous treatment (“dose”)

Some of the arguments involve extending ideas from the binary, staggered treatment case to a setting with continuous treatment

But we will also face new conceptual issues in this case that do not show up in a setting with a binary treatment

Example:

Effect of \(\underbrace{\textrm{length of school closures}}_{\textrm{continuous treatment}}\) (during Covid) on \(\underbrace{\textrm{students' test scores}}_{\textrm{outcome}}\)
- e.g., (Ager et al. 2024; Gillitzer and Prasad 2023, among others)

Today’s Talk

1. Identification: What’s the same as in the binary treatment case?
2. Identification: What’s different from the binary treatment case?
3. Interpreting TWFE Regressions (quickly, if time permits)

1. Identification: What’s the same as in the binary treatment case?

Continuous Treatment Notation

Potential outcomes notation

Two time periods: \(t=1\) and \(t=2\)
- No one treated until period \(t=2\)
- Some units remain untreated in period \(t=2\)
Potential outcomes: \(Y_{i,t=2}(d)\)
Observed outcomes: \(Y_{i,t=2}\) and \(Y_{i,t=1}\)

\[Y_{i,t=2}=Y_{i,t=2}(D_i) \quad \textrm{and} \quad Y_{i,t=1}=Y_{i,t=1}(0)\]

Parameters of Interest (ATT-type)

Level Effects (Average Treatment Effect on the Treated)

\[ATT(d|d) := \E[Y_{i,t=2}(d) - Y_{i,t=2}(0) | D_i=d]\]

Interpretation: The average effect of dose \(d\), relative to not being treated, for the group that actually experienced dose \(d\)
This is the natural analogue of \(ATT\) in the binary treatment case
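As an illustration of the definition, the sketch below computes \(ATT(d|d)\) in a simulation where both potential outcomes are known. This is an "oracle" calculation: in real data \(Y_{i,t=2}(0)\) is not observed for treated units. The data-generating process (a discrete dose grid and a linear unit-specific response that is larger for higher-dose units) and all function names are assumptions made for the example.

```python
import numpy as np

def simulate(n=50_000, seed=1):
    """Hypothetical population with known potential outcomes."""
    rng = np.random.default_rng(seed)
    doses = np.array([0.0, 1.0, 2.0, 3.0])            # assumed dose grid
    D = rng.choice(doses, size=n, p=[0.4, 0.2, 0.2, 0.2])
    y0 = rng.normal(size=n)                           # Y_{i,t=2}(0)
    # Unit-specific response per unit of dose; higher-dose units respond more.
    slope = 1.0 + 0.5 * D + rng.normal(scale=0.1, size=n)
    return D, y0, slope

def potential_outcome(y0, slope, d):
    """Y_{i,t=2}(d) under the assumed linear response."""
    return y0 + slope * d

def att(D, y0, slope, d):
    """ATT(d|d): mean of Y(d) - Y(0) among units whose actual dose is d."""
    group = D == d
    return np.mean(potential_outcome(y0[group], slope[group], d) - y0[group])

D, y0, slope = simulate()
for d in (1.0, 2.0, 3.0):
    print(d, att(D, y0, slope, d))
```

Under this assumed process the population value is \(ATT(d|d) = (1 + 0.5d)\,d\), so the printed values should be close to 1.5, 4, and 7.5.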

Parameters of Interest (ATT-type)

Slope Effects (Average Causal Response on the Treated)

\[ACRT(d|d) := \frac{\partial ATT(l|d)}{\partial l} \Big|_{l=d}\]

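Here \(ATT(l|d) := \E[Y_{i,t=2}(l) - Y_{i,t=2}(0) | D_i=d]\) is the average effect of dose \(l\) for the group that actually experienced dose \(d\), so the derivative varies the dose while holding the group fixed

Interpretation: \(ACRT(d|d)\) is the causal effect of a marginal increase in dose local to units that actually experienced dose \(d\)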
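A small worked example may help fix ideas. Suppose, purely for illustration, that the effect of dose \(l\) for the group that actually experienced dose \(d\) is \(ATT(l|d) = (1 + d/2)\, l\). This functional form is an assumption made for the example, not something from the paper. Then

\[ACRT(d|d) = \frac{\partial}{\partial l}\Big[(1 + d/2)\, l\Big] \Big|_{l=d} = 1 + \frac{d}{2}\]

By contrast, \(ATT(d|d) = d + d^2/2\), and differentiating this with respect to \(d\) gives \(1 + d\). The extra \(d/2\) arises because moving along \(ATT(d|d)\) changes the group being compared as well as the dose, whereas \(ACRT(d|d)\) holds the group fixed.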

Aggregated Parameters

Notice that \(ATT(d|d)\) and \(ACRT(d|d)\) are functional parameters: each is a function of the dose \(d\) rather than a single number

This is different from \(\beta^{twfe}\) (from the TWFE regression of \(Y_{i,t}\) on \(D_{i,t}\))

We can view \(ATT(d|d)\) and \(ACRT(d|d)\) as the “building blocks” for a more aggregated parameter. Aggregated versions of these (into a single number) are \[\begin{align*} ATT^o := \E[ATT(D|D)|D>0] \qquad \qquad ACRT^o := \E[ACRT(D|D)|D>0] \end{align*}\]

\(ATT^o\) averages \(ATT(d|d)\) over the distribution of the dose among treated units (those with \(D>0\))
\(ACRT^o\) averages \(ACRT(d|d)\) over the distribution of the dose among treated units (those with \(D>0\))
\(ACRT^o\) is the natural target parameter for the TWFE regression in this case
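The aggregation step can be sketched numerically. The example below assumes a hypothetical population with a discrete dose grid and a linear unit-specific response, so the building blocks \(ATT(d|d)\) and \(ACRT(d|d)\) can be computed directly and then averaged using the dose shares among treated units. The function name, the grid, and the data-generating process are all assumptions for illustration.

```python
import numpy as np

def aggregate_parameters(n=60_000, seed=2):
    """Build ATT^o and ACRT^o from per-dose building blocks (oracle simulation)."""
    rng = np.random.default_rng(seed)
    doses = np.array([0.0, 1.0, 2.0, 3.0])            # assumed dose grid
    D = rng.choice(doses, size=n, p=[0.4, 0.3, 0.2, 0.1])
    # Assumed response: Y(d) - Y(0) = slope_i * d, so the marginal effect is slope_i.
    slope = 1.0 + 0.5 * D + rng.normal(scale=0.1, size=n)

    treated_doses = doses[doses > 0]
    att_d = np.array([np.mean(slope[D == d] * d) for d in treated_doses])   # ATT(d|d)
    acrt_d = np.array([np.mean(slope[D == d]) for d in treated_doses])      # ACRT(d|d)

    # Weights: share of each dose among treated units, i.e. P(D = d | D > 0).
    weights = np.array([np.mean(D[D > 0] == d) for d in treated_doses])

    att_o = weights @ att_d
    acrt_o = weights @ acrt_d

    # Equivalent pooled calculation of ATT^o over all treated units.
    pooled_att = np.mean(slope[D > 0] * D[D > 0])
    return att_o, acrt_o, pooled_att

att_o, acrt_o, pooled_att = aggregate_parameters()
print(att_o, acrt_o, pooled_att)
```

Weighting the building blocks by the dose shares among treated units gives the same number as averaging unit-level effects over all treated units, which is the sense in which \(ATT^o\) and \(ACRT^o\) condition on \(D>0\).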