Difference-in-Differences with a Continuous Treatment

Brantly Callaway

brantly.callaway@uga.edu

University of Georgia

Andrew Goodman-Bacon

andrew@goodman-bacon.com

Federal Reserve Bank of Minneapolis

Pedro Sant’Anna

pedro.santanna@emory.edu

Emory University

April 12, 2024

What’s Been Happening in the DID Literature?

\(\newcommand{\E}{\mathbb{E}} \newcommand{\var}{\mathrm{var}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\Var}{\mathrm{var}} \newcommand{\Cov}{\mathrm{cov}} \newcommand{\Corr}{\mathrm{corr}} \newcommand{\corr}{\mathrm{corr}} \newcommand{\L}{\mathrm{L}} \renewcommand{\P}{\mathrm{P}} \newcommand{\independent}{{\perp\!\!\!\perp}} \newcommand{\indicator}[1]{ \mathbf{1}\{#1\} }\)

Economists have long used two-way fixed effects (TWFE) regressions to implement DID identification strategies: \[\begin{align*} Y_{i,t} = \theta_t + \eta_i + \beta^{twfe} D_{i,t} + e_{i,t} \end{align*}\]

A number of papers have diagnosed issues with TWFE regressions in this context (de Chaisemartin and D’Haultfœuille 2020; Goodman-Bacon 2021; Sun and Abraham 2021; Borusyak, Jaravel, and Spiess 2022, among others).

Summary of Issues:

Issues arise due to treatment effect heterogeneity
\(\beta^{twfe}\) from the TWFE regression turns out to be equal to a weighted average of underlying treatment effect parameters (group-time average treatment effects)
Weights on the underlying parameters are (non-transparently) driven by the estimation method and can have undesirable properties
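As a point of reference, a small numerical check may help: in the two-period, binary-treatment special case, the TWFE coefficient coincides with the simple difference-in-differences of mean outcome changes, even when unit-level effects are heterogeneous. The sketch below uses simulated data; the helper `twfe_coefficient` and every parameter value are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def twfe_coefficient(y, d, unit, time):
    """OLS of y on unit dummies, time dummies, and d; returns the coefficient on d."""
    units = np.unique(unit)
    times = np.unique(time)
    unit_dummies = (unit[:, None] == units[None, :]).astype(float)
    # Drop the first time dummy to avoid collinearity with the unit dummies.
    time_dummies = (time[:, None] == times[None, 1:]).astype(float)
    X = np.column_stack([unit_dummies, time_dummies, d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[-1]

rng = np.random.default_rng(0)
n = 200
treated = rng.random(n) < 0.4          # hypothetical treatment-group indicator
eta = rng.normal(size=n)               # unit fixed effects
effect = 1.0 + rng.normal(size=n)      # heterogeneous unit-level effects
y1 = eta + rng.normal(size=n)          # period 1: nobody is treated
y2 = eta + 0.5 + effect * treated + rng.normal(size=n)

# Stack the two periods into a long panel.
unit = np.tile(np.arange(n), 2)
time = np.repeat([1, 2], n)
y = np.concatenate([y1, y2])
d = np.concatenate([np.zeros(n), treated.astype(float)])

beta_twfe = twfe_coefficient(y, d, unit, time)

# Difference-in-differences of mean outcome changes.
dy = y2 - y1
did = dy[treated].mean() - dy[~treated].mean()
print(beta_twfe, did)
```

The two numbers agree up to floating-point error. The issues summarized above concern richer designs (for example, staggered adoption), where the implied weights are no longer this transparent.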

What’s Been Happening in the DID Literature?

There have also been a number of papers fixing these issues (previous papers plus Callaway and Sant’Anna 2021; Gardner 2022; Wooldridge 2021; Dube et al. 2023)

Intuition:

In a first step, directly target group-time average treatment effects
Choose weights on group-time average treatment effects to target parameters of interest (overall \(ATT\), event study, or others)
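The second step of this intuition can be sketched in a few lines. The group-time effects, group sizes, and the group-size weighting below are hypothetical choices for illustration; they are not the specific weights proposed in any of the cited papers.

```python
def aggregate(att_gt, weights):
    """Weighted average of group-time effects; weights are normalized to sum to one."""
    total = sum(weights[k] for k in att_gt)
    return sum(att_gt[k] * weights[k] for k in att_gt) / total

# Hypothetical first-step estimates, keyed by (group g, period t) with t >= g.
att_gt = {(2, 2): 1.0, (2, 3): 2.0, (3, 3): 4.0}
group_size = {2: 30, 3: 10}  # hypothetical number of units in each group

# One simple overall summary: weight every post-treatment cell by its group size.
overall = aggregate(att_gt, {k: group_size[k[0]] for k in att_gt})

def event_study(att_gt, group_size, e):
    """Average effect at event time e = t - g, weighting groups by size."""
    cells = {k: v for k, v in att_gt.items() if k[1] - k[0] == e}
    return aggregate(cells, {k: group_size[k[0]] for k in cells})

es0 = event_study(att_gt, group_size, 0)  # effect on impact
es1 = event_study(att_gt, group_size, 1)  # effect one period after treatment
print(overall, es0, es1)
```

The point of the design is that the weights are chosen explicitly by the researcher rather than implied by a regression.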

This Paper

These papers have (largely) focused on the case with a binary, staggered treatment

Current paper: Move from the binary treatment case to a setting with a continuous treatment (“dose”)

Some of the arguments involve extending ideas from the binary, staggered treatment case to a setting with continuous treatment

But we will also face new conceptual issues in this case that do not show up in a setting with a binary treatment

Example:

Effect of \(\underbrace{\textrm{length of school closures}}_{\textrm{continuous treatment}}\) (during Covid) on \(\underbrace{\textrm{students' test scores}}_{\textrm{outcome}}\)
- e.g., Ager et al. (2024) and Gillitzer and Prasad (2023), among others

This Paper

Today we mostly emphasize a continuous treatment, but our results also apply to other settings (with trivial modifications):

multi-valued treatments (e.g., effect of state-level minimum wage policies on employment)
binary treatment with differential “exposure” to the treatment (application in the paper: a binary Medicare policy where different hospitals had more exposure to the treatment)

But results do not apply to “fuzzy” DID setups

Fuzzy DID refers to a setting where the researcher is ultimately interested in the effect of a binary treatment but observes only aggregate data (e.g., learning about the individual-level union wage premium from state-level data by exploiting variation in the “amount” of unionization across locations)

Today’s Talk

Identification: What’s the same as in the binary treatment case?
Identification: What’s different from the binary treatment case?
Interpreting TWFE Regressions
Empirical Application

1. Identification: What’s the same as in the binary treatment case?

Continuous Treatment Notation

Potential outcomes notation

Two time periods: \(t=1\) and \(t=2\)
- No one treated until period \(t=2\)
- Some units remain untreated in period \(t=2\)
Potential outcomes: \(Y_{i,t=2}(d)\)
Observed outcomes: \(Y_{i,t=2}\) and \(Y_{i,t=1}\)

\[Y_{i,t=2}=Y_{i,t=2}(D_i) \quad \textrm{and} \quad Y_{i,t=1}=Y_{i,t=1}(0)\]

Parameters of Interest (ATT-type)

Level Effects (Average Treatment Effect on the Treated)

\[ATT(d|d) := \E[Y_{t=2}(d) - Y_{t=2}(0) | D=d]\]

Interpretation: The average effect of dose \(d\), relative to not being treated, for the group that actually experienced dose \(d\)
This is the natural analogue of \(ATT\) in the binary treatment case
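To make the notation concrete, the following sketch simulates potential outcomes and computes \(ATT(d|d)\) directly from its definition. Everything here is a hypothetical data-generating process chosen for illustration (discrete doses, a per-unit-of-dose effect `gain` that differs across dose groups); in real data \(Y_{t=2}(0)\) is not observed for treated units, so this shows the definition, not an estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
dose = rng.choice([0.0, 1.0, 2.0, 3.0], size=n)  # hypothetical doses; 0 = untreated
gain = 1.0 + 0.5 * dose                          # hypothetical effect per unit of dose

y1 = rng.normal(size=n)                          # Y_{t=1} = Y_{t=1}(0)
y2_untreated = y1 + 0.3 + rng.normal(size=n)     # Y_{t=2}(0)

def y2_potential(d, gain, y2_untreated):
    """Potential outcome Y_{t=2}(d) for the supplied units at dose d."""
    return y2_untreated + gain * d

# Observed period-2 outcome: Y_{t=2} = Y_{t=2}(D_i).
y2 = y2_potential(dose, gain, y2_untreated)

def att(d):
    """ATT(d|d): mean of Y_{t=2}(d) - Y_{t=2}(0) among units with D = d."""
    g = dose == d
    return (y2_potential(d, gain[g], y2_untreated[g]) - y2_untreated[g]).mean()

for d in (1.0, 2.0, 3.0):
    print(d, att(d))
```

In this design \(ATT(d|d) = (1 + 0.5d)\,d\), so the level effect grows with the dose both because the dose is larger and because higher-dose groups have larger per-unit effects.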

Parameters of Interest (ATT-type)

Slope Effects (Average Causal Response on the Treated)

\[ACRT(d|d) := \frac{\partial ATT(l|d)}{\partial l} \Big|_{l=d}\]

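Interpretation: \(ACRT(d|d)\) is the causal effect of a marginal increase in dose local to units that actually experienced dose \(d\). Here \(ATT(l|d) := \E[Y_{t=2}(l) - Y_{t=2}(0) | D=d]\) is the average effect of dose \(l\) among units that actually experienced dose \(d\), so the derivative varies the dose while holding the group fixed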

We can view \(ACRT(d|d)\) as the “building block” here. An aggregated version of it (into a single number) is \[\begin{align*} ACRT^o := \E[ACRT(D|D)|D>0] \end{align*}\]

\(ACRT^o\) averages \(ACRT(d|d)\) over the distribution of the dose among treated units (\(D>0\))
Like \(ATT^o\) for staggered treatment adoption, \(ACRT^o\) is the natural target parameter for the TWFE regression in this case
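A short numerical sketch of the slope parameters, under an assumed functional form. The function `att(l, d)`, standing in for the effect of dose \(l\) on the group that received dose \(d\), and the sample of doses are hypothetical; the derivative is approximated by a central finite difference.

```python
import numpy as np

def att(l, d):
    """Hypothetical effect of dose l for the group that actually received dose d."""
    return (1.0 + 0.5 * d) * l

def acrt(d, h=1e-4):
    """ACRT(d|d): derivative of att(l, d) in l at l = d, holding the group d fixed."""
    return (att(d + h, d) - att(d - h, d)) / (2 * h)

dose = np.array([0.0, 0.0, 1.0, 2.0, 3.0])      # hypothetical sample of doses
treated = dose[dose > 0]

# ACRT^o = E[ACRT(D|D) | D > 0]: average the slopes over treated units only.
acrt_o = np.mean([acrt(d) for d in treated])

def slope_of_att_dd(d, h=1e-4):
    """Derivative of d -> ATT(d|d), which moves the dose AND the group together."""
    return (att(d + h, d + h) - att(d - h, d - h)) / (2 * h)

print(acrt_o, acrt(2.0), slope_of_att_dd(2.0))
```

Under this assumed form, `acrt(2.0)` is 2 while `slope_of_att_dd(2.0)` is 3: differentiating \(ATT(d|d)\) in \(d\) also picks up how effects differ across dose groups, which is why \(ACRT(d|d)\) is defined by holding the group fixed.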