Modern Approaches to Difference-in-Differences

Session 4: More Complicated Treatment Regimes

Brantly Callaway

University of Georgia

Introduction

\(\newcommand{\E}{\mathbb{E}} \newcommand{\var}{\mathrm{var}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\Var}{\mathrm{var}} \newcommand{\Cov}{\mathrm{cov}} \newcommand{\Corr}{\mathrm{corr}} \newcommand{\corr}{\mathrm{corr}} \newcommand{\L}{\mathrm{L}} \renewcommand{\P}{\mathrm{P}} \newcommand{\independent}{{\perp\!\!\!\perp}} \newcommand{\indicator}[1]{ \mathbf{1}\{#1\} }\)

The discussion so far (and much of the recent DID literature) has focused on the setting with staggered treatment adoption.

However, this certainly does not cover the full range of possible treatments. In this session, we’ll primarily consider three leading extensions:

A treatment that is multi-valued or continuous (e.g., the effect of the length of school closures during Covid on student test scores)
A treatment that can turn on and off (e.g., union status)
A treatment whose amount can change over time (we’ll try to take our minimum wage example more seriously)

A couple of things to notice as we go along:

I’m not going to cover much on TWFE regressions here. In these settings, there are even more ways for them to go wrong.
Try to pay attention to the pattern. Even though the arguments are getting more complicated, we are still following the same steps: (i) target disaggregated parameters, (ii) combine them into lower-dimensional objects, and (iii) work through some additional interpretation issues that arise here and are worth emphasizing.

Part 1: DID with a Continuous Treatment

Introduction

The arguments here will be for the case with a continuous treatment, but analogous results hold for other settings:

Multi-valued treatment
Differential exposure to a binary treatment

Running Example: Causal effect of the length of school closures on student test scores

Continuous Treatment Notation

Potential outcomes notation

Two time periods: \(t=2\) and \(t=1\)
- No one treated until period \(t=2\)
- Some units remain untreated in period \(t=2\)
Potential outcomes: \(Y_{it=2}(d)\)
Observed outcomes: \(Y_{it=2}\) and \(Y_{it=1}\)

\[Y_{it=2}=Y_{it=2}(D_i) \quad \textrm{and} \quad Y_{it=1}=Y_{it=1}(0)\]
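To make the notation concrete, here is a minimal simulation sketch of this two-period setup. Everything about the data generating process (the share of untreated units, the linear dose response with slope -2, the noise levels, and the names `simulate_panel` and `y2_potential`) is an assumption made for illustration, not part of the running example.

```python
import random

def simulate_panel(n_units=1000, seed=0):
    """Simulate a two-period panel with a continuous dose (hypothetical DGP)."""
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        # Roughly 30% of units stay untreated (D = 0); the rest draw a dose in [0.01, 1].
        dose = 0.0 if rng.random() < 0.3 else rng.uniform(0.01, 1.0)
        y1_untreated = rng.gauss(0.0, 1.0)                       # Y_{it=1}(0)
        y2_untreated = y1_untreated + 0.5 + rng.gauss(0.0, 0.1)  # Y_{it=2}(0)

        def y2_potential(d, base=y2_untreated):
            # Y_{it=2}(d): assumed linear dose response, for illustration only.
            return base - 2.0 * d

        units.append({
            "D": dose,
            "Y1": y1_untreated,        # observed: Y_{it=1} = Y_{it=1}(0)
            "Y2": y2_potential(dose),  # observed: Y_{it=2} = Y_{it=2}(D_i)
            "Y2_0": y2_untreated,      # not observed for treated units in real data
        })
    return units

panel = simulate_panel()
```

In real data only `D`, `Y1`, and `Y2` would be available; `Y2_0` is kept here only because a simulation lets us see the untreated potential outcome that is missing for treated units.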

Parameters of Interest (ATT-type)

Level Effects (Average Treatment Effect on the Treated)

\[ATT(d|d) := \E[Y_{t=2}(d) - Y_{t=2}(0) | D=d]\]

Interpretation: The average effect of dose \(d\), relative to not being treated, among units that actually experienced dose \(d\)
This is the natural analogue of \(ATT\) in the binary treatment case

Parameters of Interest (ATT-type)

Slope Effects (Average Causal Response on the Treated)

\[ACRT(d|d) := \frac{\partial ATT(l|d)}{\partial l} \Big|_{l=d}\]

Interpretation: \(ACRT(d|d)\) is the causal effect of a marginal increase in dose local to units that actually experienced dose \(d\)
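A hypothetical worked example may help show what the derivative is taken with respect to. Suppose, purely as an assumption for illustration, that \(ATT(l|d) = \beta l + \gamma l d\), where \(\beta\) and \(\gamma\) are made-up constants. Then

\[\begin{align*} ACRT(d|d) &= \frac{\partial ATT(l|d)}{\partial l} \Big|_{l=d} = \beta + \gamma d \\ \frac{\partial ATT(d|d)}{\partial d} &= \frac{\partial}{\partial d}\left(\beta d + \gamma d^2\right) = \beta + 2\gamma d \end{align*}\]

The two expressions differ by \(\gamma d\). \(ACRT(d|d)\) varies the dose \(l\) while holding fixed the group that experienced dose \(d\); differentiating \(ATT(d|d)\) in \(d\) also changes which group is being considered.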

Aggregated Parameters

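Notice that \(ATT(d|d)\) and \(ACRT(d|d)\) are functional parameters: each is a function of the dose \(d\) rather than a single number

This is different from \(\alpha\), the single coefficient on \(D_{it}\) in the TWFE regression of \(Y_{it}\) on \(D_{it}\)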

We can view \(ATT(d|d)\) and \(ACRT(d|d)\) as the “building blocks” for a more aggregated parameter. Aggregated versions of these (into a single number) are \[\begin{align*} ATT^o := \E[ATT(D|D)|D>0] \qquad \qquad ACRT^o := \E[ACRT(D|D)|D>0] \end{align*}\]

\(ATT^o\) averages \(ATT(d|d)\) over the distribution of the dose among treated units (\(D>0\))
\(ACRT^o\) averages \(ACRT(d|d)\) over the distribution of the dose among treated units (\(D>0\))
\(ACRT^o\) is the natural target parameter for the TWFE regression in this case
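The aggregation step can be sketched numerically. The example below is a minimal illustration under stated assumptions: the dose takes a few discrete values (to keep the averaging transparent, even though the treatment here is continuous), and the function \(ATT(l|d) = \beta l + \gamma l d\), the constants `BETA` and `GAMMA`, and the probabilities in `dose_probs` are all hypothetical. In practice \(ATT(d|d)\) and \(ACRT(d|d)\) would have to be estimated rather than assumed.

```python
# Hypothetical dose distribution: P(D = d) for each dose level, including D = 0.
dose_probs = {0.0: 0.4, 1.0: 0.3, 2.0: 0.2, 3.0: 0.1}

BETA, GAMMA = 1.0, 0.5  # assumed coefficients, for illustration only

def att(l, d):
    """Assumed ATT(l|d): effect of dose l for the group that received dose d."""
    return BETA * l + GAMMA * l * d

def acrt(d, eps=1e-6):
    """ACRT(d|d): derivative of ATT(l|d) in l at l = d, holding the group d fixed."""
    return (att(d + eps, d) - att(d - eps, d)) / (2 * eps)

def average_over_treated(param, probs):
    """E[param(D) | D > 0] for a discrete dose distribution."""
    treated = {d: p for d, p in probs.items() if d > 0}
    total = sum(treated.values())
    return sum(param(d) * p / total for d, p in treated.items())

att_o = average_over_treated(lambda d: att(d, d), dose_probs)   # ATT^o
acrt_o = average_over_treated(acrt, dose_probs)                 # ACRT^o
print(f"ATT^o = {att_o:.4f}, ACRT^o = {acrt_o:.4f}")
```

Both aggregates reweight the dose probabilities to sum to one among units with \(D>0\), so untreated units contribute nothing to either average.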