University of Georgia
February 5, 2025
\(\newcommand{\E}{\mathbb{E}} \newcommand{\var}{\mathrm{var}} \newcommand{\cov}{\mathrm{cov}} \newcommand{\Var}{\mathrm{var}} \newcommand{\Cov}{\mathrm{cov}} \newcommand{\Corr}{\mathrm{corr}} \newcommand{\corr}{\mathrm{corr}} \newcommand{\F}{\mathrm{F}} \newcommand{\L}{\mathrm{L}} \renewcommand{\P}{\mathrm{P}} \newcommand{\T}{\mathrm{T}} \newcommand{\independent}{{\perp\!\!\!\perp}} \newcommand{\indicator}[1]{ \mathbf{1}\{#1\} }\)
Two periods of panel data: \(t=1,2\)
No one treated at \(t=1\)
At period \(t=2\), some units become treated. \(D_i = 1\) if treated, \(0\) otherwise.
Potential outcomes in each time period: \(Y_{it}(1)\) and \(Y_{it}(0)\)
Observed outcomes: \(Y_{it=2} = D_i Y_{it=2}(1) + (1-D_i) Y_{it=2}(0)\) and \(Y_{it=1} = Y_{it=1}(0)\)
\[ ATT = \E[Y_{t=2}(1) - Y_{t=2}(0) | D=1] \]
Quantile treatment effect on the treated (QTT):
\[ QTT(\tau) = Q_{Y_{t=2}(1) | D=1}(\tau) - Q_{Y_{t=2}(0) | D=1}(\tau) \]
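To make the notation concrete, here is a minimal Python sketch (standard library only) of an empirical CDF \(\F_n(y) = n^{-1}\#\{i : Y_i \leq y\}\) and empirical quantile \(Q_n(\tau) = \inf\{y : \F_n(y) \geq \tau\}\), used to compute \(QTT(\tau)\) when both potential outcomes are, hypothetically, observed. The helper names and the four-unit toy data are illustrative assumptions. The example shows that \(QTT(\tau)\) is a difference of quantiles, which need not equal the quantile of unit-level effects.

```python
import bisect
import math


def ecdf(sample):
    """Empirical CDF: y -> (1/n) * #{i : sample[i] <= y}."""
    xs = sorted(sample)
    n = len(xs)
    return lambda y: bisect.bisect_right(xs, y) / n


def equantile(sample):
    """Empirical quantile function: tau -> inf{y : F_n(y) >= tau}."""
    xs = sorted(sample)
    n = len(xs)

    def q(tau):
        # ceil(n * tau) is the rank of the tau-quantile; the small tolerance
        # guards against floating-point error when tau is exactly k / n.
        k = math.ceil(n * tau - 1e-9)
        return xs[min(max(k, 1), n) - 1]

    return q


def qtt(y1_treated, y0_treated, tau):
    """QTT(tau) = Q_{Y(1)|D=1}(tau) - Q_{Y(0)|D=1}(tau)."""
    return equantile(y1_treated)(tau) - equantile(y0_treated)(tau)


# Hypothetical potential outcomes at t=2 for four treated units.
y0 = [1.0, 2.0, 3.0, 4.0]
y1 = [5.0, 2.5, 3.5, 4.5]  # unit-level effects: 4.0, 0.5, 0.5, 0.5

print(qtt(y1, y0, 0.5))  # 1.5, while the median unit-level effect is 0.5
```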
They assume that:
\[Y_{it}(0) = h_t(U_{it})\]
where \(h_t\) is a nonparametric, time-varying function. This model generalizes the typical model used to rationalize difference-in-differences (DiD), which is the special case \(h_t(u) = \theta_t + u\):
\[Y_{it}(0) = \theta_t + \underbrace{\eta_i + e_{it}}_{U_{it}}\]
\(U_{t} \overset{d}{=} U_{t'} | G\). In words: the distribution of \(U_{t}\) does not change over time given a particular group. However, the distribution of \(U_{t}\) can vary across groups.
\(U_{t}\) is scalar
\(h_t\) is strictly monotonically increasing \(\implies\) we can invert it.
Support condition: \(\mathcal{U}_g \subseteq \mathcal{U}_0\) (support of \(U_{t}\) for the treated group is a subset of the support of \(U_{t}\) for the untreated group)
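A data-generating process that satisfies these assumptions can be simulated as below. This is a sketch under hypothetical choices: \(h_{t=1}(u) = u\), \(h_{t=2}(u) = e^{u/2} + 1\) (strictly increasing but not additive in \(t\)), \(U_t \mid D=1 \sim N(1,1)\) and \(U_t \mid D=0\) normal with mean 0 and standard deviation 1.5 in both periods (so the support condition holds trivially), and a constant treatment effect of 1 at \(t=2\). The names `h` and `simulate_cic` are illustrative. Drawing \(U_{it}\) independently in each period keeps its distribution fixed within each group without fixing each unit's rank.

```python
import math
import random


def h(t, u):
    """Hypothetical h_t: strictly increasing in u, nonlinear and time-varying."""
    return u if t == 1 else math.exp(u / 2) + 1.0


def simulate_cic(n_per_group=2000, effect=1.0, seed=0):
    """Simulate a two-period panel that satisfies the CiC assumptions.

    U_it is a scalar drawn each period from a group-specific normal
    distribution that is the same in both periods. Returns a list of
    tuples (D, Y_t1, Y_t2, Y_t2_untreated); the last entry is the
    potential outcome Y_t2(0), which is unobserved for treated units.
    """
    rng = random.Random(seed)
    u_dist = {1: (1.0, 1.0), 0: (0.0, 1.5)}  # (mean, sd) of U_t given D
    data = []
    for d, (mu, sd) in u_dist.items():
        for _ in range(n_per_group):
            y_t1 = h(1, rng.gauss(mu, sd))  # nobody is treated at t=1
            y_t2_untreated = h(2, rng.gauss(mu, sd))
            y_t2 = y_t2_untreated + effect * d  # observed outcome at t=2
            data.append((d, y_t1, y_t2, y_t2_untreated))
    return data


data = simulate_cic()
treated_y0 = [y0 for d, _, _, y0 in data if d == 1]
# Sample analogue of E[Y_t2(0) | D=1], whose true value is exp(0.625) + 1.
print(sum(treated_y0) / len(treated_y0))
```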
We will show that, under the conditions above,
\[\E[Y_{t=2}(0) | D=1] = \E\left[ Q_{Y_{t=2}|D=0}\Big(\F_{Y_{t=1}|D=0}\big(Y_{t=1}\big)\Big) \Big| D=1 \right]\]
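Preliminary result 1. Notice that
\[ \begin{aligned} \F_{Y_{t=1}(0) | D=1}(y) &= \P\big( Y_{t=1}(0) \leq y \big| D=1 \big) \\ &= \P\big( h_{t=1}(U_{t=1}) \leq y \big| D=1 \big) \\ &= \P\big( U_{t=1} \leq h^{-1}_{t=1}(y) \big| D=1 \big) \\ &= \F_{U|D=1}\big( h^{-1}_{t=1}(y)\big) \end{aligned} \]
where the third equality uses that \(h_{t=1}\) is strictly increasing, and the last drops the time subscript on \(\F_{U|D=1}\) because the distribution of \(U_t\) given the group does not change over time. The same argument applies at \(t=2\) with \(h_{t=2}\).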
Preliminary result 2. For \(\tau \in (0,1)\), define \(y_\tau = Q_{Y_{t=2}(0)|D=1}(\tau)\). Then it follows from the previous result that
\[ \begin{aligned} \tau = \F_{Y_{t=2}(0) | D=1}(y_\tau) = \F_{U|D=1}\big(h^{-1}_{t=2}(y_\tau)\big) \end{aligned} \]
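where the first equality uses that \(Y_{t=2}(0)\) has a continuous distribution given \(D=1\) (which holds when \(U_t\) is continuously distributed, since \(h_{t=2}\) is strictly increasing). This further implies that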
\[ \begin{aligned} y_\tau = h_{t=2}\big( \F_{U|D=1}^{-1}(\tau)\big) \end{aligned} \]
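Both preliminary results can be checked numerically. The sketch below assumes, hypothetically, \(U_t \mid D=1 \sim N(1,1)\), \(h_{t=1}(u) = 2u - 1\) and \(h_{t=2}(u) = e^{u/2} + 1\). It compares empirical CDFs and quantiles of simulated untreated outcomes with the formulas \(\F_{U|D=1}(h^{-1}_{t=1}(y))\) and \(h_{t=2}(\F^{-1}_{U|D=1}(\tau))\), using `statistics.NormalDist` from the Python standard library for \(\F_{U|D=1}\) and its inverse. Helper names are illustrative.

```python
import bisect
import math
import random
from statistics import NormalDist

F_U = NormalDist(mu=1.0, sigma=1.0)  # hypothetical distribution of U_t given D=1


def h1(u):
    return 2.0 * u - 1.0


def h1_inv(y):
    return (y + 1.0) / 2.0


def h2(u):
    return math.exp(u / 2) + 1.0


def ecdf_at(xs, y):
    """Empirical CDF of the sorted list xs, evaluated at y."""
    return bisect.bisect_right(xs, y) / len(xs)


def quantile_at(xs, tau):
    """Empirical tau-quantile of the sorted list xs."""
    k = math.ceil(len(xs) * tau - 1e-9)  # tolerance for tau = k / n rounding
    return xs[min(max(k, 1), len(xs)) - 1]


rng = random.Random(1)
n = 20_000
y_t1 = sorted(h1(rng.gauss(1.0, 1.0)) for _ in range(n))  # Y_{t=1}(0) given D=1
y_t2 = sorted(h2(rng.gauss(1.0, 1.0)) for _ in range(n))  # Y_{t=2}(0) given D=1

# Preliminary result 1: F_{Y_{t=1}(0)|D=1}(y) = F_{U|D=1}(h_{t=1}^{-1}(y)).
prelim1 = [(ecdf_at(y_t1, y), F_U.cdf(h1_inv(y))) for y in (0.0, 1.0, 2.5)]
# Preliminary result 2: Q_{Y_{t=2}(0)|D=1}(tau) = h_{t=2}(F_{U|D=1}^{-1}(tau)).
prelim2 = [(quantile_at(y_t2, tau), h2(F_U.inv_cdf(tau))) for tau in (0.1, 0.5, 0.9)]

for empirical, formula in prelim1 + prelim2:
    print(f"empirical {empirical:.3f} vs formula {formula:.3f}")
```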
Setting \(\tau = \F_{Y_{t=1}(0)|D=1}(y)\) and using preliminary result 2, we have that \[ \begin{aligned} Q_{Y_{t=2}(0)|D=1}\big( \F_{Y_{t=1}(0)|D=1}(y) \big) &= h_{t=2}\Big( \F_{U|D=1}^{-1}\big( \F_{Y_{t=1}(0)|D=1}(y) \big)\Big) \\ &= h_{t=2}\left\{ \F_{U|D=1}^{-1}\Big( \F_{U|D=1}\big( h^{-1}_{t=1}(y)\big) \Big) \right\} \\ &= h_{t=2}\big( h^{-1}_{t=1}(y)\big) \end{aligned} \]
where the second equality uses preliminary result 1 and the third uses \(\F_{U|D=1}^{-1}\big(\F_{U|D=1}(u)\big) = u\) for \(u\) in the support of \(U_t\) given \(D=1\)
Notice that the final expression, \(h_{t=2}\big( h^{-1}_{t=1}(y)\big)\), does not depend on the group, so applying the same arguments to the untreated group shows that
\[ Q_{Y_{t=2}(0)|D=1}\big( \F_{Y_{t=1}(0)|D=1}(y) \big) = h_{t=2}\big( h^{-1}_{t=1}(y)\big) = Q_{Y_{t=2}(0)|D=0}\big( \F_{Y_{t=1}(0)|D=0}(y) \big)\]
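which holds for every \(y\) in the support of \(Y_{t=1}(0)\) given \(D=1\); the support condition ensures that \(h^{-1}_{t=1}(y)\) then also lies in the support of \(U_t\) for the untreated group, which is what the \(D=0\) version of the argument requires.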
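The key point, that the map \(y \mapsto Q_{Y_{t=2}(0)|D=d}\big(\F_{Y_{t=1}(0)|D=d}(y)\big)\) is the same for both groups even though \(U_t\) is distributed differently across them, can be illustrated by simulation. Hypothetical choices: \(h_{t=1}(u) = 2u - 1\), \(h_{t=2}(u) = e^{u/2} + 1\), \(U_t \mid D=1 \sim N(1,1)\), and \(U_t \mid D=0\) normal with mean 0 and standard deviation 1.5; `group_map` is an illustrative helper name. Both estimated maps should be close to \(h_{t=2}(h^{-1}_{t=1}(y))\) at values of \(y\) well inside the common support.

```python
import bisect
import math
import random


def h1(u):
    return 2.0 * u - 1.0


def h1_inv(y):
    return (y + 1.0) / 2.0


def h2(u):
    return math.exp(u / 2) + 1.0


def group_map(y_t1, y_t2):
    """Estimate y -> Q_{Y_t2(0)|D=d}(F_{Y_t1(0)|D=d}(y)) from one group's samples."""
    xs1, xs2 = sorted(y_t1), sorted(y_t2)
    n1, n2 = len(xs1), len(xs2)

    def transform(y):
        tau = bisect.bisect_right(xs1, y) / n1  # empirical F_{Y_t1(0)|D=d}(y)
        k = math.ceil(n2 * tau - 1e-9)          # rank of the tau-quantile
        return xs2[min(max(k, 1), n2) - 1]      # empirical Q_{Y_t2(0)|D=d}(tau)

    return transform


rng = random.Random(2)
n = 20_000
u_dist = {1: (1.0, 1.0), 0: (0.0, 1.5)}  # hypothetical (mean, sd) of U_t given D
maps = {}
for d, (mu, sd) in u_dist.items():
    y_t1 = [h1(rng.gauss(mu, sd)) for _ in range(n)]
    y_t2 = [h2(rng.gauss(mu, sd)) for _ in range(n)]
    maps[d] = group_map(y_t1, y_t2)

checks = [(y, maps[1](y), maps[0](y), h2(h1_inv(y))) for y in (-1.0, 1.0, 3.0)]
for y, treated, untreated, target in checks:
    print(f"y={y}: D=1 {treated:.3f}, D=0 {untreated:.3f}, h2(h1_inv(y)) {target:.3f}")
```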
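Finally, conditional on \(D=1\), \(\F_{Y_{t=1}(0)|D=1}\big(Y_{t=1}(0)\big) \sim U[0,1]\) (probability integral transform), so applying \(Q_{Y_{t=2}(0)|D=1}\) to it yields a random variable distributed as \(Y_{t=2}(0)\) given \(D=1\). Hence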
\[ \begin{aligned} \E[Y_{t=2}(0) | D=1] &= \E\left[ Q_{Y_{t=2}(0)|D=1}\Big( \F_{Y_{t=1}(0)|D=1}\big(Y_{t=1}(0)\big) \Big) \middle| D=1 \right] \\ &= \E\left[ Q_{Y_{t=2}(0)|D=0}\Big( \F_{Y_{t=1}(0)|D=0}\big(Y_{t=1}(0)\big) \Big) \middle| D=1 \right] \end{aligned} \]
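where the second equality holds because the two transformations coincide, as shown above. Since \(Y_{t=1}(0) = Y_{t=1}\) for all units and \(Y_{t=2}(0) = Y_{t=2}\) for untreated units, the right-hand side equals \(\E\big[ Q_{Y_{t=2}|D=0}\big(\F_{Y_{t=1}|D=0}(Y_{t=1})\big) \big| D=1 \big]\), which depends only on observed outcomes. This completes the proof: \(ATT = \E[Y_{t=2}|D=1] - \E[Y_{t=2}(0)|D=1]\) is identified.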
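The identification result suggests a plug-in estimator: replace \(\F_{Y_{t=1}|D=0}\) and \(Q_{Y_{t=2}|D=0}\) by their empirical counterparts and average the transformed \(Y_{t=1}\) of the treated units. The sketch below (function names hypothetical, inference omitted) applies it to simulated data with \(h_{t=1}(u) = u\), \(h_{t=2}(u) = e^{u/2} + 1\), \(U_t \mid D=1 \sim N(1,1)\), \(U_t \mid D=0\) normal with mean 0 and standard deviation 1.5, and a true ATT of 1.

```python
import bisect
import math
import random


def cic_counterfactual_mean(y_t1_treated, y_t1_control, y_t2_control):
    """Plug-in estimate of E[Y_t2(0) | D=1]: the average over treated units of
    Q_{Y_t2|D=0}(F_{Y_t1|D=0}(Y_t1)), using empirical CDFs and quantiles."""
    xs1, xs2 = sorted(y_t1_control), sorted(y_t2_control)
    n1, n2 = len(xs1), len(xs2)
    total = 0.0
    for y in y_t1_treated:
        tau = bisect.bisect_right(xs1, y) / n1  # F_{Y_t1|D=0}(y)
        k = math.ceil(n2 * tau - 1e-9)          # rank of the tau-quantile
        total += xs2[min(max(k, 1), n2) - 1]    # Q_{Y_t2|D=0}(tau)
    return total / len(y_t1_treated)


def cic_att(y_t1_treated, y_t2_treated, y_t1_control, y_t2_control):
    """ATT estimate: E[Y_t2 | D=1] minus the CiC counterfactual mean."""
    observed = sum(y_t2_treated) / len(y_t2_treated)
    return observed - cic_counterfactual_mean(y_t1_treated, y_t1_control, y_t2_control)


def h2(u):
    return math.exp(u / 2) + 1.0


rng = random.Random(3)
n = 5_000
y_t1_treated = [rng.gauss(1.0, 1.0) for _ in range(n)]            # h_1(u) = u
y_t2_treated = [h2(rng.gauss(1.0, 1.0)) + 1.0 for _ in range(n)]  # Y_t2(0) + ATT of 1
y_t1_control = [rng.gauss(0.0, 1.5) for _ in range(n)]
y_t2_control = [h2(rng.gauss(0.0, 1.5)) for _ in range(n)]

att_hat = cic_att(y_t1_treated, y_t2_treated, y_t1_control, y_t2_control)
print(f"CiC ATT estimate: {att_hat:.3f} (true ATT = 1)")
```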
Unlike DiD, essentially the same arguments recover the whole counterfactual distribution \(\P(Y_{t=2}(0) \leq y | D=1)\), not just its mean.
Inverting it gives the counterfactual quantiles needed for \(QTT(\tau)\):
\[Q_{Y_{t=2}(0)|D=1}(\tau) = Q_{Y_{t=2}(0)|D=0}\Big( \F_{Y_{t=1}(0)|D=0}\big( Q_{Y_{t=1}(0)|D=1}(\tau) \big) \Big) \]
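The quantile formula also has a direct plug-in analogue. The sketch below uses hypothetical names and a simulated design with \(h_{t=1}(u) = u\), \(h_{t=2}(u) = e^{u/2} + 1\), \(U_t \mid D=1 \sim N(1,1)\), and \(U_t \mid D=0\) normal with mean 0 and standard deviation 1.5. The treatment effect is assumed heterogeneous, \(Y_{t=2}(1) = 2\,Y_{t=2}(0)\), so the true \(QTT(\tau) = Q_{Y_{t=2}(0)|D=1}(\tau) = h_{t=2}(1 + \Phi^{-1}(\tau))\) varies with \(\tau\), where \(\Phi\) is the standard normal CDF.

```python
import bisect
import math
import random
from statistics import NormalDist


def ecdf(sample):
    xs = sorted(sample)
    return lambda y: bisect.bisect_right(xs, y) / len(xs)


def equantile(sample):
    xs = sorted(sample)
    n = len(xs)

    def q(tau):
        k = math.ceil(n * tau - 1e-9)  # tolerance for tau = k / n rounding
        return xs[min(max(k, 1), n) - 1]

    return q


def cic_qtt(taus, y_t1_treated, y_t2_treated, y_t1_control, y_t2_control):
    """QTT(tau) = Q_{Y_t2|D=1}(tau) - Q_{Y_t2|D=0}(F_{Y_t1|D=0}(Q_{Y_t1|D=1}(tau)))."""
    q_t1_tr, q_t2_tr = equantile(y_t1_treated), equantile(y_t2_treated)
    f_t1_co, q_t2_co = ecdf(y_t1_control), equantile(y_t2_control)
    return [q_t2_tr(tau) - q_t2_co(f_t1_co(q_t1_tr(tau))) for tau in taus]


def h2(u):
    return math.exp(u / 2) + 1.0


rng = random.Random(4)
n = 20_000
y_t1_treated = [rng.gauss(1.0, 1.0) for _ in range(n)]
y_t2_treated = [2.0 * h2(rng.gauss(1.0, 1.0)) for _ in range(n)]  # Y_t2(1) = 2 Y_t2(0)
y_t1_control = [rng.gauss(0.0, 1.5) for _ in range(n)]
y_t2_control = [h2(rng.gauss(0.0, 1.5)) for _ in range(n)]

taus = (0.25, 0.5, 0.75)
estimates = cic_qtt(taus, y_t1_treated, y_t2_treated, y_t1_control, y_t2_control)
truths = [h2(NormalDist(1.0, 1.0).inv_cdf(tau)) for tau in taus]
for tau, est, truth in zip(taus, estimates, truths):
    print(f"tau={tau}: estimate {est:.3f}, true QTT {truth:.3f}")
```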
One way to view DiD is as a before-after comparison in which \(Y_{t=1}(0)\) is adjusted to account for time trends:
\[\E[Y_{t=2}(0) | D=1] = \E[Y_{t=1}(0) + \textrm{time adjustment} | D=1]\]
where \(\textrm{time adjustment} = \E[\Delta Y_{t=2}(0) | D=0]\).
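For comparison, a minimal sketch of the DiD counterfactual \(\E[Y_{t=1}|D=1] + \E[\Delta Y_{t=2}|D=0]\) on simulated data where \(h_t\) is not additive in \(t\). Hypothetical design: \(h_{t=1}(u) = u\), \(h_{t=2}(u) = e^{u/2} + 1\), \(U_t \mid D=1 \sim N(1,1)\), \(U_t \mid D=0\) normal with mean 0 and standard deviation 1.5, and a true ATT of 1. Parallel trends fails in this design, so the DiD estimate converges to \(e^{0.625} - e^{0.28125} \approx 0.54\) rather than 1. The function name is illustrative.

```python
import math
import random
from statistics import fmean


def did_counterfactual_mean(y_t1_treated, y_t1_control, y_t2_control):
    """DiD: E[Y_t1(0) | D=1] + time adjustment, where the time adjustment
    E[Delta Y_t2(0) | D=0] is computed as a difference of period means
    (in a balanced panel this equals the mean of unit-level changes)."""
    return fmean(y_t1_treated) + (fmean(y_t2_control) - fmean(y_t1_control))


def h2(u):
    return math.exp(u / 2) + 1.0


rng = random.Random(5)
n = 5_000
y_t1_treated = [rng.gauss(1.0, 1.0) for _ in range(n)]
y_t2_treated = [h2(rng.gauss(1.0, 1.0)) + 1.0 for _ in range(n)]  # true ATT = 1
y_t1_control = [rng.gauss(0.0, 1.5) for _ in range(n)]
y_t2_control = [h2(rng.gauss(0.0, 1.5)) for _ in range(n)]

att_did = fmean(y_t2_treated) - did_counterfactual_mean(
    y_t1_treated, y_t1_control, y_t2_control
)
print(f"DiD ATT estimate: {att_did:.3f} (true ATT = 1)")
```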
I think it is fair to view changes-in-changes (CiC) similarly, but with the time adjustment done in a different way:
\[ \E[Y_{t=2}(0) | D=1] = \E\left[ \underbrace{Q_{Y_{t=2}(0)|D=0}\Big( \F_{Y_{t=1}(0)|D=0}\big(}_{\textrm{time adjustment}}Y_{t=1}(0)\big) \Big) \middle| D=1 \right] \]
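One way to check this reading is a short derivation under the additive DiD model \(Y_{it}(0) = \theta_t + U_{it}\) from above, i.e. \(h_t(u) = \theta_t + u\). Because the transformation is the same for both groups, the CiC time adjustment reduces to the DiD one:
\[ \begin{aligned} Q_{Y_{t=2}(0)|D=0}\Big( \F_{Y_{t=1}(0)|D=0}(y) \Big) &= h_{t=2}\big( h^{-1}_{t=1}(y)\big) = y + (\theta_{t=2} - \theta_{t=1}) \\ &= y + \E[\Delta Y_{t=2}(0) | D=0] \end{aligned} \]
where the last step uses \(\E[U_{t=2} | D=0] = \E[U_{t=1} | D=0]\), implied by the time invariance of the distribution of \(U_t\) within groups. Plugging this in gives exactly the DiD counterfactual \(\E\big[Y_{t=1}(0) + \E[\Delta Y_{t=2}(0) | D=0] \big| D=1\big]\).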
An alternative, quantile difference-in-differences (QDiD), instead assumes:
\[Q_{Y_{t=2}(0)|D=1}(\tau) - Q_{Y_{t=1}(0)|D=1}(\tau) = Q_{Y_{t=2}(0)|D=0}(\tau) - Q_{Y_{t=1}(0)|D=0}(\tau)\]
They show that this is rationalized under a different model for untreated potential outcomes, effectively (I think) correlated random effects:
\[ Y_{it}(0) = \theta_t + \gamma_g + e_{it} \]
with \(e_{it} | G=g \sim F_e\)
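A plug-in version of the QDiD assumption estimates \(Q_{Y_{t=2}(0)|D=1}(\tau)\) by \(Q_{Y_{t=1}|D=1}(\tau) + Q_{Y_{t=2}|D=0}(\tau) - Q_{Y_{t=1}|D=0}(\tau)\). The sketch below simulates the model above with hypothetical values \(\theta_{t=1} = 0\), \(\theta_{t=2} = 0.5\), \(\gamma_1 = 2\), \(\gamma_0 = 0\) and \(F_e\) standard exponential, whose quantile function is \(-\log(1-\tau)\), so the target is \(\theta_{t=2} + \gamma_1 - \log(1-\tau)\). Function names are illustrative.

```python
import math
import random


def equantile(sample):
    """Empirical quantile function: tau -> inf{y : F_n(y) >= tau}."""
    xs = sorted(sample)
    n = len(xs)

    def q(tau):
        k = math.ceil(n * tau - 1e-9)  # tolerance for tau = k / n rounding
        return xs[min(max(k, 1), n) - 1]

    return q


def qdid_counterfactual_quantile(tau, y_t1_treated, y_t1_control, y_t2_control):
    """QDiD: Q_{Y_t1|D=1}(tau) + [Q_{Y_t2|D=0}(tau) - Q_{Y_t1|D=0}(tau)]."""
    return (equantile(y_t1_treated)(tau)
            + equantile(y_t2_control)(tau)
            - equantile(y_t1_control)(tau))


theta = {1: 0.0, 2: 0.5}  # hypothetical time effects theta_t
gamma = {1: 2.0, 0: 0.0}  # hypothetical group effects gamma_g
rng = random.Random(6)
n = 20_000


def draw_untreated(t, g):
    """Draw Y_it(0) = theta_t + gamma_g + e_it with e_it ~ Exp(1) in every group."""
    return [theta[t] + gamma[g] + rng.expovariate(1.0) for _ in range(n)]


y_t1_treated = draw_untreated(1, 1)
y_t1_control = draw_untreated(1, 0)
y_t2_control = draw_untreated(2, 0)

results = []
for tau in (0.25, 0.5, 0.9):
    estimate = qdid_counterfactual_quantile(tau, y_t1_treated, y_t1_control, y_t2_control)
    target = theta[2] + gamma[1] - math.log(1.0 - tau)
    results.append((tau, estimate, target))
    print(f"tau={tau}: QDiD {estimate:.3f}, true {target:.3f}")
```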
Do units hold their rank across time periods? This is not needed above, but it is interesting to think about (what if potential outcomes are not fixed for each unit?).