class: center, middle, inverse, title-slide

# Modern Approaches to Difference in Differences
### Brantly Callaway, University of Georgia
### October 22, 2021
Session 2: Two Way Fixed Effects

---

# A More Complicated Setup

`$$\newcommand{\E}{\mathbb{E}}$$`
`$$\newcommand{\P}{\mathrm{P}}$$`

<style type="text/css">
border-top: 80px solid #BA0C2F;

.inverse {
  background-color: #BA0C2F;
}

.alert {
  font-weight:bold;
  color: red;
}

.alert-blue {
  font-weight: bold;
  color: blue;
}

.remark-slide-content {
  font-size: 23px;
  padding: 1em 4em 1em 4em;
}

.highlight-red {
  background-color:red;
  padding:0.1em 0.2em;
}
</style>

---

# A More Complicated Setup

- `\(\mathcal{T}\)` time periods

--

- Units can become treated at different points in time

--

- For simplicity, we'll adopt the <span class="alert-blue">staggered treatment framework</span>. That is, once a unit becomes treated, it remains treated.

- `\(G_i\)` - a unit's <span class="alert-blue">group</span> - the time period in which the unit becomes treated. Set `\(G_i = \mathcal{T}+1\)` for units that do not participate in the treatment in any period.

--

- Potential outcomes: `\(Y_{it}(g)\)` - the outcome that unit `\(i\)` would experience in time period `\(t\)` if they became treated in period `\(g\)`.

--

- Untreated potential outcome: `\(Y_{it}(0)\)` - the outcome unit `\(i\)` would experience in time period `\(t\)` if they did not participate in the treatment in any period.

--

- Observed outcome: `\(Y_{it}=Y_{it}(G_i)\)`

--

- No anticipation condition: `\(Y_{it}(G_i) = Y_{it}(0)\)` for all `\(t < G_i\)` (pre-treatment periods for unit `\(i\)`)

---

# A More Complicated Setup

- A number of extensions (more complicated treatment regimes, anticipation effects, conditioning on covariates) are possible

--

## Multiple period version of parallel trends

For all groups `\(g,k\)` and all `\(t=2,\ldots,\mathcal{T}\)`,

`$$\E[\Delta Y_t(0) | G=g] = \E[\Delta Y_t(0) | G=k]$$`

--

In words: trends in untreated potential outcomes are the same across all groups

---

# What does TWFE estimate in this setup?
`$$Y_{it} = \theta_t + \eta_i + \alpha D_{it} + v_{it}$$`

--

<span class="alert-blue">Rough intuition:</span> `\(\alpha\)` "comes from" comparisons between the path of outcomes for units whose <span class="alert">treatment status changes</span> and the path of outcomes for units whose <span class="alert">treatment status stays the same</span> over time.

--

We'll see that this intuition is pretty much right

--

But some of these "comparisons" have undesirable properties

---

# Goodman-Bacon (2021)

<span class="alert-blue">Notation:</span> For two groups `\(g\)` and `\(k\)` with `\(k > g\)` (i.e., group `\(k\)` is treated after group `\(g\)`), define:

--

- `\(\bar{Y}_i^{PRE(g)}\)` - average outcome for individual `\(i\)` across periods before either group is treated

--

- `\(\bar{Y}_i^{MID(g,k)}\)` - average outcome for individual `\(i\)` across periods after group `\(g\)` becomes treated but before group `\(k\)` becomes treated

--

- `\(\bar{Y}_i^{POST(k)}\)` - average outcome for individual `\(i\)` across periods after both groups are treated

--

Further define:

- `\(\bar{G}_g = \frac{\mathcal{T}-(g-1)}{\mathcal{T}}\)` - the fraction of periods that units in group `\(g\)` are treated (this is bigger for earlier treated groups)

---

# Goodman-Bacon (2021)

<span class="alert-blue">Bacon Decomposition: </span> `\(\alpha\)` from the TWFE regression can be written as

--

`$$\sum_{g \in \mathcal{G}} \sum_{\substack{k \in \mathcal{G}\\k>g}} \Big\{ w_1(g,k) \delta^{MID,PRE}(g,k) + w_2(g,k) \delta^{POST,MID}(g,k) \Big\}$$`

where `\(w_1(g,k)\)` and `\(w_2(g,k)\)` are positive weights satisfying

--

`$$\sum_{g \in \mathcal{G}} \sum_{\substack{k \in \mathcal{G}\\k>g}} \Big\{ w_1(g,k) + w_2(g,k) \Big\} = 1$$`

--

[we'll come back to these momentarily]

---

# Goodman-Bacon (2021)

First main term in Bacon decomposition:

--

`$$\delta^{MID,PRE}(g,k) = \E\left[ \bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)} | G=g \right] - \E\left[ \bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)} | G=k \right]$$`

--

- The first term is the "path" of outcomes experienced by
group `\(g\)` (pre-treatment relative to post-treatment)

--

- The second term, under the multiple period parallel trends assumption, is the path of outcomes that group `\(g\)` *would have experienced* if they had not become treated.

--

<span class="alert">Under parallel trends, these are exactly the sort of comparisons that we would like to show up in `\(\alpha\)`.</span>

---

# Goodman-Bacon (2021)

Second main component of Bacon decomposition:

--

`$$\delta^{POST,MID}(g,k) = \E\left[ \bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)} | G=k\right] - \E\left[ \bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)} | G=g \right]$$`

--

- The first term is the path of outcomes experienced by group `\(k\)` (pre-treatment relative to post-treatment)

--

- The second term is the path of outcomes experienced by group `\(g\)`.

  - These are periods where group `\(g\)`'s treatment status does not change

  - But these are post-treatment time periods for group `\(g\)`

--

<span class="alert">However, parallel trends is not about paths of post-treatment outcomes...</span>

---

# Goodman-Bacon (2021)

By adding and subtracting terms to `\(\delta^{POST,MID}(g,k)\)`:

--

$$
`\begin{aligned}
\delta^{POST,MID}(g,k) &= \E\left[ \bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)} | G=k\right] - \E\left[ \bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)} | G=\mathcal{T}+1 \right] \\
& - \left\{\left(\E\left[ \bar{Y}^{POST(k)} - \bar{Y}^{PRE(g)} | G=g\right] - \E\left[ \bar{Y}^{POST(k)} - \bar{Y}^{PRE(g)} | G=\mathcal{T}+1 \right]\right) \right.\\
& \hspace{10pt} - \left.\left(\E\left[ \bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)} | G=g\right] - \E\left[ \bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)} | G=\mathcal{T}+1 \right]\right)\right\}
\end{aligned}`
$$

- The first term is "good"; under parallel trends, it is related to the effect of participating in the treatment for group `\(k\)`

--

- The second term (everything inside `\(\{ \circ \}\)`), under parallel trends, is about <span class="alert-blue">treatment effect dynamics</span>

--

It is undesirable that
treatment effect dynamics show up in `\(\alpha\)`

---

# Goodman-Bacon (2021)

All this suggests the following about `\(\alpha\)` from the TWFE regression under parallel trends assumptions:

--

- It is equal to a weighted average of (i) reasonable underlying treatment effect parameters, and (ii) treatment effect dynamics

--

- The "negative weights" result of de Chaisemartin and D'Haultfoeuille (2020) is due to the treatment effect dynamics term discussed here

- This opens up the possibility of really bad causal effect estimates due to TWFE. An extreme case would be that the effect of participating in the treatment is positive for all groups and time periods, but that negative weights (treatment effect dynamics) lead to a negative TWFE estimate of the effect of the treatment

--

- You can introduce an extra (and testable) assumption ruling out treatment effect dynamics, but it seems more straightforward to just use a different estimation strategy

---

# Goodman-Bacon (2021)

Back to the weights:

--

`$$w_1(g,k) = \frac{(1-\bar{G}_g)(\bar{G}_g - \bar{G}_k)(p_g + p_k)^2 p_{g|\{g,k\}}(1-p_{g|\{g,k\}})}{\textrm{normalizing constant}}$$`

--

`$$w_2(g,k) = \frac{\bar{G}_k(\bar{G}_g - \bar{G}_k)(p_g + p_k)^2 p_{g|\{g,k\}}(1-p_{g|\{g,k\}})}{\textrm{normalizing constant}}$$`

--

Both of these put more weight on:

1. larger groups, when `\(p_g\)` and/or `\(p_k\)` are large

2. similarly sized groups, since `\(p_{g|\{g,k\}}(1-p_{g|\{g,k\}})\)` is largest when `\(p_{g|\{g,k\}}=0.5\)`

3. "middle" groups (in the middle between the comparison group and the beginning (for `\(w_1\)`) or end (for `\(w_2\)`) of the time periods)

---

# Callaway and Sant'Anna (2021)

Can we get around these issues with TWFE?
--

<span class="alert-blue">Group-Time Average Treatment Effects</span>

`$$ATT(g,t) = \E[Y_t(g) - Y_t(0) | G=g]$$`

--

This is analogous to the `\(ATT\)` in the baseline case with two periods and two groups

--

<span class="alert">Identification:</span>

--

$$
`\begin{aligned}
ATT(g,t) &= \E[Y_t(g) | G=g] - \E[Y_t(0) | G=g] \hspace{150pt}
\end{aligned}`
$$

---
count:false

# Callaway and Sant'Anna (2021)

Can we get around these issues with TWFE?

<span class="alert-blue">Group-Time Average Treatment Effects</span>

`$$ATT(g,t) = \E[Y_t(g) - Y_t(0) | G=g]$$`

This is analogous to the `\(ATT\)` in the baseline case with two periods and two groups

<span class="alert">Identification:</span>

$$
`\begin{aligned}
ATT(g,t) &= \E[Y_t(g) | G=g] - \E[Y_t(0) | G=g] \hspace{150pt}\\
&= \E[Y_t(g) - Y_{g-1}(0) | G=g] - \E[Y_t(0) - Y_{g-1}(0) | G=g]
\end{aligned}`
$$

---
count:false

# Callaway and Sant'Anna (2021)

Can we get around these issues with TWFE?

<span class="alert-blue">Group-Time Average Treatment Effects</span>

`$$ATT(g,t) = \E[Y_t(g) - Y_t(0) | G=g]$$`

This is analogous to the `\(ATT\)` in the baseline case with two periods and two groups

<span class="alert">Identification:</span>

$$
`\begin{aligned}
ATT(g,t) &= \E[Y_t(g) | G=g] - \E[Y_t(0) | G=g] \hspace{150pt}\\
&= \E[Y_t(g) - Y_{g-1}(0) | G=g] - \E[Y_t(0) - Y_{g-1}(0) | G=g]\\
&= \E[Y_t(g) - Y_{g-1}(0) | G=g] - \E[Y_t(0) - Y_{g-1}(0) | D_t=0]
\end{aligned}`
$$

---
count:false

# Callaway and Sant'Anna (2021)

Can we get around these issues with TWFE?
<span class="alert-blue">Group-Time Average Treatment Effects</span>

`$$ATT(g,t) = \E[Y_t(g) - Y_t(0) | G=g]$$`

This is analogous to the `\(ATT\)` in the baseline case with two periods and two groups

<span class="alert">Identification:</span>

$$
`\begin{aligned}
ATT(g,t) &= \E[Y_t(g) | G=g] - \E[Y_t(0) | G=g] \hspace{150pt}\\
&= \E[Y_t(g) - Y_{g-1}(0) | G=g] - \E[Y_t(0) - Y_{g-1}(0) | G=g]\\
&= \E[Y_t(g) - Y_{g-1}(0) | G=g] - \E[Y_t(0) - Y_{g-1}(0) | D_t=0]\\
&= \E[Y_t - Y_{g-1} | G=g] - \E[Y_t - Y_{g-1} | D_t=0]
\end{aligned}`
$$

---

# Callaway and Sant'Anna (2021)

<span class="alert-blue">Estimation</span>

$$
`\begin{aligned}
\widehat{ATT}(g,t) &= \hat{\E}[Y_t - Y_{g-1} | G=g] - \hat{\E}[Y_t - Y_{g-1} | D_t=0] \hspace{150pt}
\end{aligned}`
$$

---
count:false

# Callaway and Sant'Anna (2021)

<span class="alert-blue">Estimation</span>

$$
`\begin{aligned}
\widehat{ATT}(g,t) &= \hat{\E}[Y_t - Y_{g-1} | G=g] - \hat{\E}[Y_t - Y_{g-1} | D_t=0] \hspace{150pt}\\
&= \frac{1}{n} \sum_{i=1}^n \frac{\mathbf{1}\{G_i = g\}}{\hat{\P}(G=g)}(Y_{it} - Y_{i,g-1}) - \frac{1}{n} \sum_{i=1}^n \frac{\mathbf{1}\{G_i > t\}}{\hat{\P}(G > t)}(Y_{it} - Y_{i,g-1})
\end{aligned}`
$$

--

<span class="alert-blue">This is easy</span> and avoids making any of the "bad comparisons" that were causing problems for TWFE

---

# Callaway and Sant'Anna (2021)

One thing that is still different between TWFE and `\(ATT(g,t)\)` is that there are potentially "lots" of `\(ATT(g,t)\)`. <span class="alert-blue">Can we recover an "overall" ATT from these?</span>

--

As a step in this direction, define:

--

`$$ATT^G(g) := \frac{1}{\mathcal{T}-g+1} \sum_{t=g}^{\mathcal{T}} ATT(g,t)$$`

--

This is the average ATT across all post-treatment time periods for units in group `\(g\)`.

--

Next, define:

`$$ATT^O := \sum_{g \in \mathcal{G}} ATT^G(g) \P(G=g|G \in \mathcal{G})$$`

--

where `\(\mathcal{G}\)` is the set of all groups that ever participate in the treatment.
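
The two aggregation steps just defined can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the `\(ATT(g,t)\)` values, the group sizes, and the names `att_gt_df`, `n_g`, and `n_periods` are all made up for the example and do not come from any data or package.

```r
# Hypothetical ATT(g,t) values, for illustration only (not estimates from any data).
# Two ever-treated groups (g = 2 and g = 3) observed over 4 time periods.
n_periods <- 4
att_gt_df <- data.frame(
  g   = c(2, 2, 2, 3, 3),
  t   = c(2, 3, 4, 3, 4),
  att = c(1.0, 2.0, 3.0, 1.0, 1.0)
)

# hypothetical number of units in each ever-treated group
n_g <- c("2" = 300, "3" = 100)

# keep post-treatment periods only (t >= g)
post <- att_gt_df[att_gt_df$t >= att_gt_df$g, ]

# ATT^G(g): simple average of ATT(g,t) over t = g, ..., n_periods
att_G <- tapply(post$att, post$g, mean)

# sanity check: group g contributes n_periods - g + 1 post-treatment periods
stopifnot(all(table(post$g) == n_periods - as.numeric(names(att_G)) + 1))

# P(G = g | G in the set of ever-treated groups): group shares among treated units
p_g <- n_g / sum(n_g)

# ATT^O: group-share-weighted average of ATT^G(g)
att_O <- sum(att_G * p_g[names(att_G)])

att_G
att_O
```

With these made-up inputs, `\(ATT^G(2) = 2\)` and `\(ATT^G(3) = 1\)`, so `\(ATT^O = 0.75 \times 2 + 0.25 \times 1 = 1.75\)`. The early group dominates only because it is assumed to contain three quarters of the treated units.
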
--

`\(ATT^O\)` is the average effect of participating in the treatment across all units that are treated in any time period `\(\implies\)` it's a natural overall treatment effect parameter.

---

# Coding Examples

Two examples:

- Minimum Wage Policy

  - This is from Callaway and Sant'Anna (2021)

  - In some places "modern" DID will make a big difference, but in others not much (I think this is pretty typical)

--

- Simulated Data

  - In case the differences above look small, we'll make sure to see that things can potentially go quite poorly

--

For today, I'll just show code/results, but you should be able to download the code from my website and run it locally if you would like to

---

# Example: Minimum Wage

- Use the period in the U.S. from 2002-2007 when the federal minimum wage was flat

--

- Exploit minimum wage changes across states

  - Any state that increases its minimum wage above the federal minimum wage will be considered treated

--

- Interested in the effect of the minimum wage on teen employment

---

# Example: Minimum Wage

```r
library(did)
# for bacondecomp, dev version is much faster
# devtools::install_github("evanjflack/bacondecomp")
library(bacondecomp)
library(fixest)
library(modelsummary)
library(ggplot2)
load("mw_data2.RData")
```

---

# Example: Minimum Wage

```r
head(mw_data2)
```

```
##     year countyreal     lpop     lemp first.treat treat region post
## 829 2001       8001 5.896761 8.730690        2006     1      4    0
## 820 2002       8001 5.896761 8.541300        2006     1      4    0
## 844 2003       8001 5.896761 8.461469        2006     1      4    0
## 858 2004       8001 5.896761 8.336870        2006     1      4    0
## 833 2005       8001 5.896761 8.340217        2006     1      4    0
## 823 2006       8001 5.896761 8.378161        2006     1      4    1
```

---

# Example: Minimum Wage

```r
# add post-treatment dummy variable
mw_data2$post <- 1*((mw_data2$year >= mw_data2$first.treat) & mw_data2$treat != 0)
twfe_res <- feols(lemp ~ post | countyreal + year, data=mw_data2, cluster="countyreal")
```

---

# Example: Minimum Wage

```r
modelsummary(twfe_res, gof_omit=".*")
```

<table class="table" style="width: auto !important;
margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:center;"> Model 1 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> post </td>
   <td style="text-align:center;"> −0.021 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> </td>
   <td style="text-align:center;"> (0.006) </td>
  </tr>
</tbody>
</table>

---

# Example: Minimum Wage

```r
# run bacon decomposition
bacon_res <- bacon(lemp ~ post, data=mw_data2, id_var="countyreal", time_var="year")
# confirm same estimate
sum(bacon_res$estimate * bacon_res$weight)
```

```
## [1] -0.02129215
```

```r
# bacon decomp results
head(bacon_res)
```

```
##    treated untreated     estimate      weight                     type
## 2     2005      2006  0.031370307 0.035177102 Earlier vs Later Treated
## 4     2003      2006 -0.025199486 0.023661728 Earlier vs Later Treated
## 5     2006      2005 -0.005552584 0.017588551 Later vs Earlier Treated
## 8     2003      2005 -0.030110977 0.006023476 Earlier vs Later Treated
## 9     2006       Inf -0.041531139 0.543036659     Treated vs Untreated
## 10    2005       Inf  0.014012249 0.248829811     Treated vs Untreated
```

---

# Example: Minimum Wage

```r
# plot bacon decomposition
ggplot(data=bacon_res, mapping=aes(x=weight, y=estimate, color=as.factor(type))) +
  geom_point(size=5) +
  scale_color_discrete(name="") +
  theme_bw() +
  theme(legend.position="bottom")
```

---

# Example: Minimum Wage

<img src="data:image/png;base64,#modern_did_session2_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

# Example: Minimum Wage

```r
# callaway and sant'anna
cs_res <- att_gt(yname="lemp",
                 tname="year",
                 idname="countyreal",
                 gname="first.treat",
                 data=mw_data2)
```

---

# Example: Minimum Wage

```r
ggdid(cs_res)
```

<img src="data:image/png;base64,#modern_did_session2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

# Example: Minimum Wage

```r
aggte(cs_res, type="group")
```

```
##
## Call:
## aggte(MP = cs_res, type = "group")
##
## Reference: Callaway, Brantly
and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020.
##
##
## Overall summary of ATT’s based on group/cohort aggregation:
##      ATT    Std. Error     [ 95%  Conf. Int.]
##  -0.0434        0.0059     -0.055     -0.0319 *
##
##
## Group Effects:
##  Group Estimate Std. Error [95% Simult.  Conf. Band]
##   2003  -0.0542     0.0131       -0.0841     -0.0243 *
##   2005  -0.0138     0.0082       -0.0325      0.0048
##   2006  -0.0529     0.0076       -0.0703     -0.0355 *
## ---
## Signif. codes: `*' confidence band does not cover 0
##
## Control Group:  Never Treated,  Anticipation Periods:  0
## Estimation Method:  Doubly Robust
```

---

# Example: Minimum Wage

The CS estimate is roughly twice as large in magnitude as the TWFE estimate (-0.043 vs. -0.021), but the qualitative results are similar (negative effects of the minimum wage on teen employment).

---

# Example: Simulated Data

```r
library(tidyr)
library(dplyr)
# simulation parameters
time.periods <- 20
groups <- c(5,15,time.periods+1)
pg <- c(0.5,0.5,0)
n <- 1000
# generate data (code omitted...)
# load file: sim_data.RDS
```

---

# Example: Simulated Data

```r
# plot data
plotdf <- data %>%
  group_by(G, time.period) %>%
  summarise(Yobs=mean(Y), Y0=mean(Y0))
plotdf_obs <- plotdf %>% select(-Y0)
plotdf_obs$group <- paste0(plotdf$G,"-observed")
plotdf0 <- plotdf %>% select(-Yobs)
plotdf0$group <- paste0(plotdf0$G,"-untreated")
ggplot(data=plotdf, mapping=aes(x=time.period, y=Yobs, color=as.factor(G))) +
  geom_point() +
  geom_line() +
  geom_point(aes(y=Y0)) +
  geom_line(aes(y=Y0), linetype="dashed") +
  scale_x_continuous(breaks=seq(2,time.periods,by=2)) +
  ylab("Y") +
  theme_bw()
```

---

# Example: Simulated Data

<img src="data:image/png;base64,#modern_did_session2_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

# Example: Simulated Data

```r
# TWFE
twfe_res <- feols(Y ~ post | id + time.period, data=data, cluster="id")
modelsummary(twfe_res, gof_omit=".*")
```

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:center;"> Model 1 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> post </td>
   <td style="text-align:center;"> −25.043 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> </td>
   <td style="text-align:center;"> (0.029) </td>
  </tr>
</tbody>
</table>

---

# Example: Simulated Data

```r
cs_res <- att_gt(yname="Y",
                 tname="time.period",
                 idname="id",
                 gname="G",
                 data=data,
                 control_group = "notyettreated")
round(aggte(cs_res, type="group")$overall.att, 3)
```

```
## Warning in compute.aggte(MP = MP, type = type, balance_e = balance_e, min_e
## = min_e, : Simultaneous conf. band is somehow smaller than pointwise one
## using normal approximation. Since this is unusual, we are reporting pointwise
## confidence intervals
```

```
## [1] 50.005
```

---

# Example: Simulated Data

These results are very different: the CS estimate (50.005) is correct, while the TWFE estimate (-25.043) is wildly incorrect and even has the wrong sign. The differences are due to:

1. Heavy dynamics for the early-treated group

2. No never-treated group (this tends to make the issues we're talking about much worse!)

--

Next up: Pre-testing and Event Studies