class: center, middle, inverse, title-slide

# Difference in Differences with a Continuous Treatment

### Brantly Callaway, University of Georgia
Andrew Goodman-Bacon, Federal Reserve Bank of Minneapolis
Pedro H.C. Sant’Anna, Microsoft & Vanderbilt University
### August 6, 2021
DID Reading Group

---

# Motivation

`$$\newcommand{\E}{\mathbb{E}}$$`

<style type="text/css">
.alert { font-weight:bold; color: red; }
.alert-blue { font-weight: bold; color: blue; }
.remark-slide-content { font-size: 23px; padding: 1em 4em 1em 4em; }
.highlight-red { background-color:red; padding:0.1em 0.2em; }
</style>

There has been a lot of recent work/interest in DID!

A number of papers have <span class="alert">diagnosed</span> issues with very commonly used two-way fixed effects (TWFE) regressions to implement DID

* de Chaisemartin and D'Haultfœuille (2020), Borusyak, Jaravel, and Spiess (2021), Goodman-Bacon (2021), Sun and Abraham (2020)

--

Summary of Issues:

* Already-treated groups sometimes serve as comparison group `\(\implies\)` treatment effect dynamics can lead to very poor estimates of treatment effects
* Weights on underlying parameters are driven by estimation method

---

# Motivation

There have also been a number of papers <span class="alert">fixing</span> these issues

* Callaway and Sant'Anna (2020), Cengiz, Dube, Lindner, and Zipperer (2019), Gardner (2021)
* `\(+\)` previous papers

--

Basic idea:

* Explicitly make "good" comparisons and omit "bad" comparisons
* Choose your own weights `\(\implies\)` can recover overall `\(ATT\)`, event studies, or other target parameters of interest

---

# This paper

These papers have (largely) focused on the case with a binary, staggered treatment

* Some exceptions: de Chaisemartin and D'Haultfœuille (2020, 2021)

But there is considerable demand for understanding DID with more general treatments

---

# Twitter

<center><img src="tweet_better.png" width=90%></center>

---
count:false

# This paper

<mark>Current paper:</mark> Generalize binary treatment case to multi-valued or continuous treatment (<span class="alert">"dose"</span>)

--

`$$Y_{it} = \theta_t + \eta_i + \beta^{twfe} \cdot D_i \cdot Treat_{it} + v_{it}$$`

Setup:

* Treatment "continuous enough" that researcher would estimate above model rather than include
a sequence of dummy variables
* Researchers often interpret `\(\beta^{twfe}\)` as an <span class="alert">average causal response</span>
* i.e., (an average over) causal effects of a marginal increase in the dose

---

# This paper

<span class="alert">Similar issues</span> as in binary treatment literature related to regression (TWFE) estimation strategies when the treatment is multi-valued and/or continuous

* Already treated units serve as comparison group `\(\implies\)` poor estimates of treatment effect parameters in the presence of treatment effect dynamics
* `\(TWFE\)` estimate is a weighted average of underlying treatment parameters, but weights driven by estimation method
* (this one is new) Heterogeneous causal effects of dose across timing-groups can lead to poor estimates (negative weights)

--

As in the case with a staggered, binary treatment, we can fix all of these by

* Carefully making desirable comparisons
* Choosing our own weights

---

# Now for the bad news...

However, there are <span class="alert">new issues</span> related to interpreting differences between treatment effects at different doses as <span class="alert">causal effects</span>

Intuition: "Standard" DID delivers ATT-type parameters.

* These are <span class="alert">local</span> to a specific dose `\(\implies\)` Comparisons across different doses include both:
  * The causal effect of more dose
  * "Selection bias" terms
* Getting rid of these selection bias terms requires additional assumptions that are likely to be substantially stronger in practice

No easy fixes here!

--

`\(\implies\)` (at least in some sense), this is <mark>more negative than previous papers</mark>

---

# A few comments...

* Brand new paper
* Not 100% complete
* No application
* No code
* <span class="alert">Comments/suggestions/etc. more than welcome</span>

---

# Outline

<br> <br> <br>

1. Baseline Case: Two periods, no one treated in first period
2. TWFE in Baseline Case
3. More General Case: Multiple periods, variation in treatment timing
4. TWFE in More General Case

---
class: inverse, middle, center

# Baseline Case

<br><br>

Two periods, no one treated in first period

---

# Notation

Potential outcomes notation

* Two time periods: `\(t\)` and `\(t-1\)`
* No one treated until period `\(t\)`
* Some units remain untreated in period `\(t\)`
* Potential outcomes: `\(Y_{it}(d)\)`
* Observed outcomes: `\(Y_{it}\)` and `\(Y_{it-1}\)`

`$$Y_{it}=Y_{it}(D_i) \quad \textrm{and} \quad Y_{it-1}=Y_{it-1}(0)$$`

---

# Parameters of Interest (ATT-type)

* Level Effects (Average Treatment Effect on the Treated)

`$$ATT(d|d) := \E[Y_t(d) - Y_{t}(0) | D=d]$$`

* Interpretation: The average effect of dose `\(d\)` relative to not being treated *local to the group that actually experienced dose `\(d\)`*
* This is the natural analogue of `\(ATT\)` in the binary treatment case

--

* Slope Effect (Average Causal Responses)

`$$ACRT(d|d) := \frac{\partial ATT(l|d)}{\partial l} \Big|_{l=d} \ \ \ \textrm{and} \ \ \ ACRT^O := \E[ACRT(D|D)|D>0]$$`

* Interpretation: `\(ACRT(d|d)\)` is the causal effect of a marginal increase in dose *local to units that actually experienced dose `\(d\)`*
* `\(ACRT^O\)` averages `\(ACRT(d|d)\)` over the population distribution of the dose

---

# Discrete Dose

* Level Effects (Average Treatment Effect on the Treated)

`$$ATT(d|d) := \E[Y_t(d) - Y_{t}(0) | D=d]$$`

* This is exactly the same as for continuous dose

--

* Slope Effect (Average Causal Responses)
* Possible doses: `\(\{d_1, \ldots, d_J\}\)`

`$$ACRT(d_j|d_j) := ATT(d_j|d_j) - ATT(d_{j-1}|d_j)$$`

--

* Interestingly: In the case with a binary treatment, `\(ACRT(1|1) = ATT\)` `\(\implies\)` In binary treatment case, `\(ATT\)` is both a slope and level effect

---

# Identification

## "Standard" Parallel Trends Assumption

For all `\(d\)`,

`$$\E[\Delta Y_t(0) | D=d] = \E[\Delta Y_t(0) | D=0]$$`

--

Then,

--

$$
`\begin{aligned}
ATT(d|d) &= \E[Y_t(d) - Y_t(0) | D=d] \hspace{150pt}
\end{aligned}`
$$

---
count:false

# Identification

## "Standard" Parallel Trends Assumption

For all `\(d\)`,

`$$\E[\Delta Y_t(0) | D=d] = \E[\Delta Y_t(0) | D=0]$$`

Then,

$$
`\begin{aligned}
ATT(d|d) &= \E[Y_t(d) - Y_t(0) | D=d] \hspace{150pt}\\
&= \E[Y_t(d) - Y_{t-1}(0) | D=d] - \E[Y_t(0) - Y_{t-1}(0) | D=d]
\end{aligned}`
$$

---
count:false

# Identification

## "Standard" Parallel Trends Assumption

For all `\(d\)`,

`$$\E[\Delta Y_t(0) | D=d] = \E[\Delta Y_t(0) | D=0]$$`

Then,

$$
`\begin{aligned}
ATT(d|d) &= \E[Y_t(d) - Y_t(0) | D=d] \hspace{150pt}\\
&= \E[Y_t(d) - Y_{t-1}(0) | D=d] - \E[Y_t(0) - Y_{t-1}(0) | D=d]\\
&= \E[Y_t(d) - Y_{t-1}(0) | D=d] - \E[\Delta Y_t(0) | D=0]
\end{aligned}`
$$

---
count:false

# Identification

## "Standard" Parallel Trends Assumption

For all `\(d\)`,

`$$\E[\Delta Y_t(0) | D=d] = \E[\Delta Y_t(0) | D=0]$$`

Then,

$$
`\begin{aligned}
ATT(d|d) &= \E[Y_t(d) - Y_t(0) | D=d] \hspace{150pt}\\
&= \E[Y_t(d) - Y_{t-1}(0) | D=d] - \E[Y_t(0) - Y_{t-1}(0) | D=d]\\
&= \E[Y_t(d) - Y_{t-1}(0) | D=d] - \E[\Delta Y_t(0) | D=0]\\
&= \E[\Delta Y_t | D=d] - \E[\Delta Y_t | D=0]
\end{aligned}`
$$

<mark>This is exactly what you would expect</mark>

---

# Are we done?

<mark>Unfortunately, no</mark>

--

Most applied work with a multi-valued or continuous treatment wants to think about how causal responses vary across dose

* For example, plot treatment effects as a function of dose
* Does more dose tend to increase/decrease/not affect outcomes?
* Average causal response parameters *inherently* involve comparisons across slightly different doses

---

# Interpretation Issues

Consider comparing `\(ATT(d|d)\)` for two different doses

--

$$
`\begin{aligned}
& ATT(d_h|d_h) - ATT(d_l|d_l) \hspace{350pt}
\end{aligned}`
$$

---
count:false

# Interpretation Issues

Consider comparing `\(ATT(d|d)\)` for two different doses

$$
`\begin{aligned}
& ATT(d_h|d_h) - ATT(d_l|d_l) \hspace{350pt}\\
& \hspace{25pt} = \underbrace{\E[Y_t(d_h) - Y_t(d_l) | D=d_h]}_{\textrm{Causal Response}} + \underbrace{ATT(d_l|d_h) - ATT(d_l|d_l)}_{\textrm{Selection Bias}}
\end{aligned}`
$$

--

"Standard" Parallel Trends is not strong enough to rule out the selection bias terms here

* Implication: If you want to interpret differences in treatment effects across different doses, then you will need stronger assumptions than standard parallel trends
* This problem spills over into identifying `\(ACRT(d|d)\)`

---

# Alternative Parameters of Interest (ATE-type)

* Level Effects

`$$ATE(d) := \E[Y_t(d) - Y_t(0)]$$`

--

* Slope Effects

$$
`\begin{aligned}
ACR(d) := \frac{\partial ATE(d)}{\partial d} \ \ \ \ &\textrm{or} \ \ \ \ ACR(d_j) := ATE(d_j) - ATE(d_{j-1}) \\
& \textrm{or} \ \ \ ACR^O := \E[ACR(D) | D>0]
\end{aligned}`
$$

---

# Comparisons across dose

ATE-type parameters do not suffer from the same issues as ATT-type parameters when making comparisons across dose

--

$$
`\begin{aligned}
ATE(d_h) - ATE(d_l) &= \E[Y_t(d_h) - Y_t(0)] - \E[Y_t(d_l) - Y_t(0)]
\end{aligned}`
$$

---
count:false

# Comparisons across dose

ATE-type parameters do not suffer from the same issues as ATT-type parameters when making comparisons across dose

$$
`\begin{aligned}
ATE(d_h) - ATE(d_l) &= \E[Y_t(d_h) - Y_t(0)] - \E[Y_t(d_l) - Y_t(0)]\\
&= \underbrace{\E[Y_t(d_h) - Y_t(d_l)]}_{\textrm{Causal Response}}
\end{aligned}`
$$

--

<mark>Unfortunately, the "Standard" Parallel Trends Assumption is not strong enough to identify `\(ATE(d)\)`.</mark>

---

# Introduce Stronger Assumptions

## "Strong" Parallel Trends

For all `\(d\)`,

`$$\E[Y_t(d) - Y_{t-1}(0)] = \E[Y_t(d) - Y_{t-1}(0) | D=d]$$`

--

Under Strong Parallel Trends, it is straightforward to show that

`$$ATE(d) = \E[\Delta Y_t | D=d] - \E[\Delta Y_t|D=0]$$`

RHS is exactly the same expression as for `\(ATT(d|d)\)` under "standard" parallel trends, but here

* assumptions are different
* parameter interpretation is different

---

# Comments on Strong Parallel Trends

* This is notably different from "Standard" Parallel Trends
* It involves potential outcomes for all values of the dose (not just untreated potential outcomes)
* Can show that it is not <span class="alert">strictly</span> stronger than Standard Parallel Trends
* But it is likely to be substantially stronger in practice
* It is also slightly weaker than assuming
* `\(ATE(d) = ATT(d|d)\)` (this is a form of treatment effect homogeneity)
* All dose groups would have experienced the same path of outcomes had they been assigned the same dose

---

# Summarizing

* It is straightforward/familiar to identify ATT-type parameters with a multi-valued or continuous dose
* However, comparisons of ATT-type parameters across different doses are hard to interpret
* They include selection bias terms
* This issue extends to identifying ACRT parameters
* This suggests targeting ATE-type parameters
* Comparisons across doses do not contain selection bias terms
* But identifying ATE-type parameters requires stronger assumptions

---
class: inverse, center, middle

# TWFE in Baseline Case

---

# TWFE

The most common strategy in applied work is to estimate the two-way fixed effects (TWFE) regression:

`$$Y_{it} = \theta_t + \eta_i + \beta^{twfe} \cdot D_i \cdot Post_t + v_{it}$$`

In baseline case (two periods, no one treated in first period), this is just

`$$\Delta Y_i = \beta_0 + \beta^{twfe} \cdot D_i + \Delta v_i$$`

`\(\beta^{twfe}\)` is often loosely interpreted as an Average Causal Response

---

# Interpreting `\(\beta^{twfe}\)`

In the paper, we show that

* Under Standard
Parallel Trends:

`$$\beta^{twfe} = \int_{\mathcal{D}_+} w_1(l) \left[ ACRT(l|l) + \frac{\partial ATT(l|h)}{\partial h} \Big|_{h=l} \right] \, dl + w_0 \frac{ATT(d_L|d_L)}{d_L}$$`

* `\(w_1(l)\)` and `\(w_0\)` are positive weights that integrate to 1
* `\(ACRT(l|l)\)` is average causal response conditional on `\(D=l\)`
* `\(\frac{\partial ATT(l|h)}{\partial h} \Big|_{h=l}\)` is a local selection bias term
* `\(\frac{ATT(d_L|d_L)}{d_L}\)` is the causal effect of going from no dose to the smallest possible dose (conditional on `\(D=d_L\)`)

---

# Interpreting `\(\beta^{twfe}\)`

* Under Strong Parallel Trends:

`$$\beta^{twfe} = \int_{\mathcal{D}_+} w_1(l) ACR(l) \, dl + w_0 \frac{ATE(d_L)}{d_L}$$`

* `\(w_1(l)\)` and `\(w_0\)` are same weights as before
* `\(ACR(l)\)` is average causal response to dose `\(l\)` across entire population
* there is no selection bias term
* `\(\frac{ATE(d_L)}{d_L}\)` is the causal effect of going from no dose to the smallest possible dose (across entire population)

---

# What does this mean?

* Issue \#1: Selection bias terms that show up under standard parallel trends `\(\implies\)` to interpret as a weighted average of any kind of causal responses, need to invoke (likely substantially) stronger assumptions

--

* Issue \#2: Weights
  * They are all positive
  * But this is a <span class="alert">very minimal</span> requirement for weights being "reasonable"
  * These weights have the "strange" property that they are maximized at `\(d=\E[D]\)`.

---

# Ex. Mixture of Normals Dose

![](data:image/png;base64,#did_reading_group_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

# Ex. Exponential Dose

![](data:image/png;base64,#did_reading_group_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

# What does this mean?
* Issue \#3: Pre-testing
  * That the expressions for `\(ATE(d)\)` and `\(ATT(d|d)\)` are exactly the same also means that we cannot use pre-treatment periods to try to distinguish between "standard" and "strong" parallel trends

---

# What should you do?

1. Either (i) report `\(ATT(d|d)\)` directly and interpret carefully, or (ii) be aware (and think through) that `\(\beta^{twfe}\)`, comparisons across `\(d\)`, or average causal response parameters all require imposing stronger assumptions

--

2. With regard to weights, there are likely better options for estimating causal effect parameters

* Step 1: Nonparametrically estimate `\(ACR(d) = \frac{\partial \E[\Delta Y | D=d]}{\partial d}\)`
* Side-comment: This is not actually too hard to estimate. No curse-of-dimensionality, etc.
* Step 2: Estimate `\(ACR^O = \E[ACR(D)|D>0]\)`.
* <span class="alert">These do not get around the issue of requiring a stronger assumption</span>

---
class: inverse, middle, center

# More General Case

<br> <br>

Multiple periods, variation in treatment timing

---

# Setup

* Staggered treatment adoption
* If you are treated today, you will continue to be treated tomorrow
* Note: relatively straightforward to relax, just makes notation more complex
* Can allow for treatment anticipation too, but ignoring for simplicity now
* Once a unit becomes treated, its dose remains constant (could probably relax this too)

---

# Setup

* Additional Notation:
  * `\(G_i\)` -- a unit's "group" (the time period when unit becomes treated)
  * Potential outcomes `\(Y_{it}(g,d)\)` -- the outcome unit `\(i\)` would experience in time period `\(t\)` if they became treated in period `\(g\)` with dose `\(d\)`
  * `\(Y_{it}(0)\)` is the potential outcome corresponding to not being treated in any period

---

# Parameters of Interest

Level Effects:

`$$ATT(g,t,d|g,d) := \E[Y_t(g,d) - Y_t(0) | G=g, D=d] \ \ \ \textrm{and} \ \ \ ATE(g,t,d) := \E[Y_t(g,d) - Y_t(0) ]$$`

--

Slope Effects:

`$$ACRT(g,t,d|g,d) := \frac{\partial ATT(g,t,l|g,d)}{\partial
l} \Big|_{l=d} \ \ \ \textrm{and} \ \ \ ACR(g,t,d) := \frac{\partial ATE(g,t,d)}{\partial d}$$`

---

# Parameters of Interest

These essentially inherit all the same issues as in the two period case

--

* Under a multi-period version of "standard" parallel trends, comparisons of `\(ATT\)` across different values of dose are hard to interpret
* They contain selection bias terms

--

* Under a multi-period version of "strong" parallel trends, comparisons of `\(ATE\)` across different values of dose are straightforward to interpret
* But this involves a much stronger assumption

--

Expressions in remainder of talk are under "strong" parallel trends

* Under "standard" parallel trends, add selection bias terms everywhere

---

# Parameters of Interest

Often, these are high-dimensional and it may be desirable to "aggregate" them

--

* Average by group (across post-treatment time periods) and then across groups `\(\rightarrow\)` `\(ACR^{overall}(d)\)` (overall average causal response for particular dose)

--

* Average `\(ACR^{overall}(d)\)` across dose `\(\rightarrow\)` `\(ACR^O\)` (this is just one number); it is likely to be the parameter that one would be targeting in a TWFE regression

--

* Event study: average across groups who have been exposed to treatment for `\(e\)` periods `\(\rightarrow\)` For fixed `\(d\)` `\(\rightarrow\)` Average across different values of `\(d\)` `\(\implies\)` typical looking ES plot

---
class: inverse, middle, center

# TWFE in More General Case

---

# TWFE Regression

Consider the same TWFE regression as before

`$$Y_{it} = \theta_t + \eta_i + \beta^{twfe} \cdot D_i \cdot Treat_{it} + v_{it}$$`

---

# Running Example

<center><img src="mp_setup.jpg" width=75%></center>

---

# How should `\(\beta^{twfe}\)` be interpreted?
We show in the paper that `\(\beta^{twfe}\)` is a weighted average of the following terms:

`$$\delta^{WITHIN}(g) = \frac{\textrm{cov}(\bar{Y}^{POST(g)} - \bar{Y}^{PRE(g)}, D | G=g)}{\textrm{var}(D|G=g)}$$`

* Comes from <span class="alert">within-group variation in the amount of dose</span>
* This term is essentially the same as in the baseline case and corresponds to a <span class="alert">reasonable</span> treatment effect parameter under strong parallel trends
* Like baseline case, (after some manipulations) this term corresponds to a "derivative"/"ACR"
* Does not show up in the binary treatment case because there is no variation in amount of treatment

---

# How should `\(\beta^{twfe}\)` be interpreted?

<center><img src="mp_1a.png" width=65%></center>

---

# How should `\(\beta^{twfe}\)` be interpreted?

<center><img src="mp_1b.png" width=65%></center>

---

# `\(\beta^{twfe}\)` weighted average, term 2 of 4

For `\(k > g\)` (i.e., group `\(k\)` becomes treated after group `\(g\)`),

`$$\delta^{MID,PRE}(g,k) = \frac{\E\left[\big(\bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)}\big) | G=g\right] - \E\left[\big(\bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)}\big) | G=k \right]}{\E[D|G=g]}$$`

* Comes from <span class="alert">comparing path of outcomes for a group that becomes treated (group `\(g\)`) relative to a not-yet-treated group (group `\(k\)`)</span>
* Corresponds to a <span class="alert">reasonable</span> treatment effect parameter under strong parallel trends
* Denominator (after some derivations) ends up giving this a "derivative"/"ACR" interpretation
* Similar terms show up in the case with a binary treatment

---

# `\(\beta^{twfe}\)` weighted average, term 2 of 4

<center><img src="mp_2a.png" width=65%></center>

---

# `\(\beta^{twfe}\)` weighted average, term 3 of 4

For `\(k > g\)` (i.e., group `\(k\)` becomes treated after group `\(g\)`),

$$
`\begin{aligned}
\delta^{POST,MID}(g,k) &= \frac{\E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)}\big) | G=k\right] -
\E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)}\big) | D=0 \right]}{\E[D|G=k]} \\
&- \left(\frac{\E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{PRE(g)}\big) | G=g\right] - \E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{PRE(g)}\big) | D=0 \right]}{\E[D|G=k]} \right.\\
& \hspace{25pt} - \left.\frac{\E\left[\big(\bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)}\big) | G=g\right] - \E\left[\big(\bar{Y}^{MID(g,k)} - \bar{Y}^{PRE(g)}\big) | D=0 \right]}{\E[D|G=k]} \right)
\end{aligned}`
$$

---

# `\(\beta^{twfe}\)` weighted average, term 3 of 4

For `\(k > g\)` (i.e., group `\(k\)` becomes treated after group `\(g\)`),

$$
`\begin{aligned}
\delta^{POST,MID}(g,k) &= \frac{\E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)}\big) | G=k\right] - \E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{MID(g,k)}\big) | D=0 \right]}{\E[D|G=k]} \\
&- \textrm{Treatment Effect Dynamics for Group g}
\end{aligned}`
$$

* Comes from <span class="alert">comparing path of outcomes for a group that becomes treated (group `\(k\)`) to paths of outcomes of an already treated group (group `\(g\)`)</span>
* In the presence of treatment effect dynamics (these are not ruled out by any parallel trends assumption), this term is <span class="alert">problematic</span>
* This is similar-in-spirit to the problematic terms for TWFE with a binary treatment

---

# `\(\beta^{twfe}\)` weighted average, term 3 of 4

<center><img src="mp_2b.png" width=65%></center>

---

# `\(\beta^{twfe}\)` weighted average, term 4 of 4

For `\(k > g\)` (i.e., group `\(k\)` becomes treated after group `\(g\)`),

$$
`\begin{aligned}
\delta^{POST,PRE}(g,k) = \frac{\E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{PRE(g)}\big) | G=g\right] - \E\left[\big(\bar{Y}^{POST(k)} - \bar{Y}^{PRE(g)}\big) | G=k \right]}{\E[D|G=g] - \E[D|G=k]}
\end{aligned}`
$$

* Comes from <span class="alert">comparing path of outcomes for groups `\(g\)` and `\(k\)` in their common post-treatment periods relative to their common pre-treatment periods</span>
* In the presence of heterogeneous causal
responses (causal response in same time period differs across groups), this term ends up being (partially) <span class="alert">problematic</span> too
* Only shows up when `\(\E[D|G=g] \neq \E[D|G=k]\)`
* No analogue in the binary treatment case

---

# `\(\beta^{twfe}\)` weighted average, term 4 of 4

<center><img src="mp_panel3.jpg" width=65%></center>

---

# Summary of TWFE Issues

* Issue \#1: Selection bias terms that show up under standard parallel trends `\(\implies\)` to interpret as a weighted average of any kind of causal responses, need to invoke (likely substantially) stronger assumptions

--

* Issue \#2: Weights
  * Negative weights possible due to (i) treatment effect dynamics or (ii) heterogeneous causal responses across groups
  * Are (undesirably) driven by estimation method

--

Weights issues can be solved by carefully making desirable comparisons and choosing appropriate weights

--

Selection bias terms are a more fundamental challenge

---

# Conclusion

* There are a number of challenges to implementing/interpreting DID with a multi-valued or continuous treatment
* Issues related to TWFE are (mostly) anticipated at this point
* But (in my view) the main new issue here is that <span class="alert">justifying interpreting comparisons across different doses as causal effects requires stronger assumptions than most researchers probably think that they are making</span>
* <mark>Link to paper:</mark> [https://arxiv.org/abs/2107.02637](https://arxiv.org/abs/2107.02637)
* <mark>Other Summaries:</mark> (i) [Five minute summary](https://bcallaway11.github.io/posts/five-minute-did-continuous-treatment) (ii) [Pedro's Twitter](https://twitter.com/pedrohcgs/status/1415915759960690696)
* <mark>Comments welcome:</mark> [brantly.callaway@uga.edu](mailto:brantly.callaway@uga.edu)
* <mark>Code:</mark> ETA 2-3 months
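
---

# Appendix: Baseline Case Simulation Sketch

A minimal simulation sketch of the baseline case (two periods, no one treated in the first), offered only as an illustration and not as the authors' code. It assumes a made-up design: four discrete doses, a common untreated trend, and a hypothetical dose response of `d**2` that is identical for every unit, so both "standard" and "strong" parallel trends hold by construction. The names `simulate_baseline`, `att_by_dose`, and `twfe_slope` are invented for this example.

```python
import numpy as np


def simulate_baseline(n=20000, seed=0):
    """Simulate first-differenced outcomes for doses in {0, 1, 2, 3}."""
    rng = np.random.default_rng(seed)
    dose = rng.choice([0.0, 1.0, 2.0, 3.0], size=n)
    # Untreated trend does not depend on dose (parallel trends by construction).
    trend = 1.0 + rng.normal(0.0, 1.0, size=n)
    # Hypothetical dose response, identical for all units: effect of dose d is d**2.
    effect = dose ** 2
    delta_y = trend + effect
    return dose, delta_y


def att_by_dose(dose, delta_y):
    """Estimate E[dY | D=d] - E[dY | D=0] for each positive dose d."""
    base = delta_y[dose == 0].mean()
    return {
        float(d): delta_y[dose == d].mean() - base
        for d in np.unique(dose)
        if d > 0
    }


def twfe_slope(dose, delta_y):
    """Slope from regressing dY on D, the baseline-case TWFE coefficient."""
    return np.cov(dose, delta_y, ddof=1)[0, 1] / np.var(dose, ddof=1)


dose, delta_y = simulate_baseline()
att = att_by_dose(dose, delta_y)
beta_twfe = twfe_slope(dose, delta_y)
```

In this design the dose-specific estimates in `att` should land near 1, 4, and 9, while `beta_twfe` is a single slope near 3. The regression collapses the increments 1, 3, and 5 into one number, with the weighting set by the estimation method rather than by the researcher.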