# Is our code slow…?

*2021-06-11* · [original post](https://bcallaway11.github.io/posts/cs-slow-code)

I just ran into the [`did_imputation` Stata command](https://github.com/borusyak/did_imputation), which mainly implements the ideas in Borusyak, Jaravel, and Spiess (2021). Interestingly, the new package also provides calls to other recent alternatives to two-way fixed effects: de Chaisemartin and D'Haultfoeuille (2020), Sun and Abraham (2020), and Callaway and Sant'Anna (2020) --- so you can see all of the estimates in the same plot:
<img src="/assets/images/cs_slow.jpeg">
This is great, as users can display several of these new estimators in the same plot. What catches *my* eye, though, is how slow our code appears to be: over two minutes to run, compared to about 1 second for the other approaches. And this is not even a very complicated simulation; there are only 300 units and 15 time periods. If our code doesn't run fast in this case, that is a bad sign!
The other thing that I immediately notice is that `did_imputation` is written in Stata, and the main version of our code is written in R. Our Stata version is, at the moment, a brand new proof-of-concept and still in beta mode. Let's see what happens if we try the same simulations but in R using the `did` package instead of Stata.
# Same simulations but in R
## Step 1: Generate the same data
```{r}
time.periods <- 15
n <- 300
# unit data
id <- 1:n
group <- sample(seq(10,16), n, replace=TRUE)
unit_data <- data.frame(id=id, group=group)
# generate panel data
panel_data <- data.frame(id=sort(rep(id, time.periods)),
                         tp=rep(1:time.periods, n))
panel_data <- merge(panel_data, unit_data, by="id")
panel_data$D <- 1*(panel_data$tp >= panel_data$group)
# generate heterogeneous treatment effects by calendar date
tau <- (panel_data$D==1)*(panel_data$tp - 12.5)
panel_data$Y <- panel_data$id + 3*panel_data$tp +
  tau*panel_data$D + rnorm(nrow(panel_data))
```
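Before estimating anything, it can be worth sanity-checking the simulated panel. A minimal sketch (these checks are mine, not part of the original simulation, and assume the chunk above has been run):

```{r}
# balanced panel: one row per unit-period combination
stopifnot(nrow(panel_data) == n * time.periods)
# treatment turns on in each unit's "group" (first-treatment) period and stays on
stopifnot(all(panel_data$D == 1*(panel_data$tp >= panel_data$group)))
# units with group == 16 are never treated (there are only 15 periods),
# so they serve as a never-treated comparison group
table(panel_data$group, panel_data$D)
```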
## Step 2: Use `did` package
For this part, let's try two different things. First, we'll try the default version of our code, which first computes all possible group-time average treatment effects (including pre-treatment ones) and then uses them to compute an event study. By default, we also use the multiplier bootstrap, which opens up the possibility of computing uniform confidence bands (another default for us). These are particularly nice in the context of event studies because they provide robustness to multiple hypothesis testing (since we are estimating effects of the treatment at different lengths of exposure).
```{r}
library(did)
# with 1000 bootstrap iterations
current_time <- proc.time()
out <- att_gt(yname="Y",
              gname="group",
              idname="id",
              tname="tp",
              data=panel_data,
              bstrap=TRUE,
              biters=1000)
dyn <- aggte(out, type="dynamic")
proc.time() - current_time
```
Second, let's try the same thing but with analytical standard errors.
```{r}
# with analytical standard errors
current_time <- proc.time()
out2 <- att_gt(yname="Y",
               gname="group",
               idname="id",
               tname="tp",
               data=panel_data,
               bstrap=FALSE)
dyn2 <- aggte(out2, type="dynamic")
proc.time() - current_time
```
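In either case, the resulting event study can be plotted directly: the `did` package ships a `ggdid` function for its `aggte` output. A minimal sketch, assuming the chunk above has already created `dyn`:

```{r}
library(did)
# plot the event-study aggregation computed above;
# ggdid() is the did package's ggplot2-based plotting function
ggdid(dyn)
```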
# Conclusion
This seems like mostly good news. Our main code is the R `did` package, which delivers estimates of all group-time average treatment effects and an event study in about a second if you use analytical standard errors, and can additionally provide uniform confidence bands if you use the bootstrap (in about two seconds with our multiplier bootstrap procedure and 1000 bootstrap iterations).
Our Stata code is slower, but we (well, mainly [Fernando Rios-Avila](https://friosavila.github.io/playingwithstata/)) have been making rapid progress on the Stata implementation. At the moment, it uses a different bootstrap procedure than the R code does (which I suspect is the main reason for the differences in computational time), but I expect the Stata code to be running much faster soon.
# References
* Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. "Revisiting Event Study Designs: Robust and Efficient Estimation." Working Paper (2021).
* Callaway, Brantly, and Pedro HC Sant’Anna. "Difference-in-differences with multiple time periods." Journal of Econometrics (2020).
* de Chaisemartin, Clément, and Xavier d'Haultfoeuille. "Two-way fixed effects estimators with heterogeneous treatment effects." American Economic Review 110.9 (2020): 2964-96.
* Sun, Liyang, and Sarah Abraham. "Estimating dynamic treatment effects in event studies with heterogeneous treatment effects." Journal of Econometrics (2020).

# Five Minute Summary: Policy Evaluation during a Pandemic

*2021-05-16* · [original post](https://bcallaway11.github.io/posts/pandemic-policy-1-minute)

## Introduction
[Tong Li](https://my.vanderbilt.edu/tlwebpage/) and I just posted a new working paper called [Policy Evaluation during a Pandemic](https://arxiv.org/abs/2105.06927). This is our second paper about policy evaluation in the context of the pandemic. In the first paper, [Evaluating Policies Early in a Pandemic: Bounding Policy Effects with Nonrandomly Missing Data](https://arxiv.org/abs/2005.09605), we were mainly interested in dealing with Covid-19 testing being non-random (as well as testing rates differing across locations, etc.). In that paper, we ended up proposing a matching estimator, but we got a lot of comments asking: *Why not difference-in-differences?*
We became quite interested in answering that question. Originally, we just "had the sense" that DID was not the right tool to use here, but that sense has now developed into a fully fledged paper.

And our answer turns out to be the same as before: to us, it seems like a much better idea to carefully condition on pre-treatment pandemic-related variables (e.g., number of cases, fraction of the population still susceptible, population size, and perhaps other variables like population density or demographics) than to try to "difference out" location-specific fixed effects. In other words, we think unconfoundedness-type identification strategies are likely to be more appropriate than DID-type identification strategies for identifying the effects of Covid-19 related policies.
## DID or Unconfoundedness for Evaluating Policies during a Pandemic?
The key issue is that epidemic models from the epidemiology literature are highly nonlinear. A leading example is the stochastic SIRD model (SIRD stands for S=Susceptible, I=Infected, R=Recovered, D=Dead). The key equation in this model looks like
$$
I_{lt}(0) = (1-\lambda-\gamma)I_{lt-1}(0) + \beta \frac{I_{lt-1}(0)}{N_l} S_{lt-1}(0) + U_{lt}
$$
where $$\lambda, \gamma,$$ and $$\beta$$ are parameters related to the recovery rate, death rate, and infection rate, respectively; $$N_l$$ is the number of individuals in a particular location, $$U_{lt}$$ is an idiosyncratic shock, and variables indexed by $$\bullet(0)$$ are "potential outcomes" (the values those variables would take if the policy were not implemented).
You can immediately see that this is a *much different* model from the one that would typically lead to difference in differences:
$$
I_{lt}(0) = \theta_t + \eta_l + U_{lt}
$$

where $$\theta_t$$ is a time fixed effect and $$\eta_l$$ is a location fixed effect.
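To see the difference concretely, here is a minimal R sketch of the SIRD infection equation above for a single location; the parameter values and starting conditions are made up for illustration (they are not taken from the paper):

```{r}
set.seed(1)
TT <- 300                 # number of time periods
N <- 1e6                  # location population
lambda <- 0.05            # recovery rate (illustrative value)
gamma <- 0.01             # death rate (illustrative value)
beta <- 0.12              # infection rate (illustrative value)
I <- S <- numeric(TT)
I[1] <- 10; S[1] <- N - I[1]
for (t in 2:TT) {
  new_inf <- beta * (I[t-1]/N) * S[t-1]                      # nonlinear I*S/N term
  I[t] <- (1 - lambda - gamma)*I[t-1] + new_inf + rnorm(1)   # U_{lt} shock
  S[t] <- S[t-1] - new_inf
}
# infections rise and then fall as susceptibles are depleted --
# dynamics that an additive two-way fixed effects model cannot capture
```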
This shouldn't be a big surprise either --- pandemics are much different from many of the panel data sorts of applications that we commonly consider in economics. In particular, the spread of a pandemic is not really related to a particular location's "pandemic fixed effect"; this is much different from, say, applications in labor economics where it seems much more reasonable to think that an individual's earnings are related to their unobserved, time invariant "skill".
## Does the identification strategy actually matter?
The short answer is: yes.
In the paper we consider both simulations and an application to shelter-in-place orders. To keep things short, let's consider only the simulations here.
For the simulations, for simplicity, we consider the case where the policy has no effect on Covid-19 cases (this makes it easy to check if the approach is working well as we can just check if estimated policy effects are close to 0). In addition, we consider the case where the first Covid-19 case tends to show up in treated locations earlier than for untreated locations.
To start with, here is a plot of what a pandemic looks like in a stochastic SIRD model. The notation here is the same as above; the additional variable $$C$$ is the cumulative number of cases. The policy is implemented when $$t=150$$, but it has no effect.
![](/files/pandemic-policy/sim_example.jpg)
Next is a figure showing estimated effects of the policy on cumulative Covid-19 cases using DID. Here, we (incorrectly) estimate that the policy decreased cumulative cases. Basically, treated locations (which tended to get their first cases earlier) and untreated locations do not follow the same path of untreated potential outcomes, due to the nonlinearity of the model. Interestingly, it is possible to make the bias positive by setting the timing of the policy differently.
![](/files/pandemic-policy/did_es_example3.jpg)
The last figure involves estimating the effect of the policy by comparing locations that have similar pre-treatment pandemic-related characteristics (i.e., under unconfoundedness as we suggest doing in the paper). You can immediately see that this approach works much better.
![](/files/pandemic-policy/unc_es_example3.jpg)
## The Rest of the Paper...
* We propose doubly robust estimators of policy effects. These sorts of estimators are attractive in this case because they provide consistent estimates of policy effects if either (i) the propensity score (which is related to modeling the probability that a location adopts the policy) or (ii) an outcome regression model (related to the epidemic model in the absence of the treatment) is correctly specified. This setup is very attractive here as it gives a way to evaluate policies while partially circumventing the challenge of estimating a full pandemic model. Basically, we get to the case where you need to compare locations that implemented the policy to locations that didn't implement the policy (or implemented it later) conditional on having the same pre-policy characteristics that are related to the pandemic --- economists know a lot about this setting.
* We also consider the case where a researcher is interested in understanding the effect of a Covid-19 related policy on an economic outcome (rather than Covid-19 cases) in the particular case when (i) the policy can affect the outcome directly, (ii) the policy can affect the number of Covid-19 cases, and (iii) the number of Covid-19 cases can have its own effect on the economic outcome. We show:
  * Neither standard DID nor including the number of cases as a covariate delivers consistent estimates of ATT-type parameters in this case.
  * We propose a way to "adjust" for the policy affecting cases and deliver a reasonable ATT-type effect of the policy on economic outcomes.
* We also have an application about the effects of shelter-in-place orders on Covid-19 cases and recreational travel. We find that the results are quite sensitive to which methodological approach one chooses.