A function for difference-in-differences with a continuous treatment in a staggered treatment adoption setting.
cont_did currently supports staggered treatment adoption with continuous treatments, using B-splines under the hood.
Usage
cont_did(
yname,
dname,
gname = NULL,
tname,
idname,
xformula = ~1,
data,
target_parameter = c("level", "slope"),
aggregation = c("dose", "eventstudy", "none"),
treatment_type = c("continuous", "discrete"),
dose_est_method = c("parametric", "cck"),
dvals = NULL,
degree = 3,
num_knots = 0,
allow_unbalanced_panel = FALSE,
control_group = c("notyettreated", "nevertreated", "eventuallytreated"),
anticipation = 0,
weightsname = NULL,
alp = 0.05,
bstrap = TRUE,
cband = FALSE,
boot_type = "multiplier",
biters = 1000,
clustervars = NULL,
est_method = NULL,
base_period = "varying",
print_details = FALSE,
cl = 1,
...
)
Arguments
- yname
The name of the outcome variable
- dname
The name of the treatment variable in the data. cont_did differs from the did package in that the treatment variable is the "amount" of the treatment in a particular period, rather than gname, which gives the time period when a unit becomes treated. For a particular unit, the dname variable should be constant across time periods, even in pre-treatment periods. For units that never participate in the treatment, the amount of the treatment may not be defined in some applications; it is ignored in this function.
- gname
The name of the timing-group variable, i.e., when treatment starts for a particular unit. The value of this variable should be set to be 0 for units that do not participate in the treatment in any time period.
- tname
The name of the column containing the time periods
- idname
The individual (cross-sectional unit) id name
- xformula
A formula for additional covariates. This is not currently supported.
- data
The name of the data.frame that contains the data
- target_parameter
Either "level" or "slope". With "level", the function reports level effects, i.e., ATTs. With "slope", the function reports slope effects, i.e., ACRTs.
- aggregation
"dose" averages across timing-groups and time periods and provides results as a function of the dose. "eventstudy" averages across timing-groups and doses and reports results as a function of the length of exposure to the treatment.
"none" is a stub for reporting fully disaggregated results that can be processed as desired by the user. This is not currently supported though.
The combination of the arguments target_parameter and aggregation strongly affects the behavior of the function (and the target of the analysis). For example, setting target_parameter="level" and aggregation="eventstudy" is effectively the same thing as binarizing the treatment (i.e., where units are considered treated if they experience any positive amount of the treatment) and reporting an event study.
- treatment_type
"continuous" or "discrete" depending on the nature of the treatment. Default is "continuous". "discrete" is not yet supported.
- dose_est_method
The method used to estimate the dose-specific effects. The default is "parametric", where the user needs to specify the number of knots and the degree for a B-spline, which is assumed to be correctly specified. The other option is "cck", which uses a data-driven nonparametric method to estimate the dose-specific effects based on the npiv package and Chen, Christensen, and Kankanala (ReStud, 2025).
- dvals
The values of the treatment at which to compute dose-specific effects. If it is not specified, the default is to use the percentiles of the dose among all ever-treated units.
- degree
The degree of the B-spline used in estimation. The default is 3, which, in combination with the default choice for num_knots, leads to fitting a model for the group of treated units that is a cubic polynomial in the dose. Setting degree=1 will lead to a linear model, while setting degree=2 will lead to a quadratic model.
- num_knots
The number of knots to include for the B-spline. The default is 0, so that the spline is global (i.e., this amounts to fitting a global polynomial). There is a bias-variance tradeoff in including more or fewer knots.
- allow_unbalanced_panel
Whether or not to allow an unbalanced panel rather than "balancing" the panel with respect to time and id. The default value is FALSE, which means that att_gt() will drop all units where data is not observed in all periods. The advantage of this is that the computations are faster (sometimes substantially).
- control_group
Which units to use as the control group. Setting control_group="nevertreated" sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. Setting control_group="notyettreated" sets the control group to the group of units that have not yet participated in the treatment in that time period. This includes all never-treated units, but it also includes additional units that eventually participate in the treatment but have not participated yet. In the Usage signature, "notyettreated" is listed first.
- anticipation
The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes
- weightsname
The name of the column containing the sampling weights. If not set, all observations have the same weight.
- alp
The significance level. The default is 0.05.
- bstrap
Boolean for whether or not to compute standard errors using the multiplier bootstrap. If standard errors are clustered, then one must set bstrap=TRUE. The default is TRUE. Uniform confidence bands are returned only when cband is also set to TRUE. If bstrap is FALSE, then analytical standard errors are reported.
- cband
Boolean for whether or not to compute a uniform confidence band that covers all of the group-time average treatment effects with fixed probability 1-alp. In order to compute uniform confidence bands, bstrap must also be set to TRUE. The default shown in Usage is FALSE.
- boot_type
Should be one of "multiplier" (the default) or "empirical". The multiplier bootstrap is generally much faster, but attgt_fun needs to provide an expression for the influence function (which could be challenging to figure out). If no influence function is provided, then the pte package will use the empirical bootstrap no matter what the value of this parameter is.
- biters
The number of bootstrap iterations to use. The default is 1000, and this is only applicable if bstrap=TRUE.
- clustervars
A vector of variable names to cluster on. At most, there can be two variables (otherwise the function will throw an error), and one of these must be the same as idname, which allows for clustering at the individual level. By default, we cluster at the individual level (when bstrap=TRUE).
- est_method
The method to compute group-time average treatment effects. The default is "dr", which uses the doubly robust approach in the DRDID package. Other built-in methods include "ipw" for inverse probability weighting and "reg" for first step regression estimators. The user can also pass their own function for estimating group-time average treatment effects. This should be a function f(Y1,Y0,treat,covariates) where Y1 is an n x 1 vector of outcomes in the post-treatment period, Y0 is an n x 1 vector of pre-treatment outcomes, treat is a vector indicating whether or not an individual participates in the treatment, and covariates is an n x k matrix of covariates. The function should return a list that includes ATT (an estimated average treatment effect) and inf.func (an n x 1 influence function). The function can return other things as well, but these are the only two that are required. est_method is only used if covariates are included.
- base_period
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each treatment period by comparing the change in outcomes for a particular group relative to its comparison group in the pre-treatment periods (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t).
A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions.
Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period.
- print_details
Whether or not to show details/progress of computations. Default is FALSE.
- cl
number of clusters to be used when bootstrapping; default is 1
- ...
Extra arguments that can be passed to create the correct subsets of the data (depending on subset_fun), to estimate group-time average treatment effects (depending on attgt_fun), or to aggregate treatment effects (particularly useful are the min_e, max_e, and balance_e arguments to event study aggregations).
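The roles of degree and num_knots can be illustrated outside the package with base R's splines::bs(). This is only a sketch under stated assumptions: the simulated doses and outcomes are hypothetical, the knot is placed at the median dose for illustration, and nothing here shows how cont_did builds its basis internally.

```r
library(splines)  # ships with base R

set.seed(1)
dose <- runif(200, min = 0.1, max = 1)              # hypothetical positive doses
y <- 1 + 2 * dose - dose^2 + rnorm(200, sd = 0.1)   # hypothetical outcome changes

# degree = 3 with no interior knots: a global cubic polynomial in the dose
basis_global <- bs(dose, degree = 3)

# degree = 3 with one interior knot (median placement is an assumption)
basis_one_knot <- bs(dose, degree = 3, knots = median(dose))

# Each interior knot adds one column to the basis
ncol(basis_global)
ncol(basis_one_knot)

# With no knots, the spline fit coincides with an ordinary cubic polynomial fit
fit_spline <- lm(y ~ basis_global)
fit_poly <- lm(y ~ poly(dose, 3))
all.equal(unname(fitted(fit_spline)), unname(fitted(fit_poly)))
```

Adding knots makes the fitted dose-response curve more flexible at the cost of more parameters to estimate, which is the bias-variance tradeoff mentioned under num_knots.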
Examples
# build small simulated data
set.seed(1234)
df <- simulate_contdid_data(
n = 1000,
num_time_periods = 4,
num_groups = 4,
dose_linear_effect = 0,
dose_quadratic_effect = 0
)
# estimate effects of continuous treatment
cd_res <- cont_did(
yname = "Y",
tname = "time_period",
idname = "id",
dname = "D",
data = df,
gname = "G",
target_parameter = "slope",
aggregation = "dose",
treatment_type = "continuous",
control_group = "notyettreated",
biters = 50,
cband = TRUE,
num_knots = 1,
degree = 3
)
#> Warning: critical value for uniform confidence band is somehow smaller than
#> critical value for pointwise confidence interval...using pointwise
#> confidence interal
summary(cd_res)
#>
#> Overall ATT:
#> ATT Std. Error [ 95% Conf. Int.]
#> -0.0332 0.0726 -0.1754 0.1091
#>
#>
#> Overall ACRT:
#> ACRT Std. Error [ 95% Conf. Int.]
#> -0.2988 0.143 -0.579 -0.0186 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
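The base_period and control_group rules described in Arguments can also be written out as a small sketch. The helpers base_period_for() and is_control() below are hypothetical and not part of the package; the control-group helper is simplified and ignores anticipation.

```r
# Hypothetical helper (not part of the package): base period used for ATT(g, t)
base_period_for <- function(g, t, anticipation = 0,
                            base_period = c("varying", "universal")) {
  base_period <- match.arg(base_period)
  universal_base <- g - anticipation - 1
  is_pre_treatment <- t < g - anticipation
  if (base_period == "varying" && is_pre_treatment) {
    # varying base period: compare period t with period t - 1
    return(t - 1)
  }
  # post-treatment periods, and every period under a universal base period
  universal_base
}

# Hypothetical helper (not part of the package): which units, identified by
# their timing-group G (0 = never treated), can serve as controls in period t
is_control <- function(G, t, control_group = c("notyettreated", "nevertreated")) {
  control_group <- match.arg(control_group)
  never_treated <- G == 0
  if (control_group == "nevertreated") {
    return(never_treated)
  }
  # not-yet-treated: never-treated units plus units whose treatment starts after t
  never_treated | G > t
}

base_period_for(g = 5, t = 3, base_period = "varying")
base_period_for(g = 5, t = 3, base_period = "universal")
base_period_for(g = 5, t = 6, base_period = "varying")
is_control(G = c(0, 2, 3, 4), t = 2, control_group = "notyettreated")
```

In post-treatment periods both settings return the same base period, which is why the post-treatment estimates do not depend on base_period.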