Homework 6

\[\newcommand{\E}{\mathbb{E}} \renewcommand{\P}{\textrm{P}} \let\L\relax \newcommand{\L}{\textrm{L}} %doesn't work in .qmd, place this command at start of qmd file to use it \newcommand{\F}{\textrm{F}} \newcommand{\var}{\textrm{var}} \newcommand{\cov}{\textrm{cov}} \newcommand{\corr}{\textrm{corr}} \newcommand{\Var}{\mathrm{var}} \newcommand{\Cov}{\mathrm{cov}} \newcommand{\Corr}{\mathrm{corr}} \newcommand{\sd}{\mathrm{sd}} \newcommand{\se}{\mathrm{s.e.}} \newcommand{\T}{T} \newcommand{\indicator}[1]{\mathbb{1}\{#1\}} \newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}} \def\independenT#1#2{\mathrel{\setbox0\hbox{$#1#2$}% \copy0\kern-\wd0\mkern4mu\box0}} \newcommand{\N}{\mathcal{N}}\]

Due: At the start of class on Monday, April 29. Please turn in a hard copy.

Textbook Questions: Hansen 17.2

Additional Question 1

For this problem, use the data job_displacement_clean2.RData posted here. We are interested in estimating the (causal) effect of job displacement on earnings. The key variables in this data are learn (the log of earnings, which is the outcome variable in this problem) and first.displaced (the time period when an individual becomes displaced), which is the variable used to form “groups” below. first.displaced is set equal to 0 for individuals who are not displaced from their job in any period. (As always, you are not allowed to use built-in R packages, such as plm, fixest, or did, to compute the results for these questions, though you are welcome to compare your results to output from those packages.)

Hint: The number of observations is fairly large in this data and, therefore, inverting some matrices can be quite time-consuming. I recommend that you use the Matrix package to deal with the matrices in this question. This package has the ability to deal with “sparse” matrices in an efficient way which is helpful for this problem.
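As a rough illustration of the hint, the sketch below builds a sparse matrix of unit dummies with the Matrix package and uses it to remove unit means from a simulated outcome. The data and the names `id`, `y`, `D`, and `y_within` are made up for illustration and are not part of the assignment data; this shows the sparse-matrix mechanics only, not a solution to any question.

```r
library(Matrix)

set.seed(1)
n_units <- 1000
n_periods <- 5
id <- rep(seq_len(n_units), each = n_periods)          # hypothetical unit ids
y <- rnorm(n_units)[id] + rnorm(n_units * n_periods)   # simulated outcome

# Sparse (n_units * n_periods) x n_units matrix of unit dummies:
# row r has a single 1 in column id[r]
D <- sparseMatrix(i = seq_along(id), j = id, x = 1)

# Unit means from the normal equations; crossprod(D) stays sparse,
# so solving this system is cheap even when n_units is large
unit_means <- solve(crossprod(D), crossprod(D, y))

# Outcome with unit means removed
y_within <- y - as.vector(D %*% unit_means)
```

A dense version of `D` here would hold 5,000 x 1,000 entries, almost all of them zero; the sparse version stores only the 5,000 nonzero ones.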

  1. Use the within estimator to estimate the following regression model \[\begin{align*} Y_{it} = \theta_t + \eta_i + \alpha D_{it} + e_{it} \end{align*}\] Report an estimate of \(\alpha\) and its standard error.

  2. Compute difference-in-differences versions of group-time average treatment effects (where you can define group by the time period when an individual becomes displaced from their job). Use individuals who are never displaced from their job as the comparison group.

  3. Aggregate the group-time average treatment effects in part (b) into an overall treatment effect parameter. Use the bootstrap to compute a standard error for it. How do these results compare to the ones from part (a)?

  4. Compute the weights on underlying group-time average treatment effects that come from the two-way fixed effects regression in part (a). How do these compare to the weights from part (c)?

  5. Aggregate the group-time average treatment effects in part (b) into an event study. Use the bootstrap to compute standard errors, and plot the estimates along with a 95% confidence interval.
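To fix ideas about the objects in this question, here is a minimal sketch on simulated data, not the job displacement data. It assumes a balanced panel stored as an \(n \times T\) matrix with consecutive period labels 1 to 4, so that \(g-1\) is the last pre-treatment period for group \(g\), and a homogeneous true effect of \(-1\). The names `att_gt` and `within2` are hypothetical helpers. In the actual data the periods are years, so the base period would be the previous observed period rather than \(g-1\).

```r
set.seed(1)
n <- 2000
periods <- 1:4
# Hypothetical groups: first treated in period 2, 3, or 4; 0 = never treated
G <- sample(c(0, 2, 3, 4), n, replace = TRUE)
eta <- rnorm(n) + 0.5 * (G > 0)   # unit effects, correlated with group
# n x 4 outcome matrix: common time trend plus an effect of -1 once treated
Y <- sapply(periods, function(t) {
  eta + 0.2 * t - 1 * (G > 0 & t >= G) + rnorm(n, sd = 0.5)
})

# DiD estimate of ATT(g, t) with never-treated units as the comparison group
att_gt <- function(Y, G, g, t) {
  dY <- Y[, t] - Y[, g - 1]       # change relative to last pre-treatment period
  mean(dY[G == g]) - mean(dY[G == 0])
}
res <- expand.grid(g = c(2, 3, 4), t = periods)
res <- res[res$t >= res$g, ]      # keep post-treatment (g, t) pairs only
res$att <- mapply(function(g, t) att_gt(Y, G, g, t), res$g, res$t)

# Two-way within estimator of alpha; valid as written for a balanced panel only
D <- outer(G, periods, function(g, t) as.numeric(g > 0 & t >= g))
within2 <- function(M) {
  M - rowMeans(M) - matrix(colMeans(M), nrow(M), ncol(M), byrow = TRUE) + mean(M)
}
alpha_hat <- sum(within2(D) * within2(Y)) / sum(within2(D)^2)
```

Because the simulated effect is the same for every group and period, both `res$att` and `alpha_hat` should be close to \(-1\) here; with heterogeneous effects the two approaches can differ, which is what parts (c) and (d) ask you to explore.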

Additional Question 2

Suppose you are interested in the structural model \(Y = X'\beta + e\), but where \(\E[Xe] \neq 0\). Suppose that you have access to an \(l \times 1\) vector of instruments \(Z\) (where \(l > k\) with \(k\) being the dimension of \(X\)) that satisfy \(\E[Ze]=0\). Suppose that you estimate \(\beta\) using GMM. That is, you calculate \[\begin{align*} \hat{\beta}_{gmm} &= \underset{b}{\textrm{argmin}\ } \Big( \mathbf{Z}'\mathbf{Y}-\mathbf{Z}'\mathbf{X}b \Big)'\widehat{\mathbf{W}} \Big(\mathbf{Z}'\mathbf{Y}-\mathbf{Z}'\mathbf{X}b\Big) \end{align*}\] where \(\widehat{\mathbf{W}}\) is an \(l \times l\) weighting matrix that satisfies \(\widehat{\mathbf{W}} \xrightarrow{p} \mathbf{W}\) which is a positive definite matrix (also, here \(\mathbf{Y}\), \(\mathbf{X}\), and \(\mathbf{Z}\) are all data matrices). Derive an explicit expression for \(\hat{\beta}_{gmm}\) and show that \(\sqrt{n}(\hat{\beta}_{gmm} - \beta) \xrightarrow{d} \N(0,\mathbf{V})\), and provide an expression for \(\mathbf{V}\).
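As a possible starting point for the derivation (a sketch, assuming \(\mathbf{X}'\mathbf{Z}\widehat{\mathbf{W}}\mathbf{Z}'\mathbf{X}\) is invertible and that \(\widehat{\mathbf{W}}\) is symmetric), the first-order condition of the minimization problem and the substitution \(\mathbf{Y} = \mathbf{X}\beta + \mathbf{e}\) give \[\begin{align*} 0 &= -2\mathbf{X}'\mathbf{Z}\widehat{\mathbf{W}}\Big(\mathbf{Z}'\mathbf{Y} - \mathbf{Z}'\mathbf{X}\hat{\beta}_{gmm}\Big) \\ \sqrt{n}(\hat{\beta}_{gmm} - \beta) &= \left( \frac{\mathbf{X}'\mathbf{Z}}{n} \widehat{\mathbf{W}} \frac{\mathbf{Z}'\mathbf{X}}{n} \right)^{-1} \frac{\mathbf{X}'\mathbf{Z}}{n} \widehat{\mathbf{W}} \frac{\mathbf{Z}'\mathbf{e}}{\sqrt{n}} \end{align*}\] where \(\mathbf{e}\) denotes the \(n \times 1\) vector of structural errors (notation introduced here for the sketch). The remaining steps apply a law of large numbers to \(\mathbf{Z}'\mathbf{X}/n\) and a central limit theorem to \(\mathbf{Z}'\mathbf{e}/\sqrt{n}\), under whatever sampling and moment assumptions you state.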

Additional Question 3

For this problem, we will consider a smaller scale version of the problem in Additional Question 1. In particular, this question will focus on estimating group-time average treatment effects using GMM.

To start, load the same data as in Additional Question 1 and then run the following code:

# limit time periods to 2001, 2003, 2005, 2007, and groups to 2003, 
# 2005, 2007, and untreated
data <- subset(data, (year <= 2007))
data <- subset(data, first.displaced %in% c(0, 2003, 2005, 2007))

To keep the notation simple, below 2001 will be \(t=1\), 2003 will be \(t=2\), 2005 will be \(t=3\), and 2007 will be \(t=4\). Also, for simplicity, you can treat \(p_g := \P(G=g)\) as being known. We will make the parallel trends assumption that, for all \(t=2,\ldots,4\) and for all groups \(g\), \(\E[\Delta Y_t(0) | G=g] = \E[\Delta Y_t(0)]\).

  1. Please state all the non-redundant moment conditions that are implied by the parallel trends assumption. As a hint, here are two of them: \[\begin{align*} \E\left[\left(\frac{\indicator{G=2}}{p_2} - \frac{U}{p_U}\right) \Delta Y_2 \right] - ATT(2,2) &= 0 \\ \E\left[\left(\frac{\indicator{G=3}}{p_3} - \frac{U}{p_U}\right) \Delta Y_2 \right] &= 0 \end{align*}\] where \(U\) is an indicator for belonging to the untreated group and \(p_U := \P(U=1)\). The first condition says that \(ATT(2,2)\) is equal to the mean path of outcomes in period 2 for group 2 relative to the untreated group, and the second condition says that the mean path of outcomes in period 2 should be the same for group 3 and the untreated group (because this is a pre-treatment period for group 3).

  2. Using the weighting matrix \(\mathbf{W} = \mathbf{I}_l\) where \(l\) is the number of moment conditions from part (a), compute estimates of \(ATT(g,t)\) for all post-treatment periods for groups 2, 3, and 4. How do these estimates compare to the corresponding ones from Additional Question 1?

  3. Now, compute the efficient GMM estimator of all of the \(ATT(g,t)\)’s mentioned in part (b). To estimate the efficient weighting matrix, you can use the preliminary estimates of the \(ATT(g,t)\)’s from part (b). How do these estimates compare to the ones from part (b)?

  4. Compute a J-test for over-identification. How do you interpret the results?
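The mechanics of parts (b) through (d) can be sketched on a deliberately tiny simulated example that uses only the two hint moment conditions from part (a), so \(l = 2\) moments and \(k = 1\) parameter, \(ATT(2,2)\). Everything here is an illustrative assumption: the simulated `G` and `dY2`, the known probabilities `p2`, `p3`, `pU`, the helper `gmm`, and the selection matrix `A` that maps parameters into moments. The full problem has more moments and more parameters.

```r
set.seed(1)
n <- 5000
p2 <- 0.3; p3 <- 0.3; pU <- 0.4    # group probabilities, treated as known
# Hypothetical groups: 2 (treated in t = 2), 3 (treated later), 0 (untreated)
G <- sample(c(2, 3, 0), n, replace = TRUE, prob = c(p2, p3, pU))
att22 <- -0.5                      # true ATT(2,2) in the simulation
dY2 <- 0.1 + att22 * (G == 2) + rnorm(n)   # common trend 0.1 plus effect

# n x l matrix of moment contributions, before subtracting the parameters
h <- cbind(((G == 2) / p2 - (G == 0) / pU) * dY2,
           ((G == 3) / p3 - (G == 0) / pU) * dY2)
A <- matrix(c(1, 0), ncol = 1)     # moments are E[h] - A %*% theta = 0

# Linear GMM: minimize (m - A theta)' W (m - A theta), with m = colMeans(h)
gmm <- function(h, A, W) {
  m <- colMeans(h)
  drop(solve(t(A) %*% W %*% A, t(A) %*% W %*% m))
}

theta1 <- gmm(h, A, diag(2))               # first step, W = identity
u <- sweep(h, 2, drop(A %*% theta1))       # moment functions at theta1
Omega <- crossprod(u) / n                  # estimated variance of the moments
theta2 <- gmm(h, A, solve(Omega))          # efficient second step

# J statistic: n * gbar' Omega^{-1} gbar, chi-square with l - k df under H0
gbar <- colMeans(h) - drop(A %*% theta2)
J <- n * drop(t(gbar) %*% solve(Omega) %*% gbar)
pval <- pchisq(J, df = nrow(A) - ncol(A), lower.tail = FALSE)
```

With the identity weighting matrix, `theta1` equals the first sample moment, since the second moment carries no information about \(ATT(2,2)\) under that weighting. The efficient step can differ because it exploits the correlation between the two moments, which share the untreated group.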