Homework 2

Due: At the start of class on Monday, Feb. 12. Please turn in a hard copy.

Extra Question 1

For this question, we will use data from Project Star. To access this data, please install the Ecdat package and run the following code: data(Star, package="Ecdat"). To view a description of the data, you can run the following code: ?Ecdat::Star.

Project Star is a well-known experiment conducted in Tennessee in the 1980s in which students were randomly assigned to a small class, a regular class, or a regular class with a teacher’s aide. The study focused on whether reducing class sizes improved students’ test scores.

For this problem, we will be interested in the \(ATT\) of being in a small class relative to a regular class for boys on their math test scores (that is, I’d like for you to use a subset of the data that only includes boys and only includes students who were assigned to a small class or regular class).
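As a starting point, here is a minimal sketch of how the subset described above could be built. The helper name make_boys_sample and the new column small are hypothetical, and the factor level names ("boy", "small.class", "regular") are assumed from ?Ecdat::Star, so please check them against your copy of the data.

```r
# Keep boys assigned to a small or regular class and add a 0/1 treatment dummy.
# Level names ("boy", "small.class", "regular") are assumed from ?Ecdat::Star.
make_boys_sample <- function(star) {
  keep <- star$sex == "boy" & star$classk %in% c("small.class", "regular")
  boys <- droplevels(star[keep, ])                 # drop the unused factor levels
  boys$small <- as.numeric(boys$classk == "small.class")  # 1 = small class
  boys
}

# Only runs if the Ecdat package is installed.
if (requireNamespace("Ecdat", quietly = TRUE)) {
  data(Star, package = "Ecdat")
  boys <- make_boys_sample(Star)
  print(table(boys$classk))
}
```

Writing the subset as a function makes it easy to check on a small made-up data frame before applying it to the full data.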

(a) Exploiting that the treatment is randomly assigned, calculate an estimate of the \(ATT\) by comparing the mean of tmathssk for boys in small classes to the mean for boys in regular classes.
(b) Now, use a regression to estimate the same \(ATT\) as in part (a). [For this part, you can compare your answer to results that use lm, but I’d like for you to use matrix algebra for your main result.] How do the results compare to the ones from part (a)?
(c) Now, run a regression that additionally includes the teacher’s experience (totexpk) and free lunch status (freelunk) as regressors. [As before, you can compare your answer to the ones from lm, but please report your final answer using matrix algebra.] How do these results compare to the ones from parts (a) and (b)?
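To illustrate the mechanics behind the three parts above without using the Star data, here is a small sketch on simulated data. Everything in it is made up for illustration: the variables d, x, and y, the coefficient values, and the helper ols are all hypothetical. It computes a difference in means, then the OLS coefficients from the formula \((X'X)^{-1}X'y\), and compares both to lm.

```r
set.seed(1)
n <- 200
d <- rbinom(n, 1, 0.5)                          # hypothetical 0/1 treatment dummy
x <- rnorm(n)                                   # hypothetical covariate
y <- 480 + 10 * d + 3 * x + rnorm(n, sd = 20)   # simulated outcome, not Star data

# Difference in means between treated and untreated units
diff_means <- mean(y[d == 1]) - mean(y[d == 0])

# OLS by matrix algebra: solve (X'X) b = X'y for b
ols <- function(X, y) {
  b <- solve(crossprod(X), crossprod(X, y))
  setNames(as.vector(b), colnames(X))
}

# Regression on an intercept and the treatment dummy only
X1 <- cbind(intercept = 1, d = d)
b1 <- ols(X1, y)

# Regression that additionally includes the covariate
X2 <- cbind(intercept = 1, d = d, x = x)
b2 <- ols(X2, y)

# Cross-check against lm
b1_lm <- coef(lm(y ~ d))
b2_lm <- coef(lm(y ~ d + x))

print(diff_means)
print(b1)
print(b2)
```

With only an intercept and a binary regressor, the coefficient on the dummy equals the difference in means exactly, which is a useful sanity check for the matrix code. For a factor such as freelunk, model.matrix can be used to build the dummy columns of X.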

Extra Question 2

(a) Consider the case with a binary treatment and suppose that unconfoundedness holds. Show that \(ATE\) is identified in this case (and provide an expression for it that indicates that it is identified).
(b) How does this expression compare to the one that we derived for \(ATT\) in class?
(c) Now, additionally suppose that untreated potential outcomes follow a linear model, \(Y_i(0) = X_i'\beta + e_i\) with \(\mathbb{E}[e_i|X_i]=0\), and that treatment effects are homogeneous, \(Y_i(1) - Y_i(0) = \alpha\) for all units. Explain how to use a linear regression to estimate the causal effect of participating in the treatment in this case (and explain where you use each of the conditions above to obtain this result).
(d) How does this regression compare to the one that we derived in class after we had identified the \(ATT\)? Is it the same or different? Any comments/explanations?
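
For reference, the objects in Extra Question 2 can be written out as follows. This is only a notational sketch: the treatment indicator \(D_i\) is an assumed symbol that does not appear in the problem statement, and the version of unconfoundedness shown is a common textbook one, so please defer to the definitions from class if they differ.

\[
\begin{aligned}
Y_i &= D_i Y_i(1) + (1 - D_i) Y_i(0), \\
ATE &= \mathbb{E}[Y_i(1) - Y_i(0)], \\
ATT &= \mathbb{E}[Y_i(1) - Y_i(0) \mid D_i = 1], \\
\text{unconfoundedness:} \quad & (Y_i(1), Y_i(0)) \perp D_i \mid X_i.
\end{aligned}
\]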