Due: This project is optional. You can turn it in by email (to both me and Brad Curtis) any time before 11:59pm on Tuesday Dec. 6.
For this project, we are going to try to evaluate the causal effect of a job training program. The data come from LaLonde (1986), a highly influential study in economics about the difficulties of using observational data to estimate causal effects.
There are two data files:
jtrain_observational.dta: this contains observational data for evaluating the effect of the job training program. Use this in the first step below.
jtrain_experimental.dta: this contains experimental data for evaluating the effect of the job training program. Use this only in the second step below.
The key variables in each dataset are re78, the outcome, which is real earnings in 1978, and train, the treatment indicator, which is equal to 1 for individuals who participated in the job training program and 0 otherwise (the job training program actually took place between 1975 and 1978; timing varied somewhat across individuals). A number of additional variables are available in each dataset, and descriptions of the available data are in jtrain_observational__description.txt and jtrain_experimental_description.txt, which are posted on ELC (there are very slight differences in the available variables in each dataset).
Part 1: Using the jtrain_observational.dta data, I want you to try to deliver an estimate of the causal effect of the job training program on earnings. You are free to use whatever approach you think is most appropriate to deliver this estimate.
Part 2: Using the jtrain_experimental.dta data, I want you to estimate the causal effect of job training on earnings. This is experimental data, so in principle, this estimate should be very credible. Therefore, I want you to compare your estimate from the first part to the one in this part. Are they close to each other?
Please do not do this part until you have fully completed Part 1; you will not be graded on how close your estimate from Part 1 is to the one in Part 2, but I would like for you to get a sense of whether or not your approach from Part 1 seemed to work.
3-5 pages
In part 1, report the difference in average earnings between individuals who participated in job training and those who didn’t. Do you think this difference should be interpreted as the causal effect? Explain.
In part 1, report a table of summary statistics (i.e., averages of available, relevant data separately for individuals that participated in job training and those that didn’t). Are there big differences? Do they matter?
In part 1, propose the best approach that you can come up with for estimating the causal effects of the job training program. I strongly encourage you to completely think through the model you want to estimate in this step before proceeding to the next step.
In part 1, implement the approach you proposed and report an estimate of the causal effect.
In part 1 (and before moving to part 2), how confident do you feel in your estimate of the causal effect? Before moving to part 2, discuss any doubts that you might have and whether or not you think you might have over- or under-estimated the causal effect.
In part 2, explain how you can use the experimental data to come up with a credible estimate of the causal effect of the job training program (and why this works).
In part 2, compare your estimate from part 1 with the one you get in part 2. Are they close to each other? Provide some discussion relating (i) how confident you were in your estimates from part 1 to (ii) how close they are to the estimates from part 2.
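As a starting point for the two descriptive items above (the raw difference in average earnings and the table of group averages), here is a minimal Python sketch. It is not a required or recommended approach, and it does not address what method to use for the causal estimate in part 1. It assumes the data have already been read into a list of dictionaries, one per individual; one way to get there might be `pandas.read_stata(path).to_dict("records")`, but any tool that reads `.dta` files would do. The helper names and the toy rows are made up for illustration, and the `age` column is an assumed example of a covariate (check the description files for the variables actually available). Only `re78` and `train` are taken from the assignment.

```python
from math import sqrt
from statistics import mean, variance


def split_by_treatment(rows, treat="train"):
    """Split rows into (treated, untreated) using the 0/1 treatment column."""
    treated = [r for r in rows if r[treat] == 1]
    control = [r for r in rows if r[treat] == 0]
    return treated, control


def diff_in_means(rows, outcome="re78", treat="train"):
    """Raw difference in average outcomes, treated minus untreated.

    Also returns a standard error that allows the two groups to have
    different variances. This is a descriptive comparison only.
    """
    treated, control = split_by_treatment(rows, treat)
    y1 = [r[outcome] for r in treated]
    y0 = [r[outcome] for r in control]
    diff = mean(y1) - mean(y0)
    se = sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return diff, se


def summary_table(rows, columns, treat="train"):
    """For each column: (treated mean, untreated mean, difference)."""
    treated, control = split_by_treatment(rows, treat)
    table = {}
    for col in columns:
        m1 = mean(r[col] for r in treated)
        m0 = mean(r[col] for r in control)
        table[col] = (m1, m0, m1 - m0)
    return table


# Made-up toy rows, only to show the expected input shape.
# "age" is an assumed covariate name; the numbers are not real data.
toy = [
    {"train": 1, "re78": 6.0, "age": 24},
    {"train": 1, "re78": 4.0, "age": 26},
    {"train": 0, "re78": 9.0, "age": 35},
    {"train": 0, "re78": 11.0, "age": 33},
]

diff, se = diff_in_means(toy)
print(f"difference in means: {diff:.2f} (se {se:.2f})")
for name, (m1, m0, d) in summary_table(toy, ["age", "re78"]).items():
    print(f"{name}: treated={m1:.2f} untreated={m0:.2f} diff={d:.2f}")
```

The same two functions can be applied to either dataset; how the resulting numbers should be interpreted in each case is what the questions above ask you to discuss.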
Part 1 | 10pts |
Part 2 - Estimates of causal effects | 5pts |
Discussion of results, overall clarity of arguments | 5pts |