Due: This project is optional. You can turn it in by email (to both me and Brad Curtis) any time before 11:59pm on Tuesday Dec. 6.

For this project, we are going to try to evaluate the causal effect of a job training program. The data come from Lalonde (1986), a highly influential study in economics about the difficulties of using observational data to estimate causal effects.


There are two data files: jtrain_observational.dta and jtrain_experimental.dta.

The key variables in each dataset are re78, the outcome, which is real earnings in 1978, and train, the treatment indicator, which is equal to 1 for individuals who participated in the job training program and 0 otherwise (the job training program actually took place between 1975 and 1978; timing varied somewhat across individuals). A number of additional variables are available in each dataset, and descriptions of the available data are in jtrain_observational__description.txt and jtrain_experimental_description.txt, which are posted on ELC (there are very slight differences in the available variables in each dataset).

What to do:

Part 1: Using the jtrain_observational.dta data, I want you to try to deliver an estimate of the causal effect of the job training program on earnings. You are free to use whatever approach you think is most appropriate to deliver this estimate.

Part 2: Using the jtrain_experimental.dta data, I want you to estimate the causal effect of job training on earnings. This is experimental data, so in principle this estimate should be very credible. Therefore, I want you to compare your estimate from Part 1 to the one in this part. Are they close to each other? Please do not do this part until you have fully completed Part 1. You will not be graded on how close your estimate from Part 1 is to the one in Part 2, but I would like you to get a sense of whether or not your approach from Part 1 seemed to work.
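As one possible illustration for Part 2 (not a required approach): with experimental data, a common starting point is the difference in mean re78 between the train = 1 and train = 0 groups. The sketch below uses only the Python standard library and made-up toy numbers, not the Lalonde data. The function name diff_in_means is hypothetical. To apply it to the real data you would first need to load the re78 and train columns from jtrain_experimental.dta yourself (for instance with pandas.read_stata, which is an assumption about your setup).

```python
from math import sqrt
from statistics import mean, variance


def diff_in_means(outcomes, treated):
    """Return (estimate, standard_error) for the treated-minus-control gap in means."""
    y1 = [y for y, d in zip(outcomes, treated) if d == 1]
    y0 = [y for y, d in zip(outcomes, treated) if d == 0]
    if len(y1) < 2 or len(y0) < 2:
        raise ValueError("need at least two treated and two control units")
    estimate = mean(y1) - mean(y0)
    # Standard error that allows the two groups to have different variances
    se = sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return estimate, se


# Toy numbers for illustration only (NOT the Lalonde data)
toy_re78 = [5.0, 7.0, 9.0, 2.0, 4.0, 6.0]
toy_train = [1, 1, 1, 0, 0, 0]

est, se = diff_in_means(toy_re78, toy_train)
print(f"estimate = {est:.3f}, se = {se:.3f}")
```

A regression of re78 on train gives the same point estimate. Adding pre-treatment covariates to that regression is another reasonable choice, and may tighten the standard error.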

What to turn in

Grading Criteria

Part 1 10pts
Part 2 - Estimates of causal effects 5pts
Discussion of results, overall clarity of arguments 5pts