Due: Tuesday, Dec. 7 at the beginning of class.

For this project, we are going to try to evaluate the causal effect of a job training program. The data comes from LaLonde (1986), a highly influential study in economics about the difficulties of using observational data to estimate causal effects.

### Data

There are two data files:

• jtrain_observational.dta: this contains observational data for evaluating the effect of the job training program. Use this in Part 1 below.

• jtrain_experimental.dta: this contains experimental data for evaluating the effect of the job training program. Use this only in Part 2 below.

The two key variables in each dataset are re78, the outcome, which is real earnings in 1978, and train, the treatment indicator, which is equal to 1 for individuals that participated in the job training program and 0 otherwise (the job training program took place between 1975 and 1978, and timing varied somewhat across individuals). A number of additional variables are available in each dataset; descriptions are in jtrain_observational__description.txt and jtrain_experimental_description.txt, which are posted on ELC (there are very slight differences in the available variables in each dataset).
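As a starting point, here is a minimal sketch of reading one of the .dta files in Python with pandas. The helper name `load_jtrain` is made up for illustration, and the only columns it checks for are re78 and train, since those are the two variables named above. You are free to use any software you like.

```python
import pandas as pd

# The two variables named in this handout; other columns vary by dataset.
REQUIRED_COLUMNS = ("re78", "train")


def load_jtrain(path):
    """Read a Stata .dta file and confirm the outcome and treatment columns exist.

    `load_jtrain` is a hypothetical helper, not part of any library.
    """
    df = pd.read_stata(path)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"{path} is missing expected columns: {missing}")
    return df


# Example usage (assumes the file sits in the working directory):
# obs = load_jtrain("jtrain_observational.dta")
# print(obs["train"].value_counts())
```

Checking the columns up front makes a wrong or mislabeled file fail immediately rather than partway through the analysis.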

### What to do:

Part 1: Using the jtrain_observational.dta data, I want you to try to deliver an estimate of the causal effect of the job training program on earnings. You are free to use whatever approach you think is most appropriate to deliver this estimate.

Part 2: Using the jtrain_experimental.dta data, I want you to estimate the causal effect of job training on earnings. This is experimental data, so in principle, this estimate should be very credible. Therefore, I want you to compare your estimate from the first part to the one in this part. Are they close to each other? Please do not do this part until you have fully completed Part 1; you will not be graded on how close your estimate from Part 1 is to the one in Part 2, but I would like for you to get a sense of whether or not your approach from Part 1 seemed to work.

### What to turn in

• 3-5 pages

• In part 1, report the difference in average earnings between individuals that participated in job training and those that didn’t. Do you think this difference should be interpreted as the causal effect? Explain.

• In part 1, report a table of summary statistics (i.e., averages of available, relevant data separately for individuals that participated in job training and those that didn’t). Are there big differences? Do they matter?

• In part 1, propose the best approach that you can come up with for estimating the causal effects of the job training program. I strongly encourage you to completely think through the model you want to estimate in this step before proceeding to the next step.

• In part 1, implement the approach you proposed and report an estimate of the causal effect.

• In part 1 (and before moving to part 2), how confident are you in your estimate of the causal effect? Discuss any doubts that you might have and whether or not you think you might have over- or under-estimated the causal effect.

• In part 2, explain how you can use the experimental data to come up with a credible estimate of the causal effect of the job training program (and why this works).

• In part 2, compare your estimate from part 1 with the one you get in part 2. Are they close to each other? Provide some discussion relating (i) how confident you were in your estimates from part 1 to (ii) how close they are to the estimates from part 2.
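The raw comparison of average earnings and the summary-statistics table asked for above can be sketched as follows. This is only an illustration on made-up toy data: the function names are hypothetical, and the covariate `age` is an assumed column name, so substitute whatever variables the description files list. It deliberately stops at group means and does not suggest an approach for Part 1.

```python
import pandas as pd


def diff_in_means(df, outcome="re78", treatment="train"):
    """Average outcome for the treated group minus the average for the untreated group."""
    means = df.groupby(treatment)[outcome].mean()
    return means.loc[1] - means.loc[0]


def summary_by_group(df, columns, treatment="train"):
    """Table of group means: one row per variable, one column per treatment group."""
    table = df.groupby(treatment)[columns].mean().T
    table = table.rename(columns={0: "untrained", 1: "trained"})
    table["difference"] = table["trained"] - table["untrained"]
    return table


# Made-up toy data, NOT taken from the project files.
toy = pd.DataFrame(
    {
        "train": [1, 1, 0, 0],
        "re78": [5.0, 7.0, 2.0, 4.0],
        "age": [25, 27, 30, 34],  # hypothetical covariate name
    }
)

raw_gap = diff_in_means(toy)                   # 6.0 - 3.0 = 3.0
balance = summary_by_group(toy, ["re78", "age"])
print(raw_gap)
print(balance)
```

The same `diff_in_means` call works on either dataset; whether the number it returns deserves a causal interpretation is exactly what the bullets above ask you to discuss.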

### Grading Criteria

• Part 1: 10pts

• Part 2 - Estimates of causal effects: 5pts

• Discussion of results, overall clarity of arguments: 5pts