8.10 Coding Questions

  1. For this problem, we will use the data rand_hie. These data come from the RAND Health Insurance Experiment, which ran in the 1970s and early 1980s. In the experiment, participants were randomly assigned to one of four types of plans: Catastrophic (the least amount of coverage), insurance that came with a Deductible, insurance that came with Cost Sharing (i.e., co-insurance, so that an individual pays part of the cost of their medical care), or Free (so that there is no cost of medical care).

    For this problem, we will be interested in whether or not changing the type of health insurance changed the amount of health care utilization and the health status of individuals.

    We will focus on the difference between the least generous coverage (“Catastrophic”) and the most generous coverage (“Free”). In particular, you can start this problem by creating a new dataset as follows:

        rand_hie_subset <- subset(rand_hie, plan_type %in% c("Catastrophic", "Free"))

    and use this data to answer the questions below.

    1. Use a regression to estimate the average difference in total medical expenditure (total_med_expenditure) by plan type (plan_type) and report your results. Should you interpret these as average causal effects? Explain.

    2. Use a regression to estimate the average difference in face-to-face doctor visits (face_to_face_visits) by plan type (plan_type) and report your results. Should you interpret these as average causal effects? Explain.

    3. Use a regression to estimate the average difference in the overall health index (health_index) by plan type (plan_type) and report your results. Should you interpret these as average causal effects? Explain.

    4. How do you interpret the results from parts 1-3 taken together?
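
    As a reminder of the mechanics, here is a minimal sketch on simulated data rather than rand_hie (the names sim, group, and y are hypothetical, chosen only for illustration). With a single binary regressor, the estimated slope from lm is the difference in group means.

    ```r
    set.seed(1)

    # Simulated data with two groups; this is NOT the rand_hie data
    n <- 200
    sim <- data.frame(group = factor(rep(c("A", "B"), each = n / 2)))
    sim$y <- 10 + 3 * (sim$group == "B") + rnorm(n)

    # Regression of the outcome on the group indicator
    fit <- lm(y ~ group, data = sim)
    coef_diff <- unname(coef(fit)["groupB"])

    # Difference in group means computed directly
    mean_diff <- mean(sim$y[sim$group == "B"]) - mean(sim$y[sim$group == "A"])

    # The two numbers agree up to floating-point error
    print(c(regression = coef_diff, means = mean_diff))
    ```

    Calling summary(fit) would additionally report a standard error for the difference.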

  2. For this problem, we will study the causal effect of having more children on women’s labor supply using the data Fertility.

    1. Let’s start by running a regression of the number of weeks that a woman worked during the year (work) on whether or not she has more than two children (morekids), her age and \(age^2\), and race/ethnicity (afam and hispanic). Report your results. How do you feel about interpreting the estimated coefficient on morekids as the causal effect of having more than two children? Explain.

    2. One possible instrument in this setup is the sex composition of the first two children (i.e., whether they are both girls, both boys, or a boy and a girl). The thinking here is that, at least in the United States, parents tend to have a preference for having both a girl and a boy and that, therefore, parents whose first two children have the same sex may be more likely to have a third child than they would have been if they had a girl and a boy. Do you think that a binary variable for whether or not the first two children have the same sex is a reasonable instrument for morekids from part 1? Explain.

    3. Create a new variable called samesex that is equal to one for families whose first two children have the same sex and zero otherwise. Using the same specification as in part 1, use samesex as an instrument for morekids and report the results. Provide some discussion about your results.
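
    A minimal sketch of instrumental variables with a binary instrument, on simulated data rather than Fertility (z, d, u, and y are hypothetical names, and the true effect of -2 is an assumption of the simulation). It computes two-stage least squares by hand with lm and compares it to the ratio cov(y, z) / cov(d, z). The standard errors from a hand-run second stage are not valid, so a dedicated IV routine would be needed for inference.

    ```r
    set.seed(2)

    # Simulated data; this is NOT the Fertility data
    n <- 5000
    z <- rbinom(n, 1, 0.5)                        # binary instrument
    u <- rnorm(n)                                 # unobserved confounder
    d <- as.numeric(0.5 * z + 0.5 * u + rnorm(n) > 0.5)  # endogenous binary treatment
    y <- 1 - 2 * d + u + rnorm(n)                 # outcome; true effect of d is -2

    # OLS ignores the confounder u and is therefore biased here
    ols <- unname(coef(lm(y ~ d))["d"])

    # Two-stage least squares by hand
    first_stage <- lm(d ~ z)
    d_hat <- fitted(first_stage)
    iv_2sls <- unname(coef(lm(y ~ d_hat))["d_hat"])

    # With one instrument and no covariates, 2SLS equals this ratio
    iv_wald <- cov(y, z) / cov(d, z)

    print(c(ols = ols, iv_2sls = iv_2sls, iv_wald = iv_wald))
    ```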

  3. For this question, we will use the AJR data. A deep question in development economics is: Why are some countries much richer than other countries? One explanation is that richer countries have different institutions (e.g., property rights, democracy, etc.) that are conducive to growth. It’s hard to study these questions, though, because institutions do not arise randomly: there could be reverse causality, so that property rights, democracy, etc. are (perhaps partially) caused by being rich rather than the other way around. Alternatively, other factors (say, a country’s geography) could cause both of these. In this problem, we’ll consider one instrumental variables approach to thinking about this question.

    1. Run a regression of the log of per capita GDP (stored in the variable GDP) on a measure of protection against expropriation risk (stored in the variable Exprop). This is a measure of how “good” a country’s institutions are; a larger number indicates “better” institutions. How do you interpret these results? Do you think it would be reasonable to interpret the estimated coefficient on Exprop as the causal effect of institutions on GDP?

    2. One possible instrument for Exprop is settler mortality (we’ll use the log of this, which is available in the variable logMort). Settler mortality is a measure of how dangerous a particular location was for early settlers. The idea is that places that had high settler mortality may have set up worse (sometimes called “extractive”) institutions than places that had lower settler mortality, but that settler mortality (from a long time ago) does not have any direct effect on modern GDP other than through institutions. Provide some discussion about whether settler mortality is a valid instrument for institutions.

    3. Estimate an IV regression of GDP on Exprop using logMort as an instrument for Exprop. How do you interpret the results? How do these results compare to the ones from part 1?

  4. For this question, we’ll use the data house to study the causal effect of incumbency on the probability that a member of the House of Representatives gets re-elected.

    1. One way to try to estimate the causal effect of incumbency is to just run a regression where the outcome is democratic_vote_share (this is the same outcome we’ll use below) and where the model includes a dummy variable for whether or not the Democratic candidate is an incumbent. What are some limitations of this strategy?

    2. The house data contains the margin of victory for Democratic candidates in the current election (which is positive if they won the election and negative if they lost) and the Democratic margin of victory in the past election. Explain how you could use this data in a regression discontinuity design to estimate the causal effect of incumbency.

    3. Use the house data to implement the regression discontinuity design that you proposed in part 2. What do you estimate as the causal effect of incumbency?
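
    A minimal sketch of a regression discontinuity estimate on simulated data, not the house data (the variable names, the bandwidth h, and the true jump of 0.08 are all assumptions chosen for illustration). It fits a separate line on each side of the cutoff, using only observations within the bandwidth, and reads the jump at the cutoff off the coefficient on the indicator.

    ```r
    set.seed(3)

    # Simulated data; this is NOT the house data
    n <- 2000
    sim <- data.frame(margin = runif(n, -0.5, 0.5))   # running variable, cutoff at 0
    sim$won <- as.numeric(sim$margin > 0)              # indicator for being above the cutoff
    sim$share <- 0.45 + 0.08 * sim$won + 0.4 * sim$margin + rnorm(n, sd = 0.05)

    # Keep observations close to the cutoff; the bandwidth is an arbitrary choice
    h <- 0.2
    near <- sim[abs(sim$margin) < h, ]

    # The interaction allows a different slope on each side of the cutoff
    rd_fit <- lm(share ~ won * margin, data = near)
    rd_est <- unname(coef(rd_fit)["won"])

    print(rd_est)
    ```

    In practice, it is worth checking how sensitive the estimate is to the choice of h.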

  5. For this problem, we will use the data banks. We will study the causal effect of monetary policy on bank closures during the Great Depression. We’ll consider an interesting natural experiment in Mississippi, where the northern half of the state was in St. Louis’s federal reserve district (District 8) and the southern half of the state was in Atlanta’s federal reserve district (District 6). Atlanta had much looser monetary policy (meaning they substantially increased lending) than St. Louis during the early part of the Great Depression, and our interest is in whether looser monetary policy made a difference.

    1. Plot the total number of banks separately for District 6 and District 8 across all available time periods in the data.

    2. An important event in the South early in the Great Depression was the collapse of Caldwell and Company, the largest banking chain in the South at the time. This happened in November 1930. The Atlanta Fed’s lending increased markedly soon after this event, while St. Louis’s did not. Calculate a DID estimate of the effect of looser monetary policy on the number of banks that are still in business. How do you interpret these results? Hint: For each time period, you can calculate this by taking the difference between the number of banks in District 6 and the number of banks in District 8 in that period, and then subtracting the same difference in the first period (July 1, 1929).
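
    One way to write the hint in symbols (the notation here is introduced only for illustration and is not part of the banks data): let \(N_{d,t}\) denote the number of banks in District \(d\) in period \(t\), with \(t = 1\) denoting July 1, 1929. Then, for each period \(t\),

    \[
    \widehat{DID}_t = \left( N_{6,t} - N_{8,t} \right) - \left( N_{6,1} - N_{8,1} \right)
    \]

    By construction, \(\widehat{DID}_1 = 0\), so the estimates for later periods measure how the gap between the two districts has changed relative to the first period.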