Homework 6 Solutions

Chapter 8, Coding Question 1

Part (a)

load("../../Detailed Course Notes/data/rand_hie.RData")
rand_hie_subset <- subset(rand_hie, plan_type %in% c("Catastrophic", "Free"))

reg_a <- lm(total_med_expenditure ~ plan_type, data=rand_hie_subset)
summary(reg_a)

## 
## Call:
## lm(formula = total_med_expenditure ~ plan_type, data = rand_hie_subset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
##  -532.9  -392.8  -299.4    38.4 17987.6 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     392.77      40.19   9.773   <2e-16 ***
## plan_typeFree   140.12      49.74   2.817   0.0049 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 993.4 on 1758 degrees of freedom
## Multiple R-squared:  0.004493,   Adjusted R-squared:  0.003927 
## F-statistic: 7.935 on 1 and 1758 DF,  p-value: 0.004903

Relative to having only “catastrophic” insurance coverage, total medical expenditure (which includes both what the person paid out of pocket and what their insurance paid) is estimated to be substantially higher, on average, for individuals assigned to “free” insurance (i.e., those who paid nothing for medical care). In particular, we estimate that “free” insurance results in about $140 more in total medical expenditure, on average, relative to having only catastrophic coverage. In my view, this difference is large in magnitude: average expenditure is $393 for those with catastrophic coverage, which implies that those with free coverage have about 36% higher total medical expenditures. The estimate is also statistically significant at the 1% level (p-value of 0.0049). Since individuals were randomly assigned to a type of plan, it seems reasonable to interpret these results as a causal effect of plan type on total medical spending.
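To see why the coefficient on plan_typeFree can be read as a difference in average spending between the two groups, here is a minimal sketch on simulated data. It does not use the RAND HIE data, and the variable names (plan, spending, sim) and the numbers in the data-generating process are assumptions chosen only for illustration.

```r
# Simulated data only; this is NOT the RAND HIE data set.
set.seed(1)
n <- 1000
plan <- factor(sample(c("Catastrophic", "Free"), n, replace = TRUE))
# Hypothetical spending: baseline mean of 400, plus 140 under the free plan
spending <- 400 + 140 * (plan == "Free") + rnorm(n, sd = 300)
sim <- data.frame(plan, spending)

reg_sim <- lm(spending ~ plan, data = sim)

# Difference in group means, computed directly
group_means <- tapply(sim$spending, sim$plan, mean)
diff_means <- unname(group_means["Free"] - group_means["Catastrophic"])

# The slope on the plan dummy equals the difference in means, and the
# intercept equals the mean of the omitted (catastrophic) group
coef(reg_sim)
diff_means

# Effect relative to the baseline group, as a percentage
pct_effect <- 100 * coef(reg_sim)["planFree"] / coef(reg_sim)["(Intercept)"]
pct_effect
```

With a single binary regressor, the regression is just a convenient way to compute a difference in means along with its standard error; the percentage calculation mirrors the one used in the text.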

Part (b)

reg_b <- lm(face_to_face_visits ~ plan_type, data=rand_hie_subset)
summary(reg_b)

## 
## Call:
## lm(formula = face_to_face_visits ~ plan_type, data = rand_hie_subset)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.928 -3.192 -1.792  0.808 91.672 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     3.1917     0.2562  12.457  < 2e-16 ***
## plan_typeFree   1.7361     0.3171   5.475 5.01e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.333 on 1758 degrees of freedom
## Multiple R-squared:  0.01676,    Adjusted R-squared:  0.01621 
## F-statistic: 29.98 on 1 and 1758 DF,  p-value: 5.007e-08

These results are broadly similar to the ones in part (a). Individuals assigned to the “free” insurance plan had, on average, about 1.7 more face-to-face visits with doctors. This is about 54% more than the average of 3.2 visits for individuals randomly assigned to the “catastrophic” insurance plan, and the difference is strongly statistically significant. As in part (a), it seems reasonable to interpret these as causal effects due to the random assignment.

Part (c)

reg_c <- lm(health_index ~ plan_type, data=rand_hie_subset)
summary(reg_c)

## 
## Call:
## lm(formula = health_index ~ plan_type, data = rand_hie_subset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -60.525  -9.784   1.516  10.616  32.216 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    68.5247     0.6190 110.698   <2e-16 ***
## plan_typeFree  -0.7407     0.7661  -0.967    0.334    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.3 on 1758 degrees of freedom
## Multiple R-squared:  0.0005315,  Adjusted R-squared:  -3.708e-05 
## F-statistic: 0.9348 on 1 and 1758 DF,  p-value: 0.3338

These results are different from the previous ones. Although individuals assigned to the “free” insurance plan appear to be utilizing more medical care, it does not appear to be improving their health (at least according to this measure of an individual’s health). The estimate here is neither statistically significant (p-value of 0.334) nor quantitatively large; for example, we estimate that individuals in the “free” insurance plan have about a 1% lower health index, on average, than those in the “catastrophic” plan.
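One way to make "quantitatively small" more concrete is to look at a confidence interval rather than only the point estimate. The sketch below builds a 95% interval by hand from the estimate, standard error, and degrees of freedom reported in the summary(reg_c) output above; with the data loaded, confint(reg_c) should give the same interval up to rounding.

```r
# Estimate and standard error copied from the summary(reg_c) output above
est <- -0.7407
se  <- 0.7661
df  <- 1758
baseline <- 68.5247  # intercept: mean health index under the catastrophic plan

# 95% confidence interval using the t distribution
crit <- qt(0.975, df)
ci <- c(lower = est - crit * se, upper = est + crit * se)
ci

# The same interval expressed as a percentage of the baseline mean
100 * ci / baseline
```

The interval contains zero, and both endpoints are only a few percent of the baseline health index, so the data are not consistent with a large effect in either direction.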

Part (d)

Parts (a)-(c) seem to suggest that “free” insurance increased medical care usage without much of an effect on health (at least in the way that we were able to measure health).
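To line up the three sets of results, the following sketch collects the intercepts, plan_typeFree coefficients, and p-values reported in parts (a)-(c) and recomputes the percentage changes relative to the catastrophic plan. The data frame and its column names are my own choices for illustration.

```r
# Intercepts, plan_typeFree coefficients, and p-values copied from the
# three regression summaries in parts (a)-(c)
results <- data.frame(
  outcome     = c("total_med_expenditure", "face_to_face_visits", "health_index"),
  baseline    = c(392.77, 3.1917, 68.5247),
  free_effect = c(140.12, 1.7361, -0.7407),
  p_value     = c(0.0049, 5.01e-08, 0.334)
)

# Effect of the free plan as a percentage of the catastrophic-plan mean
results$pct_change <- round(100 * results$free_effect / results$baseline, 1)

# Whether each estimate is statistically significant at the 5% level
results$significant_5pct <- results$p_value < 0.05
results
```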

Chapter 8, Extra Question 1

Treatment effect heterogeneity means that the effect of the treatment can be different across different units. Treatment effect homogeneity means that the effect of the treatment is the same across all units. Most applications (at least the ones we have considered) likely exhibit treatment effect heterogeneity.
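As a small illustration of the distinction, the sketch below constructs made-up potential outcomes for five units under each scenario. All numbers and variable names are hypothetical; in real data we never observe both potential outcomes for the same unit.

```r
# Simulated potential outcomes; all names and numbers are hypothetical
set.seed(2)
n <- 5
y0 <- round(rnorm(n, mean = 50, sd = 5))

# Homogeneity: every unit has the same treatment effect
y1_homog <- y0 + 10

# Heterogeneity: the treatment effect varies across units
unit_effect <- c(0, 5, 10, 15, 20)
y1_heterog <- y0 + unit_effect

data.frame(unit = 1:n, y0,
           effect_homog = y1_homog - y0,
           effect_heterog = y1_heterog - y0)

# Both scenarios share the same average treatment effect (10),
# even though the unit-level effects differ in the second one
mean(y1_homog - y0)
mean(y1_heterog - y0)
```

The two scenarios are built to have the same average effect, which shows that an average alone cannot tell us whether effects are homogeneous.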

Chapter 8, Extra Question 2

Most researchers give up on estimating individual-level treatment effects because it is too difficult. To give a particular example, suppose that we are interested in the causal effect of going to college on a person’s income. I went to college, so I know my treated potential outcome. However, even I do not know exactly what my untreated potential outcome is, which means that I do not know my own treatment effect. Now compare that to a researcher who might observe just a few of my characteristics. They have much less information with which to learn about my individual-level treatment effect than I do; if I do not know my own treatment effect, it suggests that this is an essentially impossible task for a researcher.
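The same point can be seen in a simulation where, unlike in real data, we get to generate both potential outcomes and then hide one of them. This is only a sketch: the variable names and the data-generating process are assumptions, and the treatment is randomly assigned here to keep things simple.

```r
# Simulated example; variable names and numbers are hypothetical
set.seed(3)
n <- 2000
y0 <- rnorm(n, mean = 40, sd = 10)     # untreated potential outcome
y1 <- y0 + rnorm(n, mean = 8, sd = 6)  # treated potential outcome
d  <- rbinom(n, size = 1, prob = 0.5)  # randomly assigned treatment

# What a researcher actually sees: one potential outcome per unit
y_obs <- ifelse(d == 1, y1, y0)
observed <- data.frame(
  d,
  y_obs,
  y1_seen = ifelse(d == 1, y1, NA),
  y0_seen = ifelse(d == 0, y0, NA)
)
head(observed)

# Unit-level effects require both columns, so every one of them is missing
all(is.na(observed$y1_seen - observed$y0_seen))

# An average effect is still recoverable under random assignment
true_ate <- mean(y1 - y0)
est_ate  <- mean(y_obs[d == 1]) - mean(y_obs[d == 0])
c(true_ate = true_ate, est_ate = est_ate)
```

No unit has both potential outcomes observed, so no individual effect can be computed directly, while the difference in group means still lands close to the average effect in this randomized setting.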

Chapter 8, Extra Question 3

Unconfoundedness is the condition that

\[ \Big(Y(1),Y(0)\Big) \perp\!\!\!\perp D \, \Big| \, X \]

This says that potential outcomes are independent of the treatment after conditioning on covariates. In practice, it means that, if we find individuals with the same $X$ covariates, some of whom participate in the treatment while others do not, we would be willing to interpret differences in their average outcomes as causal effects of the treatment.
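To illustrate what this buys us, here is a sketch with a single binary covariate in which unconfoundedness holds by construction: treatment depends on x but not otherwise on the potential outcomes. The data are simulated, and the names and numbers are assumptions made for the example.

```r
# Simulated example; names and numbers are hypothetical
set.seed(4)
n <- 20000
x <- rbinom(n, 1, 0.5)                       # a single binary covariate
d <- rbinom(n, 1, ifelse(x == 1, 0.8, 0.2))  # treatment depends only on x
y0 <- 10 + 5 * x + rnorm(n)                  # potential outcomes depend on x
y1 <- y0 + 2                                 # true effect is 2 for everyone
y <- ifelse(d == 1, y1, y0)

# Naive comparison mixes the treatment effect with the effect of x
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Comparisons among individuals with the same value of x
diff_x0 <- mean(y[d == 1 & x == 0]) - mean(y[d == 0 & x == 0])
diff_x1 <- mean(y[d == 1 & x == 1]) - mean(y[d == 0 & x == 1])

# Average the within-group differences using the distribution of x
adjusted <- mean(x == 0) * diff_x0 + mean(x == 1) * diff_x1

c(naive = naive, diff_x0 = diff_x0, diff_x1 = diff_x1, adjusted = adjusted)
```

The naive difference is far from the true effect because treated individuals are more likely to have x equal to 1, whereas the comparisons within each value of x are each close to the effect built into the simulation.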

Chapter 8, Extra Question 6

Part (a)

It is probably not reasonable to interpret $\hat{\alpha}$ as an estimate of the causal effect of being in a union. There are probably a number of other things that we would need to be able to control for in order to interpret this as a causal effect; some examples are: age (I think union members tend to be older in the U.S. and that tends to be correlated with higher earnings), occupation/industry (union members tend to be concentrated in the manufacturing sector where there traditionally have been fairly large wage premiums), and perhaps other things like motivation and ability (it’s not totally clear if this is true, but it is at least worth entertaining the idea that either of these could be correlated with union membership and they are both very likely correlated with earnings).
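The concern about age can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the data-generating process (including a true union premium of 2 and union membership rising with age) is invented for illustration and is not calibrated to any real data.

```r
# Simulated example; the data-generating process is made up for illustration
set.seed(5)
n <- 10000
age <- runif(n, 20, 65)
# Older workers are more likely to be union members in this simulation
union <- rbinom(n, 1, plogis(-3 + 0.06 * age))
# Earnings rise with age; the true union premium is set to 2
earnings <- 20 + 0.5 * age + 2 * union + rnorm(n, sd = 5)

short <- lm(earnings ~ union)        # omits age
long  <- lm(earnings ~ union + age)  # controls for age

c(short = unname(coef(short)["union"]),
  long  = unname(coef(long)["union"]))
```

In this simulation, the regression that omits age attributes part of the age-earnings relationship to union membership, so its coefficient on union is well above the premium built into the data; controlling for age brings the estimate back close to it. Unobserved factors like motivation or ability could not be handled this way.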