Homework 4 Solutions

Ch.11, Coding Question 1

# a)
data(Caschool, package="Ecdat")
reg <- lm(testscr ~ str + avginc + elpct, data=Caschool)
summary(reg)


Call:
lm(formula = testscr ~ str + avginc + elpct, data = Caschool)

Residuals:
    Min      1Q  Median      3Q     Max 
-42.800  -6.862   0.275   6.586  31.199 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 640.31550    5.77489 110.879   <2e-16 ***
str          -0.06878    0.27691  -0.248    0.804    
avginc        1.49452    0.07483  19.971   <2e-16 ***
elpct        -0.48827    0.02928 -16.674   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.35 on 416 degrees of freedom
Multiple R-squared:  0.7072,    Adjusted R-squared:  0.7051 
F-statistic: 334.9 on 3 and 416 DF,  p-value: < 2.2e-16

avginc and elpct are statistically different from 0, while str is not. We can tell by comparing the absolute value of each t-statistic (the column labeled “t value”) to 1.96, the critical value for a two-sided test at the 5% significance level. Coefficients whose t-statistics exceed 1.96 in absolute value are statistically different from 0.
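As a check, each t-statistic can be reproduced by hand as the estimate divided by its standard error. A minimal sketch, with the numbers copied from the output above (the variable names `est`, `se`, and `t_stat` are just for illustration):

```r
# Estimates and standard errors copied from summary(reg) in part (a)
est <- c(str = -0.06878, avginc = 1.49452, elpct = -0.48827)
se  <- c(str =  0.27691, avginc = 0.07483, elpct = 0.02928)

# t-statistic for H0: coefficient = 0
t_stat <- est / se
round(t_stat, 2)

# Reject H0 at the 5% level when |t| > 1.96
abs(t_stat) > 1.96
```

Small differences from the printed "t value" column come from the estimates being rounded in the output.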

# b)
mean(Caschool$testscr)

[1] 654.1565

The average test score in the data is a little over 654.

# c)
predict(reg, newdata=data.frame(str=20, avginc=30, elpct=10))

       1 
678.8928

The predicted value here (about 678.9) is roughly 25 points higher than the overall sample average of 654.2 from part (b).
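The prediction is just the estimated regression line evaluated at the given values. A sketch that reproduces it from the rounded coefficients printed in part (a) (the vector `b` is an illustrative name, not something created by the original code):

```r
# Coefficients copied from summary(reg) in part (a)
b <- c(intercept = 640.31550, str = -0.06878, avginc = 1.49452, elpct = -0.48827)

# Predicted test score at str = 20, avginc = 30, elpct = 10
pred_c <- b[["intercept"]] + b[["str"]] * 20 + b[["avginc"]] * 30 + b[["elpct"]] * 10
pred_c

# Gap relative to the sample mean from part (b)
pred_c - 654.1565
```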

# d)
predict(reg, newdata=data.frame(str=15, avginc=30, elpct=10))

       1 
679.2367

The predicted value here is almost the same as (slightly larger than) the one in part (c). The reason is that the estimated coefficient on str from the original regression is very small (-0.069), so reducing the student-teacher ratio by 5 raises the predicted test score by only about 0.34 points.
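Because the model is linear, the difference between the predictions in parts (c) and (d) equals the coefficient on str times the change in str; the avginc and elpct terms cancel because they are held at the same values. A quick check under that reasoning:

```r
b_str <- -0.06878              # coefficient on str from part (a)
delta_str <- 15 - 20           # change in the student-teacher ratio
b_str * delta_str              # change in predicted test score (0.3439)
678.8928 + b_str * delta_str   # reproduces the part (d) prediction
```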

Ch.11, Coding Question 2

load("intergenerational_mobility.RData")

# a)
reg_a <- lm(child_fincome ~ parent_fincome, data=intergenerational_mobility)
summary(reg_a)


Call:
lm(formula = child_fincome ~ parent_fincome, data = intergenerational_mobility)

Residuals:
    Min      1Q  Median      3Q     Max 
-236861  -26267   -7871   15336  879913 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.491e+04  1.782e+03   19.58   <2e-16 ***
parent_fincome 5.471e-01  2.430e-02   22.51   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 50120 on 3628 degrees of freedom
Multiple R-squared:  0.1226,    Adjusted R-squared:  0.1223 
F-statistic: 506.8 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient is 0.547. It indicates that, on average, children whose parents had one dollar more family income earned 0.547 dollars more.
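In a level-level regression the coefficient scales linearly, so the estimate can be restated for any change in parents' income. A sketch for a 10,000 dollar difference (the 10,000 figure is only an illustration, not part of the question):

```r
b1 <- 5.471e-01            # coefficient on parent_fincome from part (a)
delta_parent <- 10000      # illustrative difference in parents' family income (dollars)
b1 * delta_parent          # predicted difference in child's family income (dollars)
```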

# b)
reg_b <- lm(log(child_fincome) ~ parent_fincome, data=intergenerational_mobility)
summary(reg_b)


Call:
lm(formula = log(child_fincome) ~ parent_fincome, data = intergenerational_mobility)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5078 -0.3467  0.0630  0.3851  2.8567 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.047e+01  2.170e-02  482.60   <2e-16 ***
parent_fincome 7.436e-06  2.959e-07   25.13   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6102 on 3628 degrees of freedom
Multiple R-squared:  0.1483,    Adjusted R-squared:  0.1481 
F-statistic: 631.7 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient is 0.000007436 (printed as 7.436e-06). In a log-level regression, you multiply the coefficient by 100 to get an (approximate) percentage change interpretation. Thus, our estimate is that, on average, children from families with 1 dollar more income had 0.000744% higher income. Alternatively, you could multiply again by 1,000 to say that, on average, children from families with 1,000 dollars more income earned about 0.744% more.
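The 100 times the coefficient rule is an approximation that works well when the coefficient times the change in the regressor is small. A sketch comparing it with the exact percentage change implied by the fitted log model, 100 * (exp(b1 * delta) - 1), for a 1,000 dollar difference:

```r
b1 <- 7.436e-06            # coefficient on parent_fincome from part (b)
delta_parent <- 1000       # difference in parents' family income (dollars)

approx_pct <- 100 * b1 * delta_parent               # approximate % change
exact_pct  <- 100 * (exp(b1 * delta_parent) - 1)    # exact % change in exp(fitted log income)
c(approx = approx_pct, exact = exact_pct)
```

Both come out at roughly 0.74%, which also confirms the order of magnitude of the interpretation.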

# c)
reg_c <- lm(child_fincome ~ log(parent_fincome), data=intergenerational_mobility)
summary(reg_c)


Call:
lm(formula = child_fincome ~ log(parent_fincome), data = intergenerational_mobility)

Residuals:
    Min      1Q  Median      3Q     Max 
-118880  -26819   -7241   15345  897028 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -367523      18574  -19.79   <2e-16 ***
log(parent_fincome)    39942       1692   23.60   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 49820 on 3628 degrees of freedom
Multiple R-squared:  0.1331,    Adjusted R-squared:  0.1329 
F-statistic:   557 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient is 39,942. Since this is a level-log regression, you should divide the coefficient by 100 to get the effect of a 1% change. Thus, we estimate that, on average, children from families with 1% higher income earned $399.42 more.
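Dividing by 100 is again an approximation: in a level-log model, the exact change in the prediction for a 1% increase in parents' income is the coefficient times log(1.01). A sketch comparing the two:

```r
b1 <- 39942                        # coefficient on log(parent_fincome) from part (c)
approx_effect <- b1 / 100          # approximate dollar change for a 1% increase
exact_effect  <- b1 * log(1.01)    # exact dollar change implied by the model
c(approx = approx_effect, exact = exact_effect)
```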

# d)
reg_d <- lm(log(child_fincome) ~ log(parent_fincome), data=intergenerational_mobility)
summary(reg_d)


Call:
lm(formula = log(child_fincome) ~ log(parent_fincome), data = intergenerational_mobility)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.09788 -0.33439  0.05167  0.37722  2.80563 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          4.28207    0.22014   19.45   <2e-16 ***
log(parent_fincome)  0.60861    0.02006   30.34   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5905 on 3628 degrees of freedom
Multiple R-squared:  0.2024,    Adjusted R-squared:  0.2022 
F-statistic: 920.6 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient on parents’ income is 0.609. This indicates that, on average, children from families with 1% higher income earned 0.609% more.
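The elasticity interpretation is also approximate; the exact percentage change implied by the log-log model for a p% increase is 100 * ((1 + p/100)^b1 - 1). A sketch for a 1% increase, plus an illustrative 10% increase where the approximation starts to drift:

```r
b1 <- 0.60861                      # coefficient on log(parent_fincome) from part (d)
approx_pct <- b1                   # approximate % change for a 1% increase
exact_pct  <- 100 * (1.01^b1 - 1)  # exact % change implied by the model
c(approx = approx_pct, exact = exact_pct)

# Illustrative larger change: a 10% increase in parents' income
100 * (1.10^b1 - 1)
```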


Ch.11, Extra Question 1

On average, people with one more year of education earn $\beta_1$ more.

Ch.11, Extra Question 2

On average, people with one more year of education earn $\beta_1$ more, holding experience and sex constant.

Ch.11, Extra Question 3

You can run the following regression

\[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + U\]

and test whether $\beta_3=0$. If $\beta_3$ is different from 0, that would indicate that the “return” to education is different for women than for men.

To also control for experience, you can estimate the following regression

\[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + \beta_4 Experience + U\]

and continue to test whether $\beta_3=0$.
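A sketch of how these regressions could be run in R. The data are simulated purely for illustration (the sample size, variable distributions, and true coefficients are all made up); with real data you would use your own data frame instead. In an R formula, `Education * Female` expands to `Education + Female + Education:Female`, so the `Education:Female` row of the output corresponds to $\beta_3$.

```r
set.seed(123)

# Hypothetical simulated data (not from the course materials)
n <- 1000
Education  <- sample(10:20, n, replace = TRUE)
Female     <- rbinom(n, 1, 0.5)
Experience <- sample(0:30, n, replace = TRUE)
# Assumed true model: return to education is 2 for men and 2 + 0.5 = 2.5 for women
Earnings <- 10 + 2 * Education - 3 * Female + 0.5 * Education * Female +
  0.3 * Experience + rnorm(n, sd = 5)
df <- data.frame(Earnings, Education, Female, Experience)

# First regression: main effects plus the interaction
reg1 <- lm(Earnings ~ Education * Female, data = df)
# Second regression: also control for experience
reg2 <- lm(Earnings ~ Education * Female + Experience, data = df)

# Estimate, standard error, t value, and p-value for H0: beta_3 = 0
summary(reg1)$coefficients["Education:Female", ]
summary(reg2)$coefficients["Education:Female", ]
```

If the absolute t value in that row exceeds 1.96, you would reject $\beta_3=0$ at the 5% level.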

Ch.11, Extra Question 4

On average, people with one more year of education earn approximately $100 \beta_1$ percent more, holding experience and sex constant.