Homework 4 Solutions

Ch.11, Coding Question 1

# a)
data(Caschool, package="Ecdat")
reg <- lm(testscr ~ str + avginc + elpct, data=Caschool)
summary(reg)

Call:
lm(formula = testscr ~ str + avginc + elpct, data = Caschool)

Residuals:
    Min      1Q  Median      3Q     Max 
-42.800  -6.862   0.275   6.586  31.199 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 640.31550    5.77489 110.879   <2e-16 ***
str          -0.06878    0.27691  -0.248    0.804    
avginc        1.49452    0.07483  19.971   <2e-16 ***
elpct        -0.48827    0.02928 -16.674   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.35 on 416 degrees of freedom
Multiple R-squared:  0.7072,    Adjusted R-squared:  0.7051 
F-statistic: 334.9 on 3 and 416 DF,  p-value: < 2.2e-16

avginc and elpct are statistically different from 0, while str is not statistically different from 0. We can tell by comparing the absolute value of each t-statistic in the column labeled “t value” to 1.96, the 5% critical value. Coefficients whose t-statistics are larger than 1.96 in magnitude are statistically different from 0 at the 5% level.
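
A minimal sketch of the comparison above, using only the rounded estimates and standard errors printed by summary(reg). Because the inputs are rounded, the t-statistics agree with the “t value” column only up to rounding. With the fitted model in memory, coef(summary(reg))[, "t value"] returns the unrounded values directly.

```r
# Estimates and standard errors copied (rounded) from the summary(reg) output
est <- c(str = -0.06878, avginc = 1.49452, elpct = -0.48827)
se  <- c(str =  0.27691, avginc = 0.07483, elpct =  0.02928)

# t-statistic for H0: coefficient = 0
t_stat <- est / se

# Reject H0 at the 5% level when |t| > 1.96 (large-sample normal critical value)
significant <- abs(t_stat) > 1.96

round(t_stat, 3)
significant   # str FALSE, avginc TRUE, elpct TRUE
```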

# b)
mean(Caschool$testscr)
[1] 654.1565

The average test score in the data is a little over 654.

# c)
predict(reg, newdata=data.frame(str=20, avginc=30, elpct=10))
       1 
678.8928 

The predicted value here (about 678.9) is roughly 25 points higher than the overall sample average of about 654.2 from part (b).
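
The prediction is just the fitted equation evaluated at the chosen values. A minimal sketch that reproduces it by hand from the coefficients printed in part (a); predict_by_hand is a hypothetical helper written for illustration, not part of R.

```r
# Coefficients copied from the summary(reg) output in part (a)
b <- c(intercept = 640.31550, str = -0.06878, avginc = 1.49452, elpct = -0.48827)

# Hypothetical helper: predicted testscr = b0 + b1*str + b2*avginc + b3*elpct
predict_by_hand <- function(str, avginc, elpct) {
  unname(b["intercept"] + b["str"] * str + b["avginc"] * avginc + b["elpct"] * elpct)
}

pred_c <- predict_by_hand(str = 20, avginc = 30, elpct = 10)
round(pred_c, 4)              # 678.8928, matching predict() above
round(pred_c - 654.1565, 2)   # gap relative to the sample mean from part (b)
```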

# d)
predict(reg, newdata=data.frame(str=15, avginc=30, elpct=10))
       1 
679.2367 

The predicted value here is almost the same as (slightly larger than) the one in part (c). The reason is that the estimated coefficient on str from the original regression is very small: lowering the student-teacher ratio by 5, while holding avginc and elpct fixed, raises the predicted test score by only about 5 × 0.069 ≈ 0.34 points.
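
Since only str differs between parts (c) and (d), the change in the prediction equals the str coefficient times the change in str. A short check using the printed coefficient and the two printed predictions:

```r
b_str <- -0.06878   # coefficient on str from part (a)

# avginc and elpct are the same in (c) and (d), so their terms cancel
delta_str  <- 15 - 20
delta_pred <- b_str * delta_str
delta_pred                  # 0.3439

# Same difference computed from the two predict() outputs above
679.2367 - 678.8928         # 0.3439
```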

Ch.11, Coding Question 2

load("intergenerational_mobility.RData")

# a)
reg_a <- lm(child_fincome ~ parent_fincome, data=intergenerational_mobility)
summary(reg_a)

Call:
lm(formula = child_fincome ~ parent_fincome, data = intergenerational_mobility)

Residuals:
    Min      1Q  Median      3Q     Max 
-236861  -26267   -7871   15336  879913 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.491e+04  1.782e+03   19.58   <2e-16 ***
parent_fincome 5.471e-01  2.430e-02   22.51   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 50120 on 3628 degrees of freedom
Multiple R-squared:  0.1226,    Adjusted R-squared:  0.1223 
F-statistic: 506.8 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient is 0.547. It indicates that, on average, a 1 dollar increase in parents’ income is associated with a 0.547 dollar increase in children’s income.
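
In a level-level regression the slope scales linearly with the size of the change. A minimal sketch, using the rounded slope from the output above, of the implied average change in children’s income for a few illustrative changes in parents’ income:

```r
# Slope from reg_a (5.471e-01 in the output above)
b1_a <- 0.5471

# Implied average change in children's income (dollars):
# slope times the dollar change in parents' income
parent_increase <- c(1, 1000, 10000)
child_change <- b1_a * parent_increase
child_change   # 0.5471, 547.1 and 5471 dollars
```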

# b)
reg_b <- lm(log(child_fincome) ~ parent_fincome, data=intergenerational_mobility)
summary(reg_b)

Call:
lm(formula = log(child_fincome) ~ parent_fincome, data = intergenerational_mobility)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5078 -0.3467  0.0630  0.3851  2.8567 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.047e+01  2.170e-02  482.60   <2e-16 ***
parent_fincome 7.436e-06  2.959e-07   25.13   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6102 on 3628 degrees of freedom
Multiple R-squared:  0.1483,    Adjusted R-squared:  0.1481 
F-statistic: 631.7 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient is 0.00000744 (printed as 7.436e-06). Since this is a log-level regression, you should multiply it by 100 to get the percentage change in children’s income for a 1 dollar increase in parents’ income. Thus, our estimate is that, on average, a 1 dollar increase in parents’ income is associated with a 0.000744% increase in children’s income. Alternatively, you could multiply again by 1000 to say that, on average, a 1000 dollar increase in parents’ income is associated with a 0.744% increase in children’s income.
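
A minimal sketch of this arithmetic using the printed slope. The “multiply by 100” rule is an approximation; the exact percentage change implied by a log-level model, 100 × (exp(b1 × change) − 1), is added here only for comparison and is nearly identical for changes this small.

```r
b1_b <- 7.436e-06   # slope from reg_b: log(child_fincome) on parent_fincome

# Approximate % change in children's income for a given dollar change
approx_pct <- function(dollars) 100 * b1_b * dollars

# Exact % change implied by the log-level model
exact_pct <- function(dollars) 100 * (exp(b1_b * dollars) - 1)

approx_pct(1)      # 0.0007436 percent per dollar
approx_pct(1000)   # 0.7436 percent per 1,000 dollars
exact_pct(1000)    # about 0.746 percent
```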

# c)
reg_c <- lm(child_fincome ~ log(parent_fincome), data=intergenerational_mobility)
summary(reg_c)

Call:
lm(formula = child_fincome ~ log(parent_fincome), data = intergenerational_mobility)

Residuals:
    Min      1Q  Median      3Q     Max 
-118880  -26819   -7241   15345  897028 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -367523      18574  -19.79   <2e-16 ***
log(parent_fincome)    39942       1692   23.60   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 49820 on 3628 degrees of freedom
Multiple R-squared:  0.1331,    Adjusted R-squared:  0.1329 
F-statistic:   557 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient is 39,942. Since this is a level-log regression, you should divide the coefficient by 100. Thus, we estimate that, on average, a 1% increase in parents’ income is associated with a 399.42 dollar increase in children’s income.
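
Dividing by 100 works because a 1% increase raises log(parent_fincome) by about 0.01. A minimal sketch comparing that rule of thumb with the exact change implied by the fitted level-log model, b1 × log(1.01), which is included only as a check:

```r
b1_c <- 39942   # slope from reg_c: child_fincome on log(parent_fincome)

# Rule of thumb: a 1% increase in parents' income -> b1/100 dollars
rule_of_thumb <- b1_c / 100        # 399.42

# Exact change in the prediction when parents' income is multiplied by 1.01
exact_change <- b1_c * log(1.01)   # about 397.4

c(rule_of_thumb = rule_of_thumb, exact = exact_change)
```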

# d)
reg_d <- lm(log(child_fincome) ~ log(parent_fincome), data=intergenerational_mobility)
summary(reg_d)

Call:
lm(formula = log(child_fincome) ~ log(parent_fincome), data = intergenerational_mobility)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.09788 -0.33439  0.05167  0.37722  2.80563 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          4.28207    0.22014   19.45   <2e-16 ***
log(parent_fincome)  0.60861    0.02006   30.34   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5905 on 3628 degrees of freedom
Multiple R-squared:  0.2024,    Adjusted R-squared:  0.2022 
F-statistic: 920.6 on 1 and 3628 DF,  p-value: < 2.2e-16

The estimated coefficient on parents’ income is 0.609. This indicates that, on average, a 1% increase in parents’ income is associated with a 0.609% increase in children’s income.
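
In a log-log regression the slope is an elasticity, so no rescaling is needed. A minimal sketch comparing the elasticity reading with the exact percentage change implied by the fitted model, where a 1% increase multiplies predicted child income by 1.01 raised to the slope (the exact version is shown only for comparison):

```r
b1_d <- 0.60861   # slope from reg_d: log(child_fincome) on log(parent_fincome)

# Elasticity reading: a 1% increase in parents' income -> about b1_d % change
approx_pct_1 <- b1_d * 1

# Exact % change implied by the log-log model for a 1% increase
exact_pct_1 <- 100 * (1.01^b1_d - 1)

c(approx = approx_pct_1, exact = exact_pct_1)   # roughly 0.609 vs 0.607
```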

Ch.11, Extra Question 1

\(\beta_1\) is how much earnings increase, on average, when years of education increase by 1 year.

Ch.11, Extra Question 2

\(\beta_1\) is how much earnings increase, on average, when years of education increase by 1 year, holding experience and gender constant.

Ch.11, Extra Question 3

  1. You can run the following regression

    \[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + U\]

    and test whether \(\beta_3=0\) (if it is different from 0, that would indicate that the return to education is different for women relative to men).

  2. You can estimate the following regression

    \[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + \beta_4 Experience + U\]

    and continue to test whether \(\beta_3=0\).
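
A minimal sketch of both specifications in R. It uses hypothetical simulated data, not the homework data, with an assumed true interaction coefficient of 1.5; the variable names mirror the equations above. In an lm formula, Education * Female expands to Education + Female + Education:Female, and the t value and p-value in the Education:Female row test \(\beta_3=0\).

```r
set.seed(123)

# Hypothetical simulated data (NOT the homework data)
n <- 5000
df <- data.frame(
  Education  = sample(10:20, n, replace = TRUE),
  Female     = rbinom(n, 1, 0.5),
  Experience = runif(n, 0, 30)
)
# Assumed true model: the return to education is 1.5 higher for women
df$Earnings <- 10 + 2 * df$Education - 5 * df$Female +
  1.5 * df$Education * df$Female + 0.3 * df$Experience + rnorm(n, sd = 5)

# Specification 1: Education, Female, and their interaction
fit1 <- lm(Earnings ~ Education * Female, data = df)

# Specification 2: same interaction, also controlling for Experience
fit2 <- lm(Earnings ~ Education * Female + Experience, data = df)

# Row for beta_3: estimate, std. error, t value, p-value for H0: beta_3 = 0
coef(summary(fit1))["Education:Female", ]
coef(summary(fit2))["Education:Female", ]
```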

Ch.11, Extra Question 4

\(100 \beta_1\) is (approximately) the average percentage change in earnings when education increases by 1 year, holding experience and sex constant.