Ch. 5, Coding Question 3

load("../../Detailed Course Notes/data/fertilizer_2000.RData")
library(modelsummary)

# regression for part a
reg_a <- lm(log(avyield) ~ log(avfert), data=fertilizer_2000)

# regression for part b
reg_b <- lm(log(avyield) ~ log(avfert) + prec, data=fertilizer_2000)

# regression for part c
reg_c <- lm(log(avyield) ~ log(avfert)*as.factor(region_wb) + prec, data=fertilizer_2000)

# print results for all parts
modelsummary(list(reg_a, reg_b, reg_c))
                                               (1)       (2)       (3)
(Intercept)                                  1.474     1.228     1.434
                                           (0.121)   (0.166)   (0.433)
log(avfert)                                  0.256     0.237     0.287
                                           (0.031)   (0.031)   (0.221)
prec                                                   0.000     0.000
                                                     (0.000)   (0.000)
as.factor(region_wb)EECA                                        −1.369
                                                               (2.165)
as.factor(region_wb)LAC                                         −0.461
                                                               (0.487)
as.factor(region_wb)MENA                                         0.707
                                                               (0.671)
as.factor(region_wb)SA                                          −0.230
                                                               (0.787)
as.factor(region_wb)SSA                                         −0.436
                                                               (0.504)
log(avfert) × as.factor(region_wb)EECA                          −0.542
                                                               (0.744)
log(avfert) × as.factor(region_wb)LAC                           −0.159
                                                               (0.244)
log(avfert) × as.factor(region_wb)MENA                           0.317
                                                               (0.283)
log(avfert) × as.factor(region_wb)SA                            −0.089
                                                               (0.375)
log(avfert) × as.factor(region_wb)SSA                           −0.087
                                                               (0.227)
Num.Obs.                                        68        68        68
R2                                           0.511     0.543     0.605
R2 Adj.                                      0.504     0.529     0.518
AIC                                          176.6     174.0     184.1
BIC                                          183.2     182.9     215.2
Log.Lik.                                   −45.205   −42.926   −37.987
F                                           69.027    38.600     7.011
RMSE                                          0.47      0.45      0.42
  1. We estimate that a 1% increase in fertilizer usage increases crop yield by 0.256% on average (this estimate is statistically different from 0 at any conventional significance level).

  2. When we additionally control for precipitation, we estimate that a 1% increase in fertilizer usage increases crop yield by 0.237% on average, holding precipitation constant (this estimate is also statistically different from 0 at any conventional significance level).

  3. In the last specification, we include control variables for a country’s region as well as interaction terms between region and log(fertilizer). Since none of the interaction terms is individually statistically different from 0, we do not have strong evidence that the effect of fertilizer differs across regions. Importantly, this does not necessarily mean that these effects are all the same; some of the differences appear large in magnitude, but we only have 68 observations in total, so we may simply not have enough data to reliably detect differences across regions.
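In the part (c) specification, the fertilizer elasticity for a given region is the log(avfert) coefficient plus that region's interaction coefficient (for the omitted reference region it is just the log(avfert) coefficient). The sketch below does this arithmetic on the point estimates copied from column (3) of the table above. It only adds reported numbers, so it says nothing about the standard errors of these sums.

```r
# Point estimates copied from column (3) of the part (c) regression table
b_logfert <- 0.287  # log(avfert) slope for the omitted reference region
interactions <- c(EECA = -0.542, LAC = -0.159, MENA = 0.317,
                  SA = -0.089, SSA = -0.087)

# Region-specific slope = base slope + that region's interaction coefficient
region_slopes <- c(reference = b_logfert, b_logfert + interactions)
print(round(region_slopes, 3))
```

For example, the implied elasticity is about −0.255 for EECA and 0.604 for MENA, which is what "large in magnitude" refers to. With the data loaded, `coef(reg_c)` gives the same quantities at full precision, and a joint F-test of all five interaction terms, e.g. `anova(lm(log(avyield) ~ log(avfert) + as.factor(region_wb) + prec, data=fertilizer_2000), reg_c)`, is one way to complement the individual t-tests.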

Ch. 5, Coding Question 4

# load the data
load("../../Detailed Course Notes/data/mutual_funds.RData")
  1. median fund_net_annual_expense_ratio

    median(mutual_funds$fund_net_annual_expense_ratio)
    ## [1] 0.98
  2. med_er <- median(mutual_funds$fund_net_annual_expense_ratio)
    mutual_funds$high_expense_ratio <- 1*(mutual_funds$fund_net_annual_expense_ratio > med_er)
    library(modelsummary)
    library(dplyr)
    datasummary_balance(~high_expense_ratio,
                        data=select(mutual_funds,
                                    high_expense_ratio,
                                    fund_return_3years,
                                    fund_net_annual_expense_ratio,
                                    risk_rating,
                                    asset_cash,
                                    asset_stocks,
                                    asset_bonds))
                                    high_expense_ratio = 0  high_expense_ratio = 1
                                        Mean     Std. Dev.      Mean     Std. Dev.  Diff. in Means   Std. Error
    fund_return_3years                   5.1           5.6       4.0           7.3            -1.1          0.1
    fund_net_annual_expense_ratio        0.7           0.2       1.5           1.1             0.8          0.0
    risk_rating                          3.1           1.1       3.1           1.1             0.0          0.0
    asset_cash                           5.2           8.8       6.3          11.3             1.1          0.2
    asset_stocks                        51.6          44.8      69.9          39.2            18.3          0.6
    asset_bonds                         40.8          42.3      21.0          34.1           -19.8          0.6

    Yes, there are interesting patterns here. Mutual funds with high expense ratios appear to have lower returns, about the same amount of risk, and invest more in stocks and less in bonds (they also hold slightly more cash).

  3. reg_c <- lm(fund_return_3years ~ fund_net_annual_expense_ratio, data=mutual_funds)
    summary(reg_c)
    ## 
    ## Call:
    ## lm(formula = fund_return_3years ~ fund_net_annual_expense_ratio, 
    ##     data = mutual_funds)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -62.582  -2.903  -0.881   1.751  52.559 
    ## 
    ## Coefficients:
    ##                               Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)                    4.99470    0.07713  64.759  < 2e-16 ***
    ## fund_net_annual_expense_ratio -0.40389    0.05485  -7.363 1.88e-13 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 6.505 on 17005 degrees of freedom
    ## Multiple R-squared:  0.003178,   Adjusted R-squared:  0.003119 
    ## F-statistic: 54.21 on 1 and 17005 DF,  p-value: 1.882e-13

    This indicates that for every one unit increase in the expense ratio (which, importantly, is a big increase, since the median expense ratio is 0.98), the 3 year return of a mutual fund decreases by 0.4 on average, and the effect is statistically significant. The average 3 year return in our data is 4.56, so a decrease in the return of 0.4 seems like a medium-sized decrease to me.

  4. reg_d <- lm(fund_return_3years ~ fund_net_annual_expense_ratio + as.factor(investment_type) + 
                  risk_rating + as.factor(size_type), data=mutual_funds)
    summary(reg_d)
    ## 
    ## Call:
    ## lm(formula = fund_return_3years ~ fund_net_annual_expense_ratio + 
    ##     as.factor(investment_type) + risk_rating + as.factor(size_type), 
    ##     data = mutual_funds)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -57.563  -2.568   0.231   2.661  67.875 
    ## 
    ## Coefficients: (1 not defined because of singularities)
    ##                                  Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)                       3.31416    0.25170  13.167  < 2e-16 ***
    ## fund_net_annual_expense_ratio    -0.62511    0.04306 -14.517  < 2e-16 ***
    ## as.factor(investment_type)Blend  -0.87405    0.25019  -3.494 0.000478 ***
    ## as.factor(investment_type)Growth  6.49527    0.25276  25.698  < 2e-16 ***
    ## as.factor(investment_type)Value  -3.88093    0.24809 -15.643  < 2e-16 ***
    ## risk_rating                      -0.13454    0.03658  -3.678 0.000236 ***
    ## as.factor(size_type)Large         3.45227    0.11632  29.679  < 2e-16 ***
    ## as.factor(size_type)Medium        1.64511    0.12699  12.955  < 2e-16 ***
    ## as.factor(size_type)Small              NA         NA      NA       NA    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 5.09 on 16999 degrees of freedom
    ## Multiple R-squared:   0.39,  Adjusted R-squared:  0.3897 
    ## F-statistic:  1552 on 7 and 16999 DF,  p-value: < 2.2e-16

    We estimate that, on average, when the expense ratio increases by 1, the three year return of mutual funds decreases by 0.63 holding investment type, risk rating, and size of the mutual fund constant. Thus, if anything, controlling for investment type, risk rating, and the size of the mutual fund seems to make the average effect of expense ratio on the 3 year return even more negative relative to the case when we did not control for them.

  5. reg_e <- lm(fund_return_3years ~ fund_net_annual_expense_ratio + as.factor(investment_type) + 
                  risk_rating + as.factor(size_type) + asset_cash + asset_stocks + asset_bonds,
                data=mutual_funds)
    summary(reg_e)
    ## 
    ## Call:
    ## lm(formula = fund_return_3years ~ fund_net_annual_expense_ratio + 
    ##     as.factor(investment_type) + risk_rating + as.factor(size_type) + 
    ##     asset_cash + asset_stocks + asset_bonds, data = mutual_funds)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -56.395  -2.414   0.112   2.556  59.678 
    ## 
    ## Coefficients: (1 not defined because of singularities)
    ##                                   Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)                       5.884983   0.508700  11.569  < 2e-16 ***
    ## fund_net_annual_expense_ratio    -0.534839   0.043509 -12.293  < 2e-16 ***
    ## as.factor(investment_type)Blend  -0.832453   0.256546  -3.245 0.001177 ** 
    ## as.factor(investment_type)Growth  6.664208   0.264516  25.194  < 2e-16 ***
    ## as.factor(investment_type)Value  -3.855590   0.254010 -15.179  < 2e-16 ***
    ## risk_rating                      -0.157309   0.036381  -4.324 1.54e-05 ***
    ## as.factor(size_type)Large         3.712324   0.118456  31.339  < 2e-16 ***
    ## as.factor(size_type)Medium        1.533386   0.126621  12.110  < 2e-16 ***
    ## as.factor(size_type)Small               NA         NA      NA       NA    
    ## asset_cash                       -0.070706   0.006642 -10.645  < 2e-16 ***
    ## asset_stocks                     -0.029536   0.004680  -6.312 2.83e-10 ***
    ## asset_bonds                      -0.018574   0.004787  -3.880 0.000105 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 5.058 on 16996 degrees of freedom
    ## Multiple R-squared:  0.3977, Adjusted R-squared:  0.3974 
    ## F-statistic:  1122 on 10 and 16996 DF,  p-value: < 2.2e-16

    These results are very similar to the previous two sets of results. We estimate that, on average, when the expense ratio increases by 1, the three year return of mutual funds decreases by 0.53, holding investment type, risk rating, size of the mutual fund, and the percentage of assets in cash, stocks, and bonds constant.

    From the summary statistics, we saw that mutual funds with higher expense ratios tended to have lower returns than mutual funds with lower expense ratios. If we interpret higher expense ratios as a proxy for active management of the mutual fund, this suggests that actively managed funds tend to have lower returns. One possible explanation would be that passive mutual funds tend to make different types of investments (e.g., it seems possible that passive stock index funds might involve higher risk). However, even when we controlled for a number of variables that are likely related to how risky a particular mutual fund's investments are, we still estimated that mutual funds with higher expense ratios tended to have lower returns.
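The magnitude discussion in parts 3 to 5 can be made concrete with a little arithmetic. This is a minimal sketch that uses only numbers reported above (the median expense ratio from part 1, the 4.56 average return quoted in part 3, and the three estimated slopes); it does not re-estimate anything.

```r
# Numbers reported in parts 1, 3, 4, and 5 above
median_er   <- 0.98   # median expense ratio (part 1)
mean_return <- 4.56   # average 3 year return quoted in part 3
slopes <- c(part3 = -0.40389, part4 = -0.62511, part5 = -0.534839)

# Effect of a one-unit increase, as a fraction of the average 3 year return
rel_effect <- slopes / mean_return
print(round(rel_effect, 3))

# A one-unit increase is roughly a doubling of the median expense ratio
print(round(1 / median_er, 2))

# Effect of a smaller, 0.1-unit increase in the expense ratio
print(round(0.1 * slopes, 3))
```

In every specification, a one-unit increase in the expense ratio corresponds to a drop of roughly 9% to 14% of the average 3 year return.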

Ch. 5, Extra Question 1

\(\beta_1\) is how much earnings increase, on average, when education increases by 1 year.

Ch. 5, Extra Question 2

\(\beta_1\) is how much earnings increase, on average, when education increases by 1 year, holding experience and gender constant.
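The difference between the Extra Question 1 and Extra Question 2 interpretations can be seen in a small simulation. This is a sketch only: the data-generating process, variable names, and coefficient values are all made up. In it, experience falls with education, so the short regression mixes the education and experience effects, while the long regression recovers the effect of education holding experience and gender fixed.

```r
set.seed(123)
n <- 10000
educ   <- rnorm(n, mean = 13, sd = 2)
exper  <- 40 - 1.5 * educ + rnorm(n, sd = 5)   # made up: experience falls with education
female <- rbinom(n, 1, 0.5)
earnings <- 5 + 2 * educ + 0.5 * exper - 3 * female + rnorm(n, sd = 5)

short <- lm(earnings ~ educ)                   # Extra Question 1 specification
long  <- lm(earnings ~ educ + exper + female)  # Extra Question 2 specification

# Short regression: about 2 + 0.5 * (-1.5) = 1.25 in this simulation
# Long regression: about 2, the effect with exper and female held fixed
print(round(c(short = coef(short)[["educ"]], long = coef(long)[["educ"]]), 2))
```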

Ch. 5, Extra Question 3

  1. You can run the following regression

    \[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + U\]

    and test whether \(\beta_3=0\) (if it is different from 0, that would indicate that the return to education differs for women relative to men).

  2. You can estimate the following regression

    \[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + \beta_4 Experience + U\]

    and continue to test whether \(\beta_3=0\).
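Both parts can be sketched in R on simulated data (all numbers and names here are made up for illustration). In an R formula, `educ * female` expands to both main effects plus their product, and the t-test reported in the `educ:female` row of the coefficient table is the test of \(\beta_3=0\).

```r
set.seed(42)
n <- 10000
educ   <- rnorm(n, mean = 13, sd = 2)
female <- rbinom(n, 1, 0.5)
exper  <- pmax(0, rnorm(n, mean = 15, sd = 8))
# Made-up truth: return to education is 2 for men and 1.5 for women (beta_3 = -0.5)
earnings <- 5 + 2 * educ + 1 * female - 0.5 * educ * female +
  0.3 * exper + rnorm(n, sd = 5)

fit1 <- lm(earnings ~ educ * female)           # part 1
fit2 <- lm(earnings ~ educ * female + exper)   # part 2

# Estimate, standard error, t value, and p-value for beta_3 in part 2
coef_table <- summary(fit2)$coefficients
print(coef_table["educ:female", ])
```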

Ch. 5, Extra Question 4

\(100 \beta_1\) is approximately the average percentage change in earnings when education increases by 1 year, holding experience and sex constant.
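Reading \(100 \beta_1\) as a percentage change relies on the approximation \(\log(1+x) \approx x\), which is good for small coefficients; the exact implied percentage change is \(100(e^{\beta_1} - 1)\). A quick sketch comparing the two for a few hypothetical values of \(\beta_1\) (not estimates from any data in this document):

```r
# Hypothetical values of beta_1 in a regression of log(earnings)
beta1 <- c(0.01, 0.05, 0.10, 0.20)
approx_pct <- 100 * beta1               # usual approximation
exact_pct  <- 100 * (exp(beta1) - 1)    # exact implied percentage change
print(round(data.frame(beta1, approx_pct, exact_pct), 2))
```

The two are nearly identical for \(\beta_1 = 0.01\) (1 versus about 1.01) but differ noticeably for \(\beta_1 = 0.20\) (20 versus about 22.14).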

Ch. 6, Extra Question 1

\(R^2\) is a measure of the in-sample fit of a regression. First, if we are interested in choosing a model that will predict well out-of-sample, ranking different models by in-sample fit may not be appropriate. Second, \(R^2\) never decreases when regressors are added, so it will always be at least as large for a more complicated model as for a simpler model nested within it. This can lead to overfitting and poor out-of-sample predictions.
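The claim that \(R^2\) cannot fall as regressors are added can be checked directly. A minimal simulation sketch (simulated data; the names `x` and `z1` to `z20` are made up): `y` depends only on `x`, yet adding pure-noise regressors one at a time never lowers \(R^2\).

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)              # y depends on x only
z <- matrix(rnorm(n * 20), nrow = n)   # 20 pure-noise regressors
colnames(z) <- paste0("z", 1:20)
dat <- data.frame(y = y, x = x, z)

# R^2 after adding the first k noise regressors, k = 0, ..., 20
r2 <- sapply(0:20, function(k) {
  rhs <- c("x", if (k > 0) paste0("z", seq_len(k)))
  summary(lm(reformulate(rhs, response = "y"), data = dat))$r.squared
})
print(round(r2, 3))
```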

Ch. 6, Extra Question 2

For AIC and BIC, the “penalty”/“cost” terms increase with the number of regressors, while the “benefit” of adding a regressor comes from decreasing \(SSR\) (which decreases AIC and BIC). This means that models that do well according to these criteria have low values of AIC/BIC and models that do poorly have high values of AIC/BIC; therefore, we choose the model that minimizes AIC/BIC.
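A minimal sketch of choosing among nested models by minimizing AIC and BIC, on simulated data (all names are made up). The hypothetical helper `ic_manual` uses the common form \(n \log(SSR/n) + \text{penalty} \times k\), with \(k\) the number of estimated coefficients, so the benefit/cost trade-off is visible. R's built-in `AIC()` and `BIC()` are computed from the Gaussian log-likelihood instead, which adds a constant that is identical across models fit to the same data, so both versions rank the models the same way (the exact formula used in the course notes may differ by such constants).

```r
set.seed(2)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)   # x3 is irrelevant by construction

models <- list(m1 = lm(y ~ x1),
               m2 = lm(y ~ x1 + x2),
               m3 = lm(y ~ x1 + x2 + x3))

# Hypothetical helper: fit term n*log(SSR/n) plus penalty per coefficient
ic_manual <- function(m, penalty) {
  n_obs <- nobs(m)
  ssr <- sum(resid(m)^2)
  n_obs * log(ssr / n_obs) + penalty * length(coef(m))
}
aic_manual <- sapply(models, ic_manual, penalty = 2)
bic_manual <- sapply(models, ic_manual, penalty = log(n))

# Built-in versions, based on the Gaussian log-likelihood
aic_builtin <- sapply(models, AIC)
bic_builtin <- sapply(models, BIC)

print(rbind(aic_manual, aic_builtin, bic_manual, bic_builtin))
cat("AIC picks:", names(which.min(aic_builtin)),
    " BIC picks:", names(which.min(bic_builtin)), "\n")
```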