```
load("../../Detailed Course Notes/data/fertilizer_2000.RData")
library(modelsummary)
# regression for part a
reg_a <- lm(log(avyield) ~ log(avfert), data=fertilizer_2000)
# regression for part b
reg_b <- lm(log(avyield) ~ log(avfert) + prec, data=fertilizer_2000)
# regression for part c
reg_c <- lm(log(avyield) ~ log(avfert)*as.factor(region_wb) + prec, data=fertilizer_2000)
# print results for all parts
modelsummary(list(reg_a, reg_b, reg_c))
```

| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | 1.474 | 1.228 | 1.434 |
| | (0.121) | (0.166) | (0.433) |
| log(avfert) | 0.256 | 0.237 | 0.287 |
| | (0.031) | (0.031) | (0.221) |
| prec | | 0.000 | 0.000 |
| | | (0.000) | (0.000) |
| as.factor(region_wb)EECA | | | −1.369 |
| | | | (2.165) |
| as.factor(region_wb)LAC | | | −0.461 |
| | | | (0.487) |
| as.factor(region_wb)MENA | | | 0.707 |
| | | | (0.671) |
| as.factor(region_wb)SA | | | −0.230 |
| | | | (0.787) |
| as.factor(region_wb)SSA | | | −0.436 |
| | | | (0.504) |
| log(avfert) × as.factor(region_wb)EECA | | | −0.542 |
| | | | (0.744) |
| log(avfert) × as.factor(region_wb)LAC | | | −0.159 |
| | | | (0.244) |
| log(avfert) × as.factor(region_wb)MENA | | | 0.317 |
| | | | (0.283) |
| log(avfert) × as.factor(region_wb)SA | | | −0.089 |
| | | | (0.375) |
| log(avfert) × as.factor(region_wb)SSA | | | −0.087 |
| | | | (0.227) |
| Num.Obs. | 68 | 68 | 68 |
| R2 | 0.511 | 0.543 | 0.605 |
| R2 Adj. | 0.504 | 0.529 | 0.518 |
| AIC | 176.6 | 174.0 | 184.1 |
| BIC | 183.2 | 182.9 | 215.2 |
| Log.Lik. | −45.205 | −42.926 | −37.987 |
| F | 69.027 | 38.600 | 7.011 |
| RMSE | 0.47 | 0.45 | 0.42 |

We estimate that a 1% increase in fertilizer usage increases crop yield by 0.256% on average (the estimate is statistically different from 0 at conventional significance levels).

When we additionally control for precipitation, we estimate that a 1% increase in fertilizer usage increases crop yield by 0.237% on average, holding precipitation constant (again, the estimate is statistically different from 0 at conventional significance levels).
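The percentage interpretation follows from the log-log form of the regression. As a short sketch, writing \(\beta_1\) for the coefficient on `log(avfert)` (the symbols here are assumed for illustration, since the exercise's own equation is not reproduced in this section):

\[\log(avyield) = \beta_0 + \beta_1 \log(avfert) + U \quad \implies \quad \beta_1 = \frac{\partial \log(avyield)}{\partial \log(avfert)} \approx \frac{\% \Delta avyield}{\% \Delta avfert}\]

So with \(\hat{\beta}_1 = 0.256\), a 1% increase in fertilizer use is associated with roughly a 0.256% increase in yield. The approximation is best for small percentage changes.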

In the last specification, we include control variables for a country’s region as well as interaction terms between region and `log(fertilizer)`. Since none of these interaction terms are statistically different from 0, we do not have strong evidence that effects of fertilizer are different across regions. Importantly, this does not necessarily mean that these effects are all the same; some of the differences appear large in magnitude, but we only have 68 observations total here, so it might just be the case that we do not have enough data to reliably detect differences across regions.
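To see how large some of these differences are, the region-specific slopes can be recovered by adding each interaction coefficient to the coefficient on `log(avfert)`, which is the slope for the omitted base region. The following is a minimal sketch using the point estimates copied by hand from column (3); the object names are hypothetical and the calculation ignores sampling uncertainty.

```r
# Point estimates copied from column (3) of the regression table
base_slope <- 0.287  # slope on log(avfert) for the omitted base region
interactions <- c(EECA = -0.542, LAC = -0.159, MENA = 0.317,
                  SA = -0.089, SSA = -0.087)

# Implied elasticity of yield with respect to fertilizer in each region
region_slopes <- base_slope + interactions
print(round(region_slopes, 3))
```

The implied slopes range from negative (EECA) to more than twice the base slope (MENA), but the large standard errors on the interaction terms mean these differences are imprecisely estimated.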

```
# load the data
load("../../Detailed Course Notes/data/mutual_funds.RData")
```

Median of `fund_net_annual_expense_ratio`:

```
median(mutual_funds$fund_net_annual_expense_ratio)
```

```
## [1] 0.98
```

```
med_er <- median(mutual_funds$fund_net_annual_expense_ratio)
mutual_funds$high_expense_ratio <- 1*(mutual_funds$fund_net_annual_expense_ratio > med_er)
```

```
library(modelsummary)
library(dplyr)
datasummary_balance(~high_expense_ratio,
                    data=select(mutual_funds,
                                high_expense_ratio,
                                fund_return_3years,
                                fund_net_annual_expense_ratio,
                                risk_rating,
                                asset_cash,
                                asset_stocks,
                                asset_bonds))
```

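| | Mean (`high_expense_ratio` = 0) | Std. Dev. (`high_expense_ratio` = 0) | Mean (`high_expense_ratio` = 1) | Std. Dev. (`high_expense_ratio` = 1) | Diff. in Means | Std. Error |
|---|---|---|---|---|---|---|
| fund_return_3years | 5.1 | 5.6 | 4.0 | 7.3 | -1.1 | 0.1 |
| fund_net_annual_expense_ratio | 0.7 | 0.2 | 1.5 | 1.1 | 0.8 | 0.0 |
| risk_rating | 3.1 | 1.1 | 3.1 | 1.1 | 0.0 | 0.0 |
| asset_cash | 5.2 | 8.8 | 6.3 | 11.3 | 1.1 | 0.2 |
| asset_stocks | 51.6 | 44.8 | 69.9 | 39.2 | 18.3 | 0.6 |
| asset_bonds | 40.8 | 42.3 | 21.0 | 34.1 | -19.8 | 0.6 |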

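Yes, there are interesting patterns here. Mutual funds with high expense ratios appear to have lower returns and about the same risk rating. They also hold a larger share of their assets in stocks (about 70% versus 52%) and a smaller share in bonds (about 21% versus 41%) than funds with low expense ratios.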

```
reg_c <- lm(fund_return_3years ~ fund_net_annual_expense_ratio, data=mutual_funds)
summary(reg_c)
```

```
## 
## Call:
## lm(formula = fund_return_3years ~ fund_net_annual_expense_ratio, 
##     data = mutual_funds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.582  -2.903  -0.881   1.751  52.559 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    4.99470    0.07713  64.759  < 2e-16 ***
## fund_net_annual_expense_ratio -0.40389    0.05485  -7.363 1.88e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.505 on 17005 degrees of freedom
## Multiple R-squared:  0.003178, Adjusted R-squared:  0.003119 
## F-statistic: 54.21 on 1 and 17005 DF,  p-value: 1.882e-13
```

This indicates that for every one unit increase in the expense ratio (which, importantly, is a big increase), on average the 3 year return of a mutual fund decreases by 0.4, and the effect is statistically significant. The average 3 year return in our data is 4.56, so a decrease in the return of 0.4 seems like a medium-sized decrease to me.
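One way to put the size of this estimate in context is to express it relative to the average return. The following is a small sketch of that arithmetic, using the slope from the output above and the average return quoted in the text; the object names are hypothetical.

```r
# Values copied from the regression output and the text above
slope <- -0.40389        # estimated coefficient on the expense ratio
mean_return <- 4.56      # average 3 year return reported in the text

# Change in the 3 year return per one-unit increase in the expense ratio,
# expressed as a percentage of the average return
relative_change_pct <- 100 * slope / mean_return
print(round(relative_change_pct, 1))
```

So a one-unit increase in the expense ratio corresponds to a drop of a bit under 9% of the average 3 year return.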

```
reg_d <- lm(fund_return_3years ~ fund_net_annual_expense_ratio +
              as.factor(investment_type) + risk_rating + as.factor(size_type),
            data=mutual_funds)
summary(reg_d)
```

```
## 
## Call:
## lm(formula = fund_return_3years ~ fund_net_annual_expense_ratio + 
##     as.factor(investment_type) + risk_rating + as.factor(size_type), 
##     data = mutual_funds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.563  -2.568   0.231   2.661  67.875 
## 
## Coefficients: (1 not defined because of singularities)
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       3.31416    0.25170  13.167  < 2e-16 ***
## fund_net_annual_expense_ratio    -0.62511    0.04306 -14.517  < 2e-16 ***
## as.factor(investment_type)Blend  -0.87405    0.25019  -3.494 0.000478 ***
## as.factor(investment_type)Growth  6.49527    0.25276  25.698  < 2e-16 ***
## as.factor(investment_type)Value  -3.88093    0.24809 -15.643  < 2e-16 ***
## risk_rating                      -0.13454    0.03658  -3.678 0.000236 ***
## as.factor(size_type)Large         3.45227    0.11632  29.679  < 2e-16 ***
## as.factor(size_type)Medium        1.64511    0.12699  12.955  < 2e-16 ***
## as.factor(size_type)Small              NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.09 on 16999 degrees of freedom
## Multiple R-squared:  0.39, Adjusted R-squared:  0.3897 
## F-statistic:  1552 on 7 and 16999 DF,  p-value: < 2.2e-16
```

We estimate that, on average, when the expense ratio increases by 1, the three year return of mutual funds decreases by 0.63 holding investment type, risk rating, and size of the mutual fund constant. Thus, if anything, controlling for investment type, risk rating, and the size of the mutual fund seems to make the average effect of expense ratio on the 3 year return even more negative relative to the case when we did not control for them.

```
reg_e <- lm(fund_return_3years ~ fund_net_annual_expense_ratio +
              as.factor(investment_type) + risk_rating + as.factor(size_type) +
              asset_cash + asset_stocks + asset_bonds,
            data=mutual_funds)
summary(reg_e)
```

```
## 
## Call:
## lm(formula = fund_return_3years ~ fund_net_annual_expense_ratio + 
##     as.factor(investment_type) + risk_rating + as.factor(size_type) + 
##     asset_cash + asset_stocks + asset_bonds, data = mutual_funds)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.395  -2.414   0.112   2.556  59.678 
## 
## Coefficients: (1 not defined because of singularities)
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       5.884983   0.508700  11.569  < 2e-16 ***
## fund_net_annual_expense_ratio    -0.534839   0.043509 -12.293  < 2e-16 ***
## as.factor(investment_type)Blend  -0.832453   0.256546  -3.245 0.001177 ** 
## as.factor(investment_type)Growth  6.664208   0.264516  25.194  < 2e-16 ***
## as.factor(investment_type)Value  -3.855590   0.254010 -15.179  < 2e-16 ***
## risk_rating                      -0.157309   0.036381  -4.324 1.54e-05 ***
## as.factor(size_type)Large         3.712324   0.118456  31.339  < 2e-16 ***
## as.factor(size_type)Medium        1.533386   0.126621  12.110  < 2e-16 ***
## as.factor(size_type)Small               NA         NA      NA       NA    
## asset_cash                       -0.070706   0.006642 -10.645  < 2e-16 ***
## asset_stocks                     -0.029536   0.004680  -6.312 2.83e-10 ***
## asset_bonds                      -0.018574   0.004787  -3.880 0.000105 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.058 on 16996 degrees of freedom
## Multiple R-squared:  0.3977, Adjusted R-squared:  0.3974 
## F-statistic:  1122 on 10 and 16996 DF,  p-value: < 2.2e-16
```

These results are very similar to the previous two sets of results. We estimate that, on average, when the expense ratio increases by 1, the three year return of mutual funds decreases by 0.53, holding investment type, risk rating, size of the mutual fund, and percentage of assets in cash, stocks, and bonds constant.

From the summary statistics, we saw that mutual funds with higher expense ratios tended to have lower returns than mutual funds with lower expense ratios. If we interpret higher expense ratios as being a proxy for active management of the mutual fund, it suggests that actively managed funds tend to have lower returns. One possible explanation for this would be that passive mutual funds might tend to make different types of investment (and, e.g., it seems possible that passive stock index funds might involve higher risk). However, even when we controlled for a number of variables that are likely related to how risky a particular fund's investments were, we still estimated that mutual funds with higher expense ratios tended to have lower returns.

\(\beta_1\) is how much earnings increase on average when years of education increases by 1 year.

\(\beta_1\) is how much earnings increase on average when years of education increases by 1 holding experience and gender constant.

You can run the following regression

\[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + U\]

and test whether \(\beta_3=0\) (if it is different from 0, that would indicate that the return to education differs for women relative to men).
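To see why \(\beta_3\) is the relevant parameter, consider how expected earnings change with education in this regression (a short sketch, assuming \(E[U \mid Education, Female] = 0\) and that \(Female\) equals 1 for women and 0 for men):

\[\frac{\partial E[Earnings \mid Education, Female]}{\partial Education} = \beta_1 + \beta_3 Female\]

so the return to an extra year of education is \(\beta_1\) for men and \(\beta_1 + \beta_3\) for women, and the two coincide exactly when \(\beta_3 = 0\).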

You can estimate the following regression

\[Earnings = \beta_0 + \beta_1 Education + \beta_2 Female + \beta_3 Education \times Female + \beta_4 Experience + U\]

and continue to test whether \(\beta_3=0\).
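In R, this test can be read directly off the coefficient table of the fitted regression. Below is a minimal sketch on simulated data; the variable names, sample size, and parameter values are all hypothetical and chosen only for illustration (no dataset accompanies this question).

```r
# Simulated data for illustration only; all names and values are hypothetical
set.seed(1)
n <- 2000
education  <- sample(8:20, n, replace = TRUE)
female     <- rbinom(n, 1, 0.5)
experience <- sample(0:30, n, replace = TRUE)

# The true interaction coefficient (beta_3) is set to 1.5 in this simulation
earnings <- 10 + 3 * education - 5 * female + 1.5 * education * female +
  0.5 * experience + rnorm(n, sd = 10)

# education * female expands to education + female + education:female
reg <- lm(earnings ~ education * female + experience)

# Estimate, standard error, t statistic, and p-value for beta_3
beta3_row <- coef(summary(reg))["education:female", ]
print(beta3_row)
```

The t statistic and p-value in the `education:female` row correspond to the test of \(\beta_3 = 0\); dropping `+ experience` from the formula gives the version without the experience control.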

\(100 \beta_1\) is the average percentage change in earnings when education increases by 1 holding experience and sex constant.
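This interpretation applies when the outcome is in logs. As a sketch, assuming the regression in question has \(\log(Earnings)\) on the left-hand side (the equation itself is not reproduced in this section):

\[\log(Earnings) = \beta_0 + \beta_1 Education + \beta_2 Experience + \beta_3 Female + U \quad \implies \quad \% \Delta Earnings \approx 100 \, \beta_1 \, \Delta Education\]

The approximation is good when \(\beta_1\) is small; the exact percentage change for a one-year increase is \(100\,(e^{\beta_1} - 1)\). For a hypothetical value of \(\beta_1 = 0.08\), the approximation gives 8% and the exact figure is about 8.33%.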

\(R^2\) is a measure of the in-sample fit of a regression. First, if we are interested in choosing a model that will predict well out-of-sample, ranking different models by in-sample fit may not be appropriate. Second, \(R^2\) never decreases when regressors are added, so it will always be at least as large for more complicated models as for simpler ones. This can lead to overfitting and poor out-of-sample predictions.
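The second point can be checked with a short simulation. The sketch below uses made-up data (all names and values are hypothetical) and adds a regressor that is unrelated to the outcome by construction; \(R^2\) still does not fall, while adjusted \(R^2\), which penalizes extra regressors, may.

```r
# Simulated data for illustration only
set.seed(42)
n <- 100
x <- rnorm(n)
noise <- rnorm(n)          # irrelevant regressor, unrelated to y by construction
y <- 1 + 2 * x + rnorm(n)

small <- lm(y ~ x)
big   <- lm(y ~ x + noise)

# In-sample R-squared for the simple and the more complicated model
r2 <- c(small = summary(small)$r.squared, big = summary(big)$r.squared)
# Adjusted R-squared penalizes the extra regressor
adj_r2 <- c(small = summary(small)$adj.r.squared, big = summary(big)$adj.r.squared)

print(r2)
print(adj_r2)
```

The fertilizer regressions earlier in this document show the same pattern: \(R^2\) rises from 0.511 to 0.543 to 0.605 across the three specifications, while adjusted \(R^2\) falls from 0.529 to 0.518 when the region terms are added.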

For AIC and BIC, the “penalty”/“cost” terms tend to increase these quantities while the “benefit” of adding a regressor comes from decreasing \(SSR\) (and, hence, decreasing the value of AIC and/or BIC). This means that models that do well according to these criteria will have low values of AIC/BIC and models that do poorly will have high values of AIC/BIC; therefore, we choose the model that minimizes AIC/BIC.
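As a concrete illustration of choosing the model that minimizes these criteria, the sketch below applies the rule to the AIC and BIC values reported for the three fertilizer regressions earlier in this document. The values are copied by hand from that table, and the vector names are hypothetical.

```r
# AIC and BIC values copied from the fertilizer regression table
aic <- c(reg_a = 176.6, reg_b = 174.0, reg_c = 184.1)
bic <- c(reg_a = 183.2, reg_b = 182.9, reg_c = 215.2)

# Choose the model with the smallest value of each criterion
best_aic <- names(which.min(aic))
best_bic <- names(which.min(bic))
print(c(AIC = best_aic, BIC = best_bic))
```

Both criteria select `reg_b`, the specification that adds precipitation but not the region terms. With fitted `lm` objects in hand, the same comparison could be made by calling `AIC()` and `BIC()` on each model, although those values need not match the ones `modelsummary` reports.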