4.14 Coding Questions
For this problem, we will use the data intergenerational_mobility.
(a) Run a regression of child family income (\(child\_fincome\)) on parents’ family income (\(parent\_fincome\)). How should you interpret the estimated coefficient on parents’ family income? What is the p-value for the coefficient on parents’ family income?
(b) Run a regression of \(\log(child\_fincome)\) on \(parent\_fincome\). How should you interpret the estimated coefficient on \(parent\_fincome\)?
(c) Run a regression of \(child\_fincome\) on \(\log(parent\_fincome)\). How should you interpret the estimated coefficient on \(\log(parent\_fincome)\)?
(d) Run a regression of \(\log(child\_fincome)\) on \(\log(parent\_fincome)\). How should you interpret the estimated coefficient on \(\log(parent\_fincome)\)?
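As a reference for the four specifications in this question, the usual interpretations can be summarized as follows. This is only a sketch in generic notation: \(Y\) stands for \(child\_fincome\), \(X\) for \(parent\_fincome\), \(\beta_1\) for the slope coefficient, and \(U\) for the error term (these symbols are introduced here for illustration). The percentage statements are approximations that work best for small changes.

\[
\begin{aligned}
Y &= \beta_0 + \beta_1 X + U &&\Rightarrow\quad \Delta Y = \beta_1 \, \Delta X \\
\log(Y) &= \beta_0 + \beta_1 X + U &&\Rightarrow\quad \%\Delta Y \approx 100 \, \beta_1 \, \Delta X \\
Y &= \beta_0 + \beta_1 \log(X) + U &&\Rightarrow\quad \Delta Y \approx (\beta_1 / 100) \, \%\Delta X \\
\log(Y) &= \beta_0 + \beta_1 \log(X) + U &&\Rightarrow\quad \%\Delta Y \approx \beta_1 \, \%\Delta X
\end{aligned}
\]

In the last case, \(\beta_1\) is an elasticity: the approximate percentage change in \(Y\) associated with a one percent change in \(X\).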
For this question, we’ll use the fertilizer_2000 data.
(a) Run a regression of \(\log(avyield)\) on \(\log(avfert)\). How do you interpret the estimated coefficient on \(\log(avfert)\)?
(b) Now suppose that you additionally want to control for precipitation and the region that a country is located in. How would you do this? Estimate the model that you propose here, report the results, and interpret the coefficient on \(\log(avfert)\).
(c) Now suppose that you are interested in whether the effect of fertilizer varies by the region that a country is located in (while still controlling for the same covariates as in part (b)). Propose a model that can be used for this purpose. Estimate the model that you proposed, report the results, and discuss whether the effect of fertilizer appears to vary by region or not.
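One possible way to write down a model in which the effect of fertilizer differs across regions is to interact \(\log(avfert)\) with region indicators. The sketch below assumes, purely for illustration, that precipitation is stored in a variable called \(precip\) and that \(region\) takes values \(1, \ldots, R\), with region 1 as the omitted base category; the actual variable names and coding in the data may differ.

\[
\log(avyield) = \beta_0 + \beta_1 \log(avfert) + \beta_2 \, precip + \sum_{r=2}^{R} \gamma_r \, 1\{region = r\} + \sum_{r=2}^{R} \delta_r \, 1\{region = r\} \cdot \log(avfert) + U
\]

Under this specification, the coefficient on \(\log(avfert)\) is \(\beta_1\) in the base region and \(\beta_1 + \delta_r\) in region \(r\), so asking whether the effect varies by region amounts to asking whether the \(\delta_r\) are jointly different from zero.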
For this question, we will use the data mutual_funds. We’ll be interested in whether mutual funds that have higher expense ratios (these are typically actively managed funds) have higher returns relative to mutual funds that have lower expense ratios (e.g., index funds). For this problem, we will use the variables fund_return_3years, investment_type, risk_rating, size_type, fund_net_annual_expense_ratio, asset_cash, asset_stocks, and asset_bonds.
(a) Calculate the median fund_net_annual_expense_ratio.
(b) Use the datasummary_balance function from the modelsummary package to report summary statistics for fund_return_3years, fund_net_annual_expense_ratio, risk_rating, asset_cash, asset_stocks, and asset_bonds based on whether a fund’s expense ratio is above or below the median. Do you notice any interesting patterns?
(c) Run a regression of fund_return_3years on fund_net_annual_expense_ratio. How do you interpret the results?
(d) Now, additionally control for investment_type, risk_rating, and size_type. Hint: think carefully about what type of variable each of these is and how it should enter the model. How do these results compare to the ones from part (c)?
(e) Now, add the variables asset_cash, asset_stocks, and asset_bonds to the model from part (d). How do you interpret these results? Compare and interpret the differences between parts (c), (d), and (e).
For this question, we’ll use the data Lead_Mortality to study the effect of lead pipes on infant mortality in 1900.
(a) Run a regression of infant mortality (infrate) on whether or not a city had lead pipes (lead) and interpret/discuss the results.
(b) It turns out that the amount of lead in drinking water depends on how acidic the water is, with more acidic water leaching more of the lead (so that there is more exposure to lead with more acidic water). To measure acidity, we’ll use the pH of the water in a particular city (ph); recall that a lower value of pH indicates higher acidity. Run a regression of infant mortality on whether or not a city has lead pipes, the pH of its water, and the interaction between having lead pipes and pH. Report your results. What is the estimated partial effect of having lead pipes from this model?
(c) Given the results in part (b), calculate an estimate of the average partial effect of having lead pipes on infant mortality.
(d) Given the results in part (b), how much does the partial effect of having lead pipes differ for cities that have a pH of 6.5 relative to a pH of 7.5?
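To see how the pieces of this question fit together, it can help to write the interaction model out explicitly. The following is a sketch in generic notation, where the coefficients \(\beta_0, \ldots, \beta_3\), the error term \(U\), and the sample mean \(\overline{ph}\) are symbols introduced here for illustration.

\[
\begin{aligned}
infrate &= \beta_0 + \beta_1 \, lead + \beta_2 \, ph + \beta_3 \, lead \cdot ph + U \\
\text{partial effect of lead at a given } ph &: \quad \beta_1 + \beta_3 \, ph \\
\text{average partial effect} &: \quad \beta_1 + \beta_3 \, \overline{ph} \\
\text{difference between } ph = 6.5 \text{ and } ph = 7.5 &: \quad (\beta_1 + 6.5 \, \beta_3) - (\beta_1 + 7.5 \, \beta_3) = -\beta_3
\end{aligned}
\]

Because \(lead\) is binary, the partial effect here is the difference in expected infant mortality between cities with and without lead pipes at the same pH. Estimates of each quantity are obtained by plugging in the estimated coefficients and the sample average of \(ph\).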