Consider the following regression, where airq is an indicator of air quality (lower is better) for a particular metropolitan area in California, dens1000 is the population density in thousands of people per square mile, coas indicates whether or not the metro area is on the coast, and medi1000 is the median income in the metro area (in thousands of dollars).
data("Airq", package="Ecdat")
library(modelsummary)
Airq$coas <- 1*(Airq$coas=="yes")
Airq$dens1000 <- Airq$dens/1000
Airq$medi1000 <- Airq$medi/1000
reg1 <- lm(airq ~ dens1000 + coas + dens1000*coas + medi1000, data=Airq)
modelsummary(reg1, fmt=1, gof_omit=".")
| | Model 1 |
|---|---|
| (Intercept) | 120.6 |
| | (9.5) |
| dens1000 | −0.3 |
| | (2.8) |
| coas | −31.2 |
| | (11.3) |
| medi1000 | 0.8 |
| | (0.4) |
| dens1000 × coas | −1.2 |
| | (3.4) |
Which regressors are statistically significant in this regression?
What is the predicted value of the air quality index for a metro area that has 1000 people per square mile, is not located on the coast, and has a median income of $50,000?
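One way to approach both questions is to work directly from the rounded numbers in the table above. The sketch below is only an illustration under stated assumptions: it copies the reported estimates and standard errors by hand (the vector names `est`, `se`, and `x_new` are introduced here and are not part of the original code), and it uses the usual large-sample rule that a regressor is significant at the 5% level when its t-ratio exceeds 1.96 in absolute value.

```r
# Rounded estimates and standard errors copied by hand from the table above
est <- c("(Intercept)" = 120.6, dens1000 = -0.3, coas = -31.2,
         medi1000 = 0.8, "dens1000:coas" = -1.2)
se  <- c("(Intercept)" = 9.5, dens1000 = 2.8, coas = 11.3,
         medi1000 = 0.4, "dens1000:coas" = 3.4)

# t-ratio for each coefficient: estimate divided by its standard error
t_ratio <- est / se
round(t_ratio, 2)

# Regressor values for the prediction, in the same order as est:
# intercept = 1, dens1000 = 1 (1000 people per square mile), coas = 0,
# medi1000 = 50 ($50,000), and the interaction dens1000 * coas = 1 * 0
x_new <- c(1, 1, 0, 50, 1 * 0)
pred  <- sum(est * x_new)
pred  # about 160.3
```

Because the table reports only one decimal place, the t-ratio for medi1000 comes out at exactly 2.0 here, so its significance at the 5% level is sensitive to rounding. Checking the unrounded output (for example with `summary(reg1)`) would settle it.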
Consider the following regression, where child_fincome is the child's family income, parent_fincome is the parents' family income, sex is a binary variable indicating whether the child is male, yearborn is the year in which the child was born, and education is the child's years of education.
load("../Detailed Course Notes/data/intergenerational_mobility.RData")
reg2 <- lm(log(child_fincome) ~ log(parent_fincome) + sex + yearborn + education,
data=intergenerational_mobility)
summary(reg2)
##
## Call:
## lm(formula = log(child_fincome) ~ log(parent_fincome) + sex +
## yearborn + education, data = intergenerational_mobility)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.11404 -0.32489 0.04514 0.36940 2.70867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.3037430 1.9719502 10.803 < 2e-16 ***
## log(parent_fincome) 0.5964735 0.0198679 30.022 < 2e-16 ***
## sex 0.0318506 0.0194484 1.638 0.101572
## yearborn -0.0085957 0.0009896 -8.686 < 2e-16 ***
## education 0.0012618 0.0003437 3.672 0.000244 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5834 on 3625 degrees of freedom
## Multiple R-squared: 0.2221, Adjusted R-squared: 0.2212
## F-statistic: 258.8 on 4 and 3625 DF, p-value: < 2.2e-16
How do you interpret the coefficient on log(parent_fincome) in this model?
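Since both the outcome and parent_fincome enter in logs, the coefficient can be read as an elasticity. The sketch below is one hedged way to turn the reported estimate into percentage changes. The variable names are introduced here for illustration, and the interpretation holds sex, yearborn, and education fixed.

```r
# Coefficient on log(parent_fincome), copied from the regression output above
b_parent <- 0.5964735

# Approximate reading: a 1% increase in parents' income is associated with
# roughly b_parent percent higher child income, other regressors held fixed
approx_pct_1 <- b_parent * 1

# Exact implied percentage change in child income for a 1% and a 10%
# increase in parents' income
exact_pct_1  <- (1.01^b_parent - 1) * 100
exact_pct_10 <- (1.10^b_parent - 1) * 100

round(c(approx_pct_1, exact_pct_1, exact_pct_10), 3)
```

The approximation and the exact calculation nearly coincide for a 1% change and drift apart for larger changes, which is why the elasticity reading is usually stated for small percentage changes.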
Let \(Y\) denote a person’s age in the United States. Suppose that your theory is that \(\mathbb{E}[Y] = 35\). You are able to collect a random sample of 100 observations. Using these data, you calculate \(\bar{Y} = 37\) and \(\widehat{\mathrm{var}}(Y) = 6\).
Calculate a t-statistic for testing the null hypothesis that \(\mathbb{E}[Y]=35\). Do you reject the null hypothesis here? Explain.
What is the standard error of \(\bar{Y}\)?
Calculate a p-value for the null hypothesis that \(\mathbb{E}[Y]=35\). How do you interpret it?
Calculate a 95% confidence interval for \(\mathbb{E}[Y]\). How do you interpret it?
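The four calculations above all build on the standard error of \(\bar{Y}\), so they can be sketched together. This is a minimal illustration, assuming a two-sided test and the large-sample normal approximation. The variable names are introduced here and do not come from the original notes.

```r
n     <- 100  # sample size
y_bar <- 37   # sample mean
v_hat <- 6    # estimated variance of Y
mu_0  <- 35   # value of E[Y] under the null hypothesis

# Standard error of the sample mean
se <- sqrt(v_hat / n)

# t-statistic for H0: E[Y] = 35
t_stat <- (y_bar - mu_0) / se

# Two-sided p-value using the standard normal approximation
p_val <- 2 * pnorm(-abs(t_stat))

# 95% confidence interval for E[Y]
ci_95 <- y_bar + c(-1, 1) * qnorm(0.975) * se

round(c(se = se, t = t_stat), 3)
p_val
round(ci_95, 2)
```

With these inputs, the absolute value of the t-statistic is far above 1.96 and the confidence interval does not contain 35, so the two ways of carrying out the test agree.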
Consider the following regression using country-level data, where \(GDP\) is a country’s GDP, \(Inflation\) is the country’s current inflation rate, \(Europe\) is a binary variable indicating whether the country is located in Europe, and \(Democracy\) is a binary variable indicating whether the country has democratic institutions.
\[GDP = \beta_0 + \beta_1 Inflation + \beta_2 Inflation \cdot Europe + \beta_3 Inflation^2 + \beta_4 Democracy + U\]
What is the partial effect of Inflation in this model?
What is the average partial effect of Inflation in this model?
Given relevant data, how would you estimate the average partial effect of Inflation?
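As a starting point for these questions, differentiating the regression function with respect to \(Inflation\) gives a partial effect that depends on both \(Europe\) and \(Inflation\), and averaging it over the population gives the average partial effect:

\[\frac{\partial\, \mathbb{E}[GDP \mid Inflation, Europe, Democracy]}{\partial\, Inflation} = \beta_1 + \beta_2 Europe + 2 \beta_3 Inflation, \qquad APE = \beta_1 + \beta_2 \mathbb{E}[Europe] + 2 \beta_3 \mathbb{E}[Inflation]\]

A plug-in estimator replaces the coefficients with their estimates and the expectations with sample averages. The sketch below shows one way this might look. The helper `estimate_ape_inflation` and the data frame `country_data` (with columns GDP, Inflation, Europe, and Democracy) are hypothetical names introduced here, not objects from the original notes.

```r
# Hypothetical helper: country_data is an assumed data frame with columns
# GDP, Inflation, Europe (0/1), and Democracy (0/1)
estimate_ape_inflation <- function(country_data) {
  # Build the interaction and squared terms explicitly so that the
  # coefficient names are unambiguous
  d <- transform(country_data,
                 infl_europe = Inflation * Europe,
                 infl_sq     = Inflation^2)

  # Same specification as the regression equation above
  fit <- lm(GDP ~ Inflation + infl_europe + infl_sq + Democracy, data = d)
  b   <- coef(fit)

  # Plug-in estimate: b1 + b2 * mean(Europe) + 2 * b3 * mean(Inflation)
  b[["Inflation"]] +
    b[["infl_europe"]] * mean(d$Europe) +
    2 * b[["infl_sq"]] * mean(d$Inflation)
}
```

Calling `estimate_ape_inflation(country_data)` on a suitable data set returns a single number. Obtaining a standard error for it would need an extra step, such as the delta method or the bootstrap, which this sketch does not attempt.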