1. Consider the following regression, where airq is an index of air quality (lower is better) for a metropolitan area in California, dens1000 is population density in thousands of people per square mile, coas is a binary variable equal to 1 if the metro area is on the coast, and medi1000 is the median income in the metro area (in thousands of dollars).

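```r
data("Airq", package = "Ecdat")
library(modelsummary)
# Recode coas to 1 if the metro area is coastal ("yes"), 0 otherwise
Airq$coas <- 1 * (Airq$coas == "yes")
# Rescale density and median income to thousands
Airq$dens1000 <- Airq$dens / 1000
Airq$medi1000 <- Airq$medi / 1000

# dens1000*coas expands to dens1000 + coas + dens1000:coas; R drops the duplicated main effects
reg1 <- lm(airq ~ dens1000 + coas + dens1000*coas + medi1000, data = Airq)
# fmt = 1 rounds to one decimal; gof_omit = "." drops all goodness-of-fit rows
modelsummary(reg1, fmt = 1, gof_omit = ".")
```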

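| | Model 1 |
|:--|--:|
| (Intercept) | 120.6 |
| | (9.5) |
| dens1000 | −0.3 |
| | (2.8) |
| coas | −31.2 |
| | (11.3) |
| medi1000 | 0.8 |
| | (0.4) |
| dens1000 × coas | −1.2 |
| | (3.4) |

Standard errors are in parentheses.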

1. Which regressors are statistically significant in this regression?

2. What is the predicted value of the air quality index for a metro area with 1000 people per square mile that is not located on the coast and has a median income of \$50,000?
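As a sketch of the arithmetic behind both questions, the R code below works only from the rounded estimates and standard errors printed in the table, so it is self-contained but inherits their rounding. It compares each |t| with the large-sample two-sided 5% critical value `qnorm(0.975)` (about 1.96); the 5% level and the normal critical value are assumptions, since the problem does not specify them. Borderline cases such as medi1000 (t of about 2) are worth rechecking against the unrounded output of `summary(reg1)`, or against `qt(0.975, df = df.residual(reg1))` if you prefer a t critical value.

```r
# Estimates and standard errors copied from the table above (rounded to one decimal)
est <- c("(Intercept)" = 120.6, dens1000 = -0.3, coas = -31.2,
         medi1000 = 0.8, "dens1000:coas" = -1.2)
se  <- c("(Intercept)" = 9.5, dens1000 = 2.8, coas = 11.3,
         medi1000 = 0.4, "dens1000:coas" = 3.4)

# t-statistic for H0: coefficient = 0; compare |t| with the two-sided 5% critical value
t_stat <- est / se
crit <- qnorm(0.975)                 # about 1.96 (normal approximation)
significant_5pct <- abs(t_stat) > crit
print(round(t_stat, 2))
print(significant_5pct)

# Regressor values: 1000 people per sq. mile (dens1000 = 1), not coastal (coas = 0),
# median income of $50,000 (medi1000 = 50); the interaction term equals dens1000 * coas
x <- c("(Intercept)" = 1, dens1000 = 1, coas = 0,
       medi1000 = 50, "dens1000:coas" = 1 * 0)
pred <- sum(est * x)                 # both vectors use the same order, so this is a dot product
print(pred)
```

With the fitted model in memory, `predict(reg1, newdata = data.frame(dens1000 = 1, coas = 0, medi1000 = 50))` gives the prediction from the unrounded coefficients; it can differ somewhat from the hand calculation because the rounded medi1000 coefficient is multiplied by 50.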

1. Consider the following regression, where child_fincome is the child’s family income, parent_fincome is the parents’ family income, sex is a binary variable indicating whether the child is male, yearborn is the year the child was born, and education is the child’s years of education.

```r
load("../Detailed Course Notes/data/intergenerational_mobility.RData")

reg2 <- lm(log(child_fincome) ~ log(parent_fincome) + sex + yearborn + education,
           data = intergenerational_mobility)
summary(reg2)
```

```
##
## Call:
## lm(formula = log(child_fincome) ~ log(parent_fincome) + sex +
##     yearborn + education, data = intergenerational_mobility)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -3.11404 -0.32489  0.04514  0.36940  2.70867
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)         21.3037430  1.9719502  10.803  < 2e-16 ***
## log(parent_fincome)  0.5964735  0.0198679  30.022  < 2e-16 ***
## sex                  0.0318506  0.0194484   1.638 0.101572
## yearborn            -0.0085957  0.0009896  -8.686  < 2e-16 ***
## education            0.0012618  0.0003437   3.672 0.000244 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5834 on 3625 degrees of freedom
## Multiple R-squared:  0.2221, Adjusted R-squared:  0.2212
## F-statistic: 258.8 on 4 and 3625 DF,  p-value: < 2.2e-16
```

How do you interpret the coefficient on log(parent_fincome) in this model?
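One way to see the log-log interpretation numerically is to compare the usual elasticity approximation with the exact change it implies. The sketch below copies the estimate from the `summary(reg2)` output; the 1% change in parent income is an illustrative choice, not part of the problem.

```r
# Coefficient on log(parent_fincome) from the summary(reg2) output above
b1 <- 0.5964735

# In a log-log model b1 is an elasticity: a 1% increase in parent income is associated
# with roughly a b1% increase in child income, holding sex, yearborn and education fixed
approx_pct <- b1 * 1

# Exact change implied by the model for a 1% increase: (1.01^b1 - 1) * 100 percent
exact_pct <- (1.01^b1 - 1) * 100
print(c(approx = approx_pct, exact = exact_pct))
```

For small percentage changes the two numbers are very close, which is why the coefficient is usually read directly as an elasticity.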

1. Let $$Y$$ denote a person’s age in the United States. Suppose that your theory is that $$\mathbb{E}[Y] = 35$$. You collect a random sample of 100 observations and, using these data, calculate $$\bar{Y} = 37$$ and $$\hat{\mathrm{var}}(Y) = 6$$.

1. Calculate a t-statistic for testing the null hypothesis that $$\mathbb{E}[Y]=35$$. Do you reject the null hypothesis here? Explain.

2. What is the standard error of $$\bar{Y}$$?

3. Calculate a p-value for the null hypothesis that $$\mathbb{E}[Y]=35$$. How do you interpret it?

4. Calculate a 95% confidence interval for $$\mathbb{E}[Y]$$. How do you interpret it?
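The calculations in this question can be sketched in a few lines of R. The sketch assumes the standard large-sample approach: the standard error of $$\bar{Y}$$ is $$\sqrt{\hat{\mathrm{var}}(Y)/n}$$, and the p-value and confidence interval use the standard normal distribution (`pnorm`, `qnorm`) rather than a t distribution.

```r
# Inputs from the problem statement
n     <- 100
y_bar <- 37
var_y <- 6
mu_0  <- 35     # value of E[Y] under the null hypothesis

se_ybar <- sqrt(var_y / n)                          # standard error of the sample mean
t_stat  <- (y_bar - mu_0) / se_ybar                 # t-statistic for H0: E[Y] = 35
p_value <- 2 * pnorm(-abs(t_stat))                  # two-sided p-value (normal approximation)
ci_95   <- y_bar + c(-1, 1) * qnorm(0.975) * se_ybar  # 95% confidence interval for E[Y]

print(c(se = se_ybar, t = t_stat, p = p_value))
print(ci_95)
```

Rejecting at the 5% level corresponds to $$|t| > 1.96$$, to a p-value below 0.05, and to 35 lying outside the 95% interval; the three criteria always agree.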

1. Consider the following regression using country-level data, where $$GDP$$ is a country’s GDP, $$Inflation$$ is the country’s current inflation rate, $$Europe$$ is a binary variable indicating whether the country is located in Europe, and $$Democracy$$ is a binary variable indicating whether the country has democratic institutions.

$$GDP = \beta_0 + \beta_1 Inflation + \beta_2 Inflation \cdot Europe + \beta_3 Inflation^2 + \beta_4 Democracy + U$$

1. What is the partial effect of Inflation in this model?

2. What is the average partial effect of Inflation in this model?

3. Given relevant data, how would you estimate the average partial effect of Inflation?
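Differentiating the model with respect to $$Inflation$$ gives a partial effect of $$\beta_1 + \beta_2 Europe + 2\beta_3 Inflation$$, which varies across countries; averaging it gives $$\beta_1 + \beta_2 \mathbb{E}[Europe] + 2\beta_3 \mathbb{E}[Inflation]$$. The R sketch below illustrates one way to estimate it: fit the regression, compute each observation's estimated partial effect, and average. It runs on hypothetical simulated data; the data frame `dat` and all coefficient values in the simulation are made up for illustration and are not estimates from any real country data.

```r
set.seed(1)
n <- 500
# Hypothetical simulated data (made-up values, for illustration only)
dat <- data.frame(
  Inflation = runif(n, 0, 10),
  Europe    = rbinom(n, 1, 0.4),   # numeric 0/1, so Inflation:Europe is a single column
  Democracy = rbinom(n, 1, 0.6)
)
dat$GDP <- 100 - 2 * dat$Inflation + 1.5 * dat$Inflation * dat$Europe -
  0.1 * dat$Inflation^2 + 5 * dat$Democracy + rnorm(n, sd = 3)

# Fit the model; I() makes ^ square the variable instead of acting as a formula operator
fit <- lm(GDP ~ Inflation + Inflation:Europe + I(Inflation^2) + Democracy, data = dat)
b <- coef(fit)

# Estimated partial effect for each observation: b1 + b2 * Europe + 2 * b3 * Inflation
pe <- b[["Inflation"]] + b[["Inflation:Europe"]] * dat$Europe +
  2 * b[["I(Inflation^2)"]] * dat$Inflation

# Average partial effect: the sample average of the estimated partial effects
ape_hat <- mean(pe)
print(ape_hat)
```

Because the partial effect is linear in $$Europe$$ and $$Inflation$$, averaging over the sample is the same as plugging the sample means of $$Europe$$ and $$Inflation$$ into the formula.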