## 5.11 Inference

SW 4.5, 5.1, 5.2, 6.6

We discussed in class the practical issues of inference in linear regression models.

These results rely on arguments building on the Central Limit Theorem (this should not surprise you as it is similar to the case for the asymptotic distribution of $$\sqrt{n}(\bar{Y} - \mathbb{E}[Y]))$$ that we discussed earlier in the semester.
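To see the Central Limit Theorem at work before we use it below, here is a small simulation sketch. The exponential distribution and all of the numbers here are just illustrative choices, not anything specific to the course material: for a rate-1 exponential, $$\mathbb{E}[Y]=1$$ and $$\mathrm{var}(Y)=1$$, so the scaled and centered sample mean should look like a $$N(0,1)$$ random variable.

```r
# Simulate the CLT for a sample mean: even though exponential draws are
# far from normal, sqrt(n)*(Ybar - E[Y]) behaves like a N(0, var(Y))
# random variable in large samples
set.seed(42)
n <- 500
zs <- replicate(2000, sqrt(n) * (mean(rexp(n, rate = 1)) - 1))
c(mean(zs), var(zs))  # close to 0 and 1, respectively
```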

In this section, I sketch these types of arguments for you. This material is advanced, but I suggest that you study it.

We are going to show that, in the simple linear regression model, \begin{align*} \sqrt{n}(\hat{\beta}_1 - \beta_1) \rightarrow N(0,V) \quad \textrm{as} \ n \rightarrow \infty \end{align*} where \begin{align*} V = \frac{\mathbb{E}[(X-\mathbb{E}[X])^2 U^2]}{\mathrm{var}(X)^2} \end{align*} and discuss how to use this result to conduct inference.

Let’s start by showing why this result holds.

To start with, recall that \begin{align} \hat{\beta}_1 = \frac{\widehat{\mathrm{cov}}(X,Y)}{\widehat{\mathrm{var}}(X)} \tag{5.4} \end{align}

Helpful Intermediate Result 1 Notice that \begin{align*} \frac{1}{n}\sum_{i=1}^n \Big( (X_i - \bar{X})\bar{Y}\Big) &= \bar{Y} \frac{1}{n}\sum_{i=1}^n \Big( X_i-\bar{X} \Big) \\ &= \bar{Y} \left( \frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n \bar{X} \right) \\ &= \bar{Y} \Big(\bar{X} - \bar{X} \Big) \\ &= 0 \end{align*} where the first equality just pulls $$\bar{Y}$$ out of the summation (it is a constant with respect to the summation), the second equality pushes the summation through the difference, the first part of the third equality holds by the definition of $$\bar{X}$$ and the second part holds because it is an average of a constant.

This implies that \begin{align} \frac{1}{n}\sum_{i=1}^n \Big( (X_i - \bar{X})(Y_i - \bar{Y})\Big) = \frac{1}{n}\sum_{i=1}^n \Big( (X_i - \bar{X})Y_i\Big) \tag{5.5} \end{align} and very similar arguments (basically the same arguments in reverse) also imply that \begin{align} \frac{1}{n}\sum_{i=1}^n \Big( (X_i - \bar{X})X_i\Big) = \frac{1}{n}\sum_{i=1}^n \Big( (X_i - \bar{X})(X_i - \bar{X})\Big) \tag{5.6} \end{align} We use both (5.5) and (5.6) below.
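Both identities are easy to verify numerically. Here is a quick sanity check in R, using the mtcars data that shows up later in this section (with $$X$$ as hp and $$Y$$ as mpg; any data would do):

```r
# Check (5.5): demeaning Y inside the sample covariance doesn't matter
X <- mtcars$hp
Y <- mtcars$mpg
lhs55 <- mean((X - mean(X)) * (Y - mean(Y)))
rhs55 <- mean((X - mean(X)) * Y)
# Check (5.6): same idea with Y replaced by X
lhs56 <- mean((X - mean(X)) * X)
rhs56 <- mean((X - mean(X))^2)
all.equal(lhs55, rhs55)  # TRUE
all.equal(lhs56, rhs56)  # TRUE
```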

Next, consider the numerator in (5.4) \begin{align*} \widehat{\mathrm{cov}}(X,Y) &= \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) \\ &= \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})Y_i \\ &= \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})(\beta_0 + \beta_1 X_i + U_i) \\ &= \underbrace{\beta_0 \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})}_{(A)} + \underbrace{\beta_1 \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) X_i}_{(B)} + \underbrace{\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) U_i}_{(C)} \\ \end{align*} where the first equality holds by the definition of sample covariance, the second equality holds by (5.5), the third equality plugs in for $$Y_i$$, and the last equality combines terms and passes the summation through the additions/subtractions.

Now, let’s consider each of these in turn.

For (A), \begin{align*} \frac{1}{n} \sum_{i=1}^n X_i = \bar{X} \qquad \textrm{and} \qquad \frac{1}{n} \sum_{i=1}^n \bar{X} = \bar{X} \end{align*} which implies that this term is equal to 0.

For (B), notice that \begin{align*} \beta_1 \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) X_i &= \beta_1 \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}) (X_i - \bar{X}) \\ &= \beta_1 \widehat{\mathrm{var}}(X) \end{align*} where the first equality holds by (5.6) and the second equality holds by the definition of sample variance.

For (C), well, we’ll just carry that one around for now.

Plugging in the expressions for (A), (B), and (C) back into Equation (5.4) implies that \begin{align*} \hat{\beta}_1 = \beta_1 + \frac{1}{n} \sum_{i=1}^n \frac{(X_i - \bar{X}) U_i}{\widehat{\mathrm{var}}(X)} \end{align*} Next, re-arranging terms and multiplying both sides by $$\sqrt{n}$$ implies that \begin{align*} \sqrt{n}(\hat{\beta}_1 - \beta_1) &= \sqrt{n} \left(\frac{1}{n} \sum_{i=1}^n \frac{(X_i - \bar{X}) U_i}{\widehat{\mathrm{var}}(X)}\right) \\ & \approx \sqrt{n} \left(\frac{1}{n} \sum_{i=1}^n \frac{(X_i - \mathbb{E}[X]) U_i}{\mathrm{var}(X)}\right) \end{align*} The last line (the approximate one) involves an argument that we will not make fully precise here: basically, you can replace $$\bar{X}$$ and $$\widehat{\mathrm{var}}(X)$$ with $$\mathbb{E}[X]$$ and $$\mathrm{var}(X)$$, and the effect of this replacement converges to 0 in large samples (this is the reason for the approximation). If you want a more complete explanation, sign up for my graduate econometrics class next semester.

Is this helpful? It may not be obvious, but the right hand side of the above equation is actually something that we can apply the Central Limit Theorem to. In particular, maybe it is helpful to define $$Z_i = \frac{(X_i - \mathbb{E}[X]) U_i}{\mathrm{var}(X)}$$. We know that we could apply a Central Limit Theorem to $$\sqrt{n}\left( \frac{1}{n} \sum_{i=1}^n Z_i \right)$$ if (i) $$Z_i$$ has mean 0 and (ii) it is iid. That it is iid holds immediately from the random sampling assumption. For mean 0, \begin{align*} \mathbb{E}[Z] &= \mathbb{E}\left[ \frac{(X - \mathbb{E}[X]) U}{\mathrm{var}(X)}\right] \\ &= \frac{1}{\mathrm{var}(X)} \mathbb{E}[(X - \mathbb{E}[X]) U] \\ &= \frac{1}{\mathrm{var}(X)} \mathbb{E}[(X - \mathbb{E}[X]) \underbrace{\mathbb{E}[U|X]}_{=0}] \\ &= 0 \end{align*} where the only challenging step is the third equality, which holds by the Law of Iterated Expectations. This means that we can apply the Central Limit Theorem, and in particular, $$\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n Z_i \right) \rightarrow N(0,V)$$ where $$V=\mathrm{var}(Z) = \mathbb{E}[Z^2]$$ (where the 2nd equality here holds because $$Z$$ has mean 0). Now, just substituting back in for $$Z$$ implies that \begin{align*} \sqrt{n}(\hat{\beta}_1 - \beta_1) \rightarrow N(0,V) \end{align*} where \begin{align} V &= \mathbb{E}\left[ \left( \frac{(X - \mathbb{E}[X]) U}{\mathrm{var}(X)} \right)^2 \right] \nonumber \\ &= \mathbb{E}\left[ \frac{(X - \mathbb{E}[X])^2 U^2}{\mathrm{var}(X)^2}\right] \tag{5.7} \end{align} which is what we were aiming for.
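One way to convince yourself of this asymptotic normality result is with a small Monte Carlo exercise. The simulation below is just a sketch, and the particular distributions and parameter values are made-up choices. In this design, with $$X \sim N(2,1)$$ and $$U \sim N(0,1)$$ independent of $$X$$, the variance formula in (5.7) reduces to $$V = \mathrm{var}(U)/\mathrm{var}(X) = 1$$.

```r
# Monte Carlo sketch: draws of sqrt(n)*(beta1hat - beta1) should have
# mean close to 0 and variance close to the theoretical V = 1
set.seed(1234)
beta0 <- 1; beta1 <- 2; n <- 1000; nsims <- 2000
draws <- replicate(nsims, {
  X <- rnorm(n, mean = 2, sd = 1)
  U <- rnorm(n)                    # independent of X
  Y <- beta0 + beta1*X + U
  b1hat <- cov(X, Y) / var(X)      # same estimator as in (5.4)
  sqrt(n) * (b1hat - beta1)
})
c(mean(draws), var(draws))  # roughly 0 and 1
```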

Given this result, all our previous work on standard errors, t-statistics, p-values, and confidence intervals applies. First, let me mention the way that you would estimate $$V$$ (same as always, just replace the population quantities with corresponding sample quantities).

$\hat{V} = \frac{ \frac{1}{n} \displaystyle \sum_{i=1}^n (X_i - \bar{X})^2 \hat{U}_i^2}{\widehat{\mathrm{var}}(X)^2}$

where $$\hat{U}_i$$ are the residuals.

Now, standard errors are just the same as before (the only difference is that $$\hat{V}$$ itself has changed)

\begin{align*} \textrm{s.e.}(\hat{\beta}) &= \frac{\sqrt{\hat{V}}}{\sqrt{n}} \end{align*}

By far the most common null hypothesis is $$H_0: \beta = 0$$, which suggests the following t-statistic:

$t = \frac{\hat{\beta}}{\textrm{s.e.}(\hat{\beta})}$ One can continue to calculate a p-value by

$\textrm{p-value} = 2 \Phi(-|t|)$ and a 95% confidence interval is given by

$CI = [\hat{\beta} - 1.96 \textrm{s.e.}(\hat{\beta}), \hat{\beta} + 1.96 \textrm{s.e.}(\hat{\beta})]$

### 5.11.1 Computation

Let’s check if what we derived is what we can compute using R.

```r
# this is the same regression as in the previous section
summary(reg8)
#>
#> Call:
#> lm(formula = mpg ~ hp, data = mtcars)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -5.7121 -2.1122 -0.8854  1.5819  8.2360
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
#> hp          -0.06823    0.01012  -6.742 1.79e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.863 on 30 degrees of freedom
#> Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892
#> F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
```

```r
# show previous calculations
data.frame(bet0=bet0, bet1=bet1)
#>       bet0        bet1
#> 1 30.09886 -0.06822828
```

```r
# components of Vhat
Y <- mtcars$mpg
X <- mtcars$hp
Uhat <- Y - bet0 - bet1*X
Xbar <- mean(X)
varX <- mean( (X-Xbar)^2 )
Vhat <- mean( (X-Xbar)^2 * Uhat^2 ) / ( varX^2 )
n <- nrow(mtcars)
se <- sqrt(Vhat)/sqrt(n)
t_stat <- bet1/se
p_val <- 2*pnorm(-abs(t_stat))
ci_L <- bet1 - 1.96*se
ci_U <- bet1 + 1.96*se

# print results
round(data.frame(se, t_stat, p_val, ci_L, ci_U),5)
#>        se   t_stat p_val     ci_L     ci_U
#> 1 0.01313 -5.19644     0 -0.09396 -0.04249
```

Interestingly, these are not exactly the same as what comes from the lm command. Here is the difference: by default, R's lm command makes a simplifying assumption called "homoskedasticity" (that the variance of $$U$$ does not depend on $$X$$), which simplifies the expression for the variance. This can result in slightly different standard errors (and therefore slightly different t-statistics, p-values, and confidence intervals too) than the ones we calculated.
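To see exactly where lm's numbers come from, we can reproduce its standard error for the slope by hand. Under homoskedasticity, the textbook formula is $$\textrm{s.e.}(\hat{\beta}_1) = \sqrt{s^2 / \sum_{i=1}^n (X_i - \bar{X})^2}$$, where $$s^2$$ is the residual variance estimated with a degrees-of-freedom correction; this formula comes from the standard homoskedastic theory rather than from the derivation above.

```r
# Reproduce lm()'s homoskedasticity-based standard error for the slope
fit <- lm(mpg ~ hp, data = mtcars)
X <- mtcars$hp
n <- nrow(mtcars)
s2 <- sum(resid(fit)^2) / (n - 2)            # residual variance, df-corrected
se_homosk <- sqrt(s2 / sum((X - mean(X))^2))
round(se_homosk, 5)                          # matches the 0.01012 reported by lm
```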

An alternative package that is popular among economists for estimating regressions and getting “heteroskedasticity robust” standard errors is the estimatr package.

```r
library(estimatr)

reg9 <- lm_robust(mpg ~ hp, data=mtcars, se_type="HC0")
summary(reg9)
#>
#> Call:
#> lm_robust(formula = mpg ~ hp, data = mtcars, se_type = "HC0")
#>
#> Standard error type:  HC0
#>
#> Coefficients:
#>             Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper DF
#> (Intercept) 30.09886    2.01067  14.970 1.851e-15 25.99252 34.20520 30
#> hp          -0.06823    0.01313  -5.196 1.338e-05 -0.09504 -0.04141 30
#>
#> Multiple R-squared:  0.6024 ,    Adjusted R-squared:  0.5892
#> F-statistic:    27 on 1 and 30 DF,  p-value: 1.338e-05
```

The "HC0" standard errors are "heteroskedasticity consistent" standard errors, and you can see that the standard error and t-statistic match what we calculated above (the confidence interval differs very slightly because lm_robust uses critical values from a t distribution rather than 1.96).