5.6 Nonlinear Regression Functions

SW 8.1, 8.2

Also, please read all of SW Ch. 8

So far, the partial effects that we have been interested in have corresponded to a particular parameter in the regression, usually \(\beta_1\). I think this can sometimes be a source of confusion: at least in my view, we are not typically interested in the parameters for their own sake, but rather in the partial effects. It just so happens that, in some leading cases, the two coincide.

In addition, while the \(\beta\)’s in the sort of models we have considered so far are easy to interpret, it can be restrictive to assume that the partial effects are the same across different values of the covariates.

In this section, we’ll see the first of several cases where partial effects do not coincide with a particular parameter.

Suppose that

\[ \mathbb{E}[Y|X_1,X_2,X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_3 \]

Let’s start with making predictions using this model. If you know the values of \(\beta_0,\beta_1,\beta_2,\beta_3,\) and \(\beta_4\), then to get a prediction, you would still just plug in the values of the regressors that you’d like to get a prediction for (including \(x_1^2\)).

Next, in this model, the partial effect of \(X_1\) is given by

\[ \frac{\partial \, \mathbb{E}[Y|X_1,X_2,X_3]}{\partial \, X_1} = \beta_1 + 2\beta_2 X_1 \] In other words, the partial effect of \(X_1\) depends on the value that \(X_1\) takes.

In this case, it is sometimes useful to report the partial effect for several different values of \(X_1\). In other cases, it is useful to report the average partial effect (APE), which is the mean of the partial effects across the distribution of the covariates. For the model above, the APE is given by

\[ APE = \beta_1 + 2 \beta_2 \mathbb{E}[X_1] \] and, once you have estimated the regression, you can compute an estimate of \(APE\) by

\[ \widehat{APE} = \hat{\beta}_1 + 2 \hat{\beta}_2 \bar{X}_1 \]

Example 5.4 Let’s continue our example on intergenerational income mobility where \(Y\) denotes child’s income, \(X_1\) denotes parents’ income, and \(X_2\) denotes mother’s education. Now, suppose that

\[ \mathbb{E}[Y|X_1,X_2] = 15,000 + 0.7 X_1 - 0.000002 X_1^2 + 800 X_2 \] Then, predicted child’s income when parents’ income is equal to $50,000 and mother’s education is equal to 12 years is given by

\[ 15,000 + 0.7 (50,000) - 0.000002 (50,000)^2 + 800 (12) = 54,600 \] In addition, the partial effect of parents’ income is given by

\[ 0.7 - 0.000004 X_1 \] Let’s compute a few different partial effects for different values of parents’ income

\(X_1\)      PE
20,000       0.62
50,000       0.50
100,000      0.30

which indicates that the partial effect of parents’ income is decreasing — i.e., the effect of additional parents’ income is largest for children whose parents have the lowest income and gets smaller for those whose parents have high incomes.

Finally, if you wanted to compute the \(APE\), you would just plug in \(\mathbb{E}[X_1]\) (or \(\bar{X}_1\)) into the expression for the partial effect.
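Since the coefficients in this example are given, it is easy to double check these calculations in R. Below is a small snippet that reproduces the prediction and the partial effects from the table above (the object names b0, b1, b2, and b3 are just ones that I made up):

# coefficients from Example 5.4
b0 <- 15000; b1 <- 0.7; b2 <- -0.000002; b3 <- 800
# predicted child's income when parents' income is $50,000
# and mother's education is 12 years
b0 + b1*50000 + b2*50000^2 + b3*12
#> [1] 54600
# partial effects of parents' income at several values
b1 + 2*b2*c(20000, 50000, 100000)
#> [1] 0.62 0.50 0.30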

5.6.1 Computation

Including a quadratic (or other higher order term) in R is relatively straightforward. Let’s just do an example.

reg3 <- lm(mpg ~ hp + I(hp^2), data=mtcars)
summary(reg3)
#> 
#> Call:
#> lm(formula = mpg ~ hp + I(hp^2), data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.5512 -1.6027 -0.6977  1.5509  8.7213 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  4.041e+01  2.741e+00  14.744 5.23e-15 ***
#> hp          -2.133e-01  3.488e-02  -6.115 1.16e-06 ***
#> I(hp^2)      4.208e-04  9.844e-05   4.275 0.000189 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.077 on 29 degrees of freedom
#> Multiple R-squared:  0.7561, Adjusted R-squared:  0.7393 
#> F-statistic: 44.95 on 2 and 29 DF,  p-value: 1.301e-09
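As an aside (this snippet is my own addition), making predictions from this model works just like before: we only need to supply a value of hp, and R's predict function squares it for us because the squared term is part of the regression formula. For example, to get predicted mpg for a hypothetical car with 150 horsepower:

predict(reg3, newdata=data.frame(hp=150))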

The only thing that is new in the call to lm above is I(hp^2). The I function stands for inhibit (you can read the documentation using ?I). For us, the details are not too important. You can understand it like this: there is no variable named hp^2 in the data, but by wrapping an expression in I, we can take a variable that is in the data (here: hp), apply a function to it (here: squaring it), and include the result as a regressor.
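One related caution (this is my own note, not from SW): the I function matters here. If you write hp^2 in a formula without wrapping it in I, R interprets ^ as formula syntax (crossing terms) rather than as squaring, so the quadratic term silently drops out and you end up with a regression of mpg on hp alone.

# without I(), ^ is formula syntax, so hp^2 reduces to just hp
reg_wrong <- lm(mpg ~ hp + hp^2, data=mtcars)
coef(reg_wrong)  # note: no quadratic term appears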

Interestingly, the coefficient on I(hp^2) is statistically significant here, which suggests that the effect of horsepower on miles per gallon is nonlinear. Let’s just quickly report the estimated partial effects for a few different values of horsepower.

hp_vec <- c(100,200,300)
# there might be a native function in r
# to compute these partial effects; I just
# don't know it.
pe <- function(hp) {
  # partial effect is b1 + 2b2*hp
  pes <- coef(reg3)[2] + 2*coef(reg3)[3]*hp
  # print using a data frame
  data.frame(hp=hp, pe=round(pes,3))
}
pe(hp_vec)
#>    hp     pe
#> 1 100 -0.129
#> 2 200 -0.045
#> 3 300  0.039

which suggests that the partial effect of horsepower on miles per gallon is large in magnitude (though negative) at small values of horsepower and shrinks toward essentially no effect at larger values of horsepower.
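Relatedly, following the expression for \(\widehat{APE}\) from earlier, we can estimate the average partial effect of horsepower by evaluating the estimated partial effect at the sample mean of hp. A minimal sketch (ape_hat is just a name I picked):

# estimated APE: evaluate the partial effect at the mean of hp
ape_hat <- coef(reg3)[2] + 2*coef(reg3)[3]*mean(mtcars$hp)
unname(ape_hat)

Using the rounded coefficients reported above (and the fact that average horsepower in mtcars is about 147), this works out to roughly -0.09.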