## 5.2 Linear Regression Models

SW 4.1

To get around the curse of dimensionality that we discussed in the previous section, we will often impose a linear model for the conditional expectation. For example,

$\mathbb{E}[Y|X] = \beta_0 + \beta_1 X$ or

$\mathbb{E}[Y|X_1,X_2,X_3] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$

If we know the values of $$\beta_0$$, $$\beta_1$$, $$\beta_2$$, and $$\beta_3$$, then it is straightforward for us to make predictions. In particular, suppose that we want to predict the outcome for a new observation with characteristics $$x_1$$, $$x_2$$, and $$x_3$$. Our prediction would be

$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

Example 5.1 Suppose that you are studying intergenerational income mobility and that you are interested in predicting the income of a child whose parents' income was \$50,000 and whose mother had 12 years of education. Let $$Y$$ denote child's income, $$X_1$$ denote parents' income, and $$X_2$$ denote mother's education. Further, suppose that $$\mathbb{E}[Y|X_1,X_2] = 20,000 + 0.5 X_1 + 1000 X_2$$.

In this case, you would predict child’s income to be

$20,000 + 0.5 (50,000) + 1000(12) = 57,000$
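The calculation in Example 5.1 can be sketched in code. The coefficients below are the ones assumed in the example (not estimated from any data), and the function name is just for illustration:

```python
def predict_income(parents_income, mother_educ,
                   beta0=20_000, beta1=0.5, beta2=1_000):
    """Prediction from E[Y|X1,X2] = beta0 + beta1*X1 + beta2*X2,
    using the coefficients assumed in Example 5.1."""
    return beta0 + beta1 * parents_income + beta2 * mother_educ

# Predicted income for parents' income of $50,000 and 12 years
# of mother's education
print(predict_income(50_000, 12))  # 57000.0
```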

Side-Comment:

The above model can be equivalently written as \begin{align*} Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U \end{align*} where $$U$$ is called the error term and satisfies $$\mathbb{E}[U|X_1,X_2,X_3] = 0$$. There will be a few times where this formulation will be useful for us.
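As a rough numerical illustration of this error-term formulation, we can simulate data from the model $$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U$$ with $$\mathbb{E}[U|X_1,X_2,X_3] = 0$$ and check that least squares recovers the coefficients. The particular coefficient values, regressor distributions, and error variance below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical coefficients (beta0, beta1, beta2, beta3) for illustration
beta = np.array([20_000.0, 0.5, 1_000.0, 2.0])

# Regressors drawn independently of the error term, so E[U|X] = 0 holds
X1 = rng.uniform(20_000, 100_000, n)
X2 = rng.uniform(8, 16, n)
X3 = rng.uniform(0, 10, n)
U = rng.normal(0, 5_000, n)  # error term, mean zero given X

X = np.column_stack([np.ones(n), X1, X2, X3])
Y = X @ beta + U

# Least squares estimate of the coefficients
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(beta_hat)  # each entry should be close to the corresponding beta
```

With a large sample, `beta_hat` lands close to the true `beta`, which is what makes the linear-model-plus-error-term formulation useful for estimation later on.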