# Topic 5: Linear Regression

In this chapter, our interest shifts to conditional expectations, such as \(\mathbb{E}[Y|X_1,X_2,X_3]\). (I’ll write \(X_1\), \(X_2\), and \(X_3\) in many examples in this chapter, but you can think of there being an arbitrary number of \(X\)’s.)

I’ll refer to \(Y\) as the **outcome**. You might also sometimes hear it called the **dependent variable**.

I’ll refer to the \(X\)’s as **covariates**, **regressors**, or **characteristics**. You might also sometimes hear them called **independent variables**.

Before we get into the details, let us first discuss why we are interested in conditional expectations. First, if we are interested in *making predictions*, the “best” prediction one can make is typically the conditional expectation; in particular, it is the prediction that minimizes the mean squared prediction error. This should make sense: if you want to make a reasonable prediction of the outcome for a new observation with characteristics \(x_1\), \(x_2\), and \(x_3\), a good approach is to predict that their outcome equals the mean outcome in the population among those with the same characteristics; that is, \(\mathbb{E}[Y|X_1=x_1, X_2=x_2, X_3=x_3]\).
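
To see why the conditional expectation makes a good prediction, here is a minimal Python sketch on simulated data. The data-generating process (two binary covariates and \(Y = 1 + 2X_1 - X_2 + \text{noise}\)) is a hypothetical assumption chosen only for illustration. The sketch estimates \(\mathbb{E}[Y|X_1,X_2]\) by the sample mean of \(Y\) within each combination of covariate values and compares its out-of-sample mean squared error with that of simply predicting the overall mean of \(Y\).

```python
import random
from collections import defaultdict

# Hypothetical data-generating process (assumed for illustration):
# X1 and X2 are binary, and Y = 1 + 2*X1 - X2 + standard normal noise.
random.seed(0)

def simulate(n):
    data = []
    for _ in range(n):
        x1 = random.randint(0, 1)
        x2 = random.randint(0, 1)
        y = 1 + 2 * x1 - x2 + random.gauss(0, 1)
        data.append((x1, x2, y))
    return data

train = simulate(20_000)
test = simulate(20_000)

# Estimate E[Y | X1=x1, X2=x2] by the sample mean of Y within each cell.
sums = defaultdict(float)
counts = defaultdict(int)
for x1, x2, y in train:
    sums[(x1, x2)] += y
    counts[(x1, x2)] += 1
cond_mean = {cell: sums[cell] / counts[cell] for cell in sums}

# Benchmark: ignore the covariates and predict the overall mean of Y.
overall_mean = sum(y for _, _, y in train) / len(train)

def mse(predict):
    """Mean squared prediction error on the held-out test data."""
    return sum((y - predict(x1, x2)) ** 2 for x1, x2, y in test) / len(test)

mse_conditional = mse(lambda x1, x2: cond_mean[(x1, x2)])
mse_unconditional = mse(lambda x1, x2: overall_mean)
print(f"MSE using conditional mean:   {mse_conditional:.3f}")
print(f"MSE using unconditional mean: {mse_unconditional:.3f}")
```

Under this setup, the conditional-mean prediction should have a mean squared error close to the noise variance of 1, while ignoring the covariates should give a noticeably larger error (roughly 2.25 here).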

Next, in economics, we are often interested in how much an outcome changes when a particular covariate changes, holding the other covariates constant. For example, we might be interested in the average return of actively managed mutual funds relative to passively managed mutual funds, conditional on investing in the same asset class (e.g., large-cap stocks or international bonds). As another example, we might be interested in the effect of an increase in the amount of fertilizer on average crop yield, holding temperature and precipitation constant.

How much the outcome, \(Y\), changes on average when one of the covariates, \(X_1\), changes by 1 unit, holding the other covariates constant, is what we’ll call the **partial effect** of \(X_1\) on \(Y\). Suppose \(X_1\) is binary. Then the partial effect is given by

\[ PE(x_2,x_3) = \mathbb{E}[Y | X_1=1, X_2=x_2, X_3=x_3] - \mathbb{E}[Y | X_1=0, X_2=x_2, X_3=x_3] \]

Notice that the partial effect can depend on \(x_2\) and \(x_3\). For example, the effect of active management relative to passive management could differ across asset classes.
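
A minimal sketch of estimating this binary partial effect from simulated data follows. It rests on assumptions made purely for illustration: the data-generating process is hypothetical, there is a single extra covariate \(X_2\) (taking values 0, 1, or 2, which you can think of as an asset class) in place of \(X_2\) and \(X_3\), and the true partial effect is built to equal \(1 + x_2\), so it varies with \(x_2\). The estimate is the difference between two within-cell sample means.

```python
import random
from collections import defaultdict

# Hypothetical DGP: X1 is binary, X2 takes values 0, 1, 2, and
# Y = 0.5 + (1 + X2)*X1 + X2 + noise, so the true PE(x2) is 1 + x2.
random.seed(1)
sums = defaultdict(float)
counts = defaultdict(int)
for _ in range(60_000):
    x1 = random.randint(0, 1)
    x2 = random.randint(0, 2)
    y = 0.5 + (1 + x2) * x1 + x2 + random.gauss(0, 1)
    sums[(x1, x2)] += y
    counts[(x1, x2)] += 1

def cond_mean(x1, x2):
    """Sample analogue of E[Y | X1=x1, X2=x2]."""
    return sums[(x1, x2)] / counts[(x1, x2)]

# PE(x2) = E[Y | X1=1, X2=x2] - E[Y | X1=0, X2=x2], estimated cell by cell.
pe = {x2: cond_mean(1, x2) - cond_mean(0, x2) for x2 in range(3)}
for x2, effect in pe.items():
    print(f"PE(x2={x2}) is about {effect:.2f} (true value {1 + x2})")
```

Because the comparison is made within each value of \(x_2\), the estimated partial effect is allowed to differ across "asset classes", just as in the text.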

Slightly more generally, if \(X_1\) is discrete, so that it can take on several different values, then we define the partial effect as

\[ PE(x_1,x_2,x_3) = \mathbb{E}[Y | X_1=x_1+1, X_2=x_2, X_3=x_3] - \mathbb{E}[Y | X_1=x_1, X_2=x_2, X_3=x_3] \]

This partial effect can now depend on \(x_1\), \(x_2\), and \(x_3\). It is the average effect of going from \(X_1=x_1\) to \(X_1=x_1+1\) while holding \(X_2\) and \(X_3\) fixed at \(x_2\) and \(x_3\).
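
To make the definition concrete, here is a minimal sketch that assumes a known, hypothetical conditional expectation function `cef`, namely \(\mathbb{E}[Y|X_1=x_1,X_2=x_2,X_3=x_3] = x_1^2 + x_1 x_2 + 3x_3\). For this made-up function the partial effect works out to \((x_1+1)^2 - x_1^2 + x_2 = 2x_1 + 1 + x_2\), so it depends on \(x_1\) and \(x_2\) but not on \(x_3\).

```python
def cef(x1, x2, x3):
    """Hypothetical E[Y | X1=x1, X2=x2, X3=x3], assumed for illustration."""
    return x1 ** 2 + x1 * x2 + 3 * x3

def partial_effect_discrete(cef, x1, x2, x3):
    """Change in E[Y | ...] from moving X1 from x1 to x1 + 1, holding X2=x2 and X3=x3 fixed."""
    return cef(x1 + 1, x2, x3) - cef(x1, x2, x3)

# For this cef, the partial effect equals 2*x1 + 1 + x2.
for x1 in range(3):
    print(x1, partial_effect_discrete(cef, x1, x2=1, x3=5))
# prints "0 2", then "1 4", then "2 6"
```

Changing `x3` in the call leaves the result unchanged for this particular `cef`, because \(X_3\) enters it additively; with an interaction between \(X_1\) and \(X_3\), the partial effect would depend on \(x_3\) as well.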

Finally, consider the case where \(X_1\) is continuous (for example, the amount of fertilizer applied when studying crop yield). In this case, the partial effect is given by the *partial derivative* of \(\mathbb{E}[Y|X_1,X_2,X_3]\) with respect to \(X_1\):

\[ PE(x_1,x_2,x_3) = \frac{\partial \, \mathbb{E}[Y|X_1=x_1, X_2=x_2, X_3=x_3]}{\partial \, x_1} \]

This partial derivative is analogous to the binary and discrete cases: we make a small change in \(X_1\) while holding \(X_2\) and \(X_3\) constant at \(x_2\) and \(x_3\).
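
The "small change in \(X_1\)" idea can be checked numerically. The sketch below assumes a hypothetical conditional expectation for crop yield given fertilizer \(x_1\), temperature \(x_2\), and precipitation \(x_3\); the functional form is made up for illustration. It approximates the partial derivative with a central finite difference (an approximation technique swapped in here, not something from the text) and compares it with the analytic derivative \(4 - 0.2x_1 + 0.05x_3\).

```python
def cef(x1, x2, x3):
    """Hypothetical E[Y | X1=x1, X2=x2, X3=x3]: crop yield given fertilizer x1,
    temperature x2, and precipitation x3 (functional form assumed for illustration)."""
    return 50 + 4 * x1 - 0.1 * x1 ** 2 + 0.5 * x2 + 0.05 * x1 * x3

def partial_effect_continuous(cef, x1, x2, x3, h=1e-5):
    """Central finite-difference approximation to dE[Y|...]/dx1, holding x2 and x3 fixed."""
    return (cef(x1 + h, x2, x3) - cef(x1 - h, x2, x3)) / (2 * h)

def analytic_pe(x1, x2, x3):
    """Exact derivative of the hypothetical cef with respect to x1."""
    return 4 - 0.2 * x1 + 0.05 * x3

for x1 in (0.0, 5.0, 10.0):
    approx = partial_effect_continuous(cef, x1, x2=20.0, x3=30.0)
    exact = analytic_pe(x1, 20.0, 30.0)
    print(f"x1={x1}: numerical {approx:.4f}, analytic {exact:.4f}")
```

In this example the partial effect of fertilizer shrinks as \(x_1\) grows and rises with precipitation \(x_3\), so, as in the discrete case, it depends on where it is evaluated.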

Side-Comment: This is probably the part of the class where we will jump around in the book the most this semester.

The pedagogical approach of the textbook is to introduce the notion of causality very early and to emphasize the conditions that linear regression models must satisfy to deliver causal effects, while increasing the complexity of the models over several chapters.

This is totally reasonable, but I prefer to start by teaching the mechanics of regression: how to compute regressions, how to interpret them (even when the requirements for a causal interpretation are not met), and how to use them to make predictions. Then, we’ll have a serious discussion about causality over the last few weeks of the semester.

In practice, this means we’ll cover parts of Chapters 4-8 of the textbook now, and then we’ll circle back to some of the issues covered in these chapters toward the end of the semester.