7.3 Average Partial Effects

One of the complications with Probit and Logit is that the estimated parameters are not straightforward to interpret.

Remember that we are generally interested in partial effects, not the parameters themselves. It just so happens that, in many of the linear models we have considered so far, the \(\beta\)'s correspond to the partial effects. This makes it easy to forget that the parameters are not what we are typically most interested in.

This is helpful framing for thinking about how to interpret the results from a Probit or Logit model.

Let’s focus on the Probit model. In that case, \[\begin{align*} \mathrm{P}(Y=1|X_1,X_2,X_3) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3) \end{align*}\] where \(\Phi\) is the cdf of a standard normal random variable.

Continuous Case: When \(X_1\) is continuous, the partial effect of \(X_1\) is given by \[\begin{align*} \frac{\partial \mathrm{P}(Y=1|X_1,X_2,X_3)}{\partial X_1} = \phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3) \beta_1 \end{align*}\] where \(\phi\) is the pdf of a standard normal random variable.

This is more complicated than the partial effect in a linear model. It depends on \(\phi\) (which looks complicated, but you can just use R’s dnorm command to handle that part). More importantly, the partial effect depends on the values of \(X_1, X_2,\) and \(X_3\). [As discussed above, this is likely a good thing in the context of a binary outcome model.] Thus, in order to compute a partial effect, we need to plug in some values for these covariates. If there are particular values of the covariates that you are interested in, you can certainly use those, but my general suggestion is to report the Average Partial Effect: \[\begin{align*} APE &= \mathbb{E}\left[ \frac{\partial \mathrm{P}(Y=1|X_1,X_2,X_3)}{\partial X_1} \right] \\ &= \mathbb{E}\left[ \phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3) \beta_1 \right] \end{align*}\] which you can estimate by \[\begin{align*} \widehat{APE} &= \frac{1}{n} \sum_{i=1}^n \phi(\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}) \hat{\beta}_1 \end{align*}\] This amounts to computing the partial effect at each observation’s covariate values and then averaging these partial effects together. Doing this by hand can be a bit cumbersome, and it is often convenient to use the R package mfx to compute these sorts of average partial effects for you.
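To make the formula above concrete, here is a minimal sketch of computing \(\widehat{APE}\) for a continuous covariate by hand. The data are simulated and the variable names (`x1`, `x2`, `x3`) are made up for illustration; in practice you would use your own data and covariates.

```r
# Simulated data from a probit-style model (illustrative only)
set.seed(1234)
n <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- as.numeric(0.5 + 1*x1 - 0.5*x2 + 0.25*x3 + rnorm(n) > 0)

# Estimate the probit model
probit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"))
b <- coef(probit)

# Evaluate the index b0 + b1*x1 + b2*x2 + b3*x3 at each observation,
# compute phi(index)*b1 observation by observation, and average
xb <- drop(model.matrix(probit) %*% b)
ape_x1 <- mean(dnorm(xb)) * unname(b["x1"])
ape_x1
```

This is exactly the sample average in the \(\widehat{APE}\) formula: `dnorm` plays the role of \(\phi\), and each row of the data supplies the \(X_{1i}, X_{2i}, X_{3i}\) values.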

Discrete/Binary Case: When \(X_1\) is discrete (let’s say binary, but the extension to multi-valued discrete covariates is straightforward), the partial effect of \(X_1\) is \[\begin{align*} & \mathrm{P}(Y=1|X_1=1, X_2, X_3) - \mathrm{P}(Y=1|X_1=0, X_2, X_3) \\ &\hspace{100pt} = \Phi(\beta_0 + \beta_1 + \beta_2 X_2 + \beta_3 X_3) - \Phi(\beta_0 + \beta_2 X_2 + \beta_3 X_3) \end{align*}\] Notice that \(\beta_1\) does not show up in the last term. As above, the partial effect depends on the values of \(X_2\) and \(X_3\), which suggests reporting an \(APE\) as before (it follows the same steps as in the continuous case, just with this difference in probabilities in place of the derivative).
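The binary-covariate version can be sketched the same way: for each observation, predict the probability with the covariate switched to 1 and then to 0, take the difference, and average. Again, the data and variable names (`d`, `x2`, `x3`) are simulated for illustration.

```r
# Simulated data with a binary covariate of interest (illustrative only)
set.seed(5678)
n <- 1000
d  <- rbinom(n, 1, 0.5)
x2 <- rnorm(n); x3 <- rnorm(n)
y  <- as.numeric(0.25 + 0.75*d - 0.5*x2 + 0.25*x3 + rnorm(n) > 0)

probit <- glm(y ~ d + x2 + x3, family = binomial(link = "probit"))
b <- coef(probit)

# Difference in predicted probabilities, setting d = 1 vs. d = 0
# for every observation, holding x2 and x3 at their observed values
p1 <- pnorm(b[1] + b[2]*1 + b[3]*x2 + b[4]*x3)
p0 <- pnorm(b[1] + b[2]*0 + b[3]*x2 + b[4]*x3)
ape_d <- mean(p1 - p0)
ape_d
```

Here `pnorm` plays the role of \(\Phi\), and note that, as in the formula, \(\hat{\beta}_1\) (the coefficient on `d`) drops out of the second term.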

  • Extensions to Logit are virtually identical: just replace \(\Phi\) with \(\Lambda\) and \(\phi\) with \(\lambda\).
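In R, the swap from Probit to Logit amounts to changing the link function and replacing `dnorm`/`pnorm` with `dlogis`/`plogis` (the logistic pdf and cdf). A minimal sketch, again on simulated data with made-up variable names:

```r
# Simulated data from a logit model (illustrative only)
set.seed(91)
n <- 1000
x1 <- rnorm(n); x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1*x1 - 0.5*x2))

logit <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
b <- coef(logit)

# APE of x1: lambda(index)*b1 averaged across observations,
# with dlogis playing the role of lambda
xb <- drop(model.matrix(logit) %*% b)
ape_x1_logit <- mean(dlogis(xb)) * unname(b["x1"])
ape_x1_logit
```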

Side-Comment: The parameters from the LPM, Probit, and Logit can be quite different (in fact, they are quite different by construction), but the APEs are often very similar.
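This side-comment is easy to check by simulation: fit all three models on the same (made-up) data and compare the LPM coefficient on `x1` to the Probit and Logit APEs. The raw Probit and Logit coefficients differ substantially from each other and from the LPM, but the three numbers below typically come out close together.

```r
# Compare LPM coefficient with Probit/Logit APEs (simulated data)
set.seed(42)
n <- 5000
x1 <- rnorm(n); x2 <- rnorm(n)
y <- as.numeric(0.2 + 0.8*x1 - 0.4*x2 + rnorm(n) > 0)

lpm    <- lm(y ~ x1 + x2)
probit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
logit  <- glm(y ~ x1 + x2, family = binomial(link = "logit"))

# APE of x1: average dens(index) across observations, times b1
ape <- function(fit, dens) {
  b  <- coef(fit)
  xb <- drop(model.matrix(fit) %*% b)
  mean(dens(xb)) * unname(b["x1"])
}

apes <- c(LPM    = unname(coef(lpm)["x1"]),
          Probit = ape(probit, dnorm),
          Logit  = ape(logit, dlogis))
apes
```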