## 6.7 Answers to Some Extra Questions

1. The first model appears to predict better because its AIC is lower. [Also, notice that $$R^2$$ is higher in the second model, but this holds by construction: the second model includes extra terms relative to the first, so it will fit at least as well in-sample. That better in-sample fit may reflect over-fitting rather than better predictions.]
2. \begin{align*} \hat{Y} &= 30 + 4 (10) - 2 (1) - 10 (5) \\ &= 18 \end{align*}
1. The tuning parameter is often chosen via cross-validation. It makes sense to choose it this way because this effectively picks a value of $$\lambda$$ that makes good pseudo-out-of-sample predictions. As we will see below, a bad choice of $$\lambda$$ can result in very poor predictions.
2. When $$\lambda=0$$, there would effectively be no penalty term and, therefore, the estimated parameters would coincide with the OLS estimates.
3. When $$\lambda \rightarrow \infty$$, the penalty term would overwhelm the term corresponding to minimizing SSR. This would push all the estimated parameters to 0. This extreme approach is likely to lead to very poor predictions.
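
The three answers about $$\lambda$$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a ridge (squared) penalty, omits the intercept, and works with simulated data. The helper names `ridge_coefficients` and `cv_mse`, the grid of $$\lambda$$ values, and the simulated coefficients are all made up for this example. It compares the penalized estimates with OLS at $$\lambda = 0$$, looks at a very large $$\lambda$$, and picks $$\lambda$$ from a small grid by 5-fold cross-validation.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Minimize SSR + lam * sum(beta**2), with no intercept (an assumption)."""
    k = X.shape[1]
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


def cv_mse(X, y, lam, n_folds=5):
    """Average out-of-fold mean squared prediction error for a given lam."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Estimate on the training folds, predict on the held-out fold
        beta = ridge_coefficients(X[train_mask], y[train_mask], lam)
        errors.append(np.mean((y[test_idx] - X[test_idx] @ beta) ** 2))
    return float(np.mean(errors))


# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
beta_true = np.array([4.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# lambda = 0: no penalty, so the estimates coincide with OLS
ols = np.linalg.lstsq(X, y, rcond=None)[0]
ridge_zero = ridge_coefficients(X, y, lam=0.0)

# Very large lambda: the penalty dominates and the estimates approach 0
ridge_huge = ridge_coefficients(X, y, lam=1e12)

# Cross-validation over a small, arbitrary grid of lambda values
grid = [0.0, 0.1, 1.0, 10.0, 100.0, 1e6]
cv_scores = {lam: cv_mse(X, y, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)

print("OLS:            ", ols)
print("lambda = 0:     ", ridge_zero)
print("lambda = 1e12:  ", ridge_huge)
print("CV-chosen lambda:", best_lam)
```

In a run of this sketch, the $$\lambda = 0$$ estimates should match OLS up to numerical precision, the estimates for the very large $$\lambda$$ should be essentially 0, and the cross-validated error at $$\lambda = 10^6$$ should be much worse than at the chosen value.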