## 5.5 Lab 4: Predicting Diamond Prices

For this lab, we will try our hand at predicting diamond prices. We will use the data set diamond_train (which contains around 40,000 observations) to estimate our models and then see how well they predict prices in the diamond_test data.

1. Estimate a model for $$price$$ on $$carat$$, $$cut$$, and $$clarity$$. Report $$R^2$$, $$\bar{R}^2$$, $$AIC$$, and $$BIC$$ for this model.

2. Estimate a model for $$price$$ on $$carat$$, $$cut$$, $$clarity$$, $$depth$$, $$table$$, $$x$$, $$y$$, and $$z$$. Report $$R^2$$, $$\bar{R}^2$$, $$AIC$$, and $$BIC$$ for this model.

3. Choose any model that you would like for $$price$$ and report $$R^2$$, $$\bar{R}^2$$, $$AIC$$, and $$BIC$$ for this model. We’ll see if your model can predict better than either of the first two.

4. Use 10-fold cross-validation to report an estimate of mean squared prediction error for each of the models from parts 1-3.

5. Based on your responses to parts 1-4, which model do you think will predict the best?

6. Use diamond_test to calculate (out-of-sample) mean squared prediction error for each of the three models from parts 1-3. Which model performs the best out-of-sample? How does this compare to your answer from part 5?

7. Estimate Lasso and Ridge regressions using the diamond_train data. Evaluate the predictions from each of these models by computing (out-of-sample) mean squared prediction error. How well did these models predict relative to each other and relative to the models from parts 1-3?
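
As a reminder of the mechanics behind part 4, here is a minimal sketch of 10-fold cross-validation, written in Python with only the standard library. It is not a solution to the lab: the data are synthetic (one regressor, made-up coefficients), and the helper names `fit_ols` and `cv_mse` are hypothetical names chosen for this illustration. The same logic carries over to whatever software you use for the lab.

```python
import random


def fit_ols(xs, ys):
    """Return (intercept, slope) from a least-squares fit with one regressor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cv_mse(xs, ys, k=10, seed=0):
    """Estimate mean squared prediction error by k-fold cross-validation."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # randomly assign observations to folds
    folds = [idx[i::k] for i in range(k)]     # k folds of (nearly) equal size
    total_sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Estimate the model without the held-out fold ...
        b0, b1 = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        # ... then predict the held-out fold and accumulate squared errors.
        total_sq_err += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    # Every observation is held out exactly once, so divide by n.
    return total_sq_err / n


# Synthetic stand-in data (NOT the diamonds data): y = 2 + 3x + noise with variance 1.
rng = random.Random(42)
xs = [rng.uniform(0, 5) for _ in range(500)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

mse = cv_mse(xs, ys)
print(round(mse, 3))
```

Because the simulated noise has variance 1 and the fitted model is correctly specified, the cross-validated estimate should come out close to 1. For the lab, the same loop applies with each of your models from parts 1-3 in place of `fit_ols`.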