5.5 Lab 4: Predicting Diamond Prices
For this lab, we will try our hand at predicting diamond prices. We will estimate our models using the diamond_train
data set (which contains around 40,000 observations) and then see how well they predict prices in the diamond_test
data.
1. Estimate a model for \(price\) on \(carat\), \(cut\), and \(clarity\). Report \(R^2\), \(\bar{R}^2\), \(AIC\), and \(BIC\) for this model.
2. Estimate a model for \(price\) on \(carat\), \(cut\), \(clarity\), \(depth\), \(table\), \(x\), \(y\), and \(z\). Report \(R^2\), \(\bar{R}^2\), \(AIC\), and \(BIC\) for this model.
3. Choose any model that you would like for \(price\) and report \(R^2\), \(\bar{R}^2\), \(AIC\), and \(BIC\) for this model. We'll see if your model can predict better than either of the first two.
4. Use 10-fold cross-validation to estimate the mean squared prediction error for each of the models from 1-3.
5. Based on your responses to parts 1-4, which model do you think will predict the best?
6. Use diamond_test to calculate the (out-of-sample) mean squared prediction error for each of the three models from 1-3. Which model performs the best out-of-sample? How does this compare to your answer from 5?
7. Use Lasso and Ridge regression on the diamond_train data. Evaluate the predictions from each of these models by computing the (out-of-sample) mean squared prediction error. How well did these models predict relative to each other and relative to the models from 1-3?
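
The in-sample criteria, 10-fold cross-validation, and out-of-sample mean squared prediction error asked for above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with NumPy, it runs on simulated data rather than the diamond data, and all function and variable names (`fit_ols`, `in_sample_criteria`, `kfold_mse`, `simulate`, and so on) are hypothetical. The \(AIC\) and \(BIC\) here are based on the Gaussian log-likelihood and count only the regression coefficients; other conventions differ by constants, so compare values only across models computed the same way.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients; X must already include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def in_sample_criteria(X, y):
    """R^2, adjusted R^2, AIC, and BIC for an OLS fit (Gaussian likelihood)."""
    n, k = X.shape
    resid = y - X @ fit_ols(X, y)
    ssr = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1.0)
    return {"r2": r2, "adj_r2": adj_r2,
            "aic": 2 * k - 2 * loglik,
            "bic": k * np.log(n) - 2 * loglik}

def kfold_mse(X, y, n_folds=10, seed=0):
    """Cross-validated mean squared prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for held_out in folds:
        train = np.ones(len(y), dtype=bool)
        train[held_out] = False          # fit on the other folds only
        beta = fit_ols(X[train], y[train])
        sq_err += float(((y[held_out] - X[held_out] @ beta) ** 2).sum())
    return sq_err / len(y)

def out_of_sample_mse(X_train, y_train, X_test, y_test):
    """Fit on the training data, evaluate squared errors on the test data."""
    beta = fit_ols(X_train, y_train)
    return float(np.mean((y_test - X_test @ beta) ** 2))

def simulate(n, rng):
    """Simulated stand-in data (NOT the diamond data): price depends on carat only."""
    carat = rng.uniform(0.2, 2.0, n)
    junk = rng.normal(size=n)            # regressor unrelated to price
    price = 1000 + 5000 * carat + rng.normal(0, 500, n)
    X_small = np.column_stack([np.ones(n), carat])
    X_big = np.column_stack([np.ones(n), carat, junk])
    return X_small, X_big, price

rng = np.random.default_rng(42)
Xs_train, Xb_train, y_train = simulate(500, rng)
Xs_test, Xb_test, y_test = simulate(200, rng)

small = in_sample_criteria(Xs_train, y_train)
big = in_sample_criteria(Xb_train, y_train)
cv_small = kfold_mse(Xs_train, y_train)
cv_big = kfold_mse(Xb_train, y_train)
oos_small = out_of_sample_mse(Xs_train, y_train, Xs_test, y_test)
oos_big = out_of_sample_mse(Xb_train, y_train, Xb_test, y_test)

print("small model:", small, "CV MSE:", cv_small, "test MSE:", oos_small)
print("big model:  ", big, "CV MSE:", cv_big, "test MSE:", oos_big)
```

Note that \(R^2\) can never fall when a regressor is added, which is why it is a poor guide to prediction on its own; \(\bar{R}^2\), \(AIC\), \(BIC\), and cross-validation all penalize or guard against the extra complexity in different ways.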