5.5 Lab 4: Predicting Diamond Prices

For this lab, we will try our hand at predicted diamond prices. We will use the data set diamond_train (which contains around 40,000 observations) and then see how well we can predict data from the diamond_test data.

  1. Estimate a model for \(price\) on \(carat\), \(cut\), and \(clarity\). Report \(R^2\), \(\bar{R}^2\), \(AIC\), and \(BIC\) for this model.

  2. Estimate a model for \(price\) on \(carat\), \(cut\), \(clarity\), \(depth\), \(table\), \(x\), \(y\), and \(z\). Report \(R^2\), \(\bar{R}^2\), \(AIC\), and \(BIC\) for this model.

  3. Choose any model that you would like for \(price\) and report \(R^2\), \(\bar{R}^2\), \(AIC\), and \(BIC\) for this model. We’ll see if your model can predict better than either of the first two.

  4. Use 10-fold cross validation to report an estimate of mean squared prediction error for each of the models from 1-3.

  5. Based on your responses to parts 1-4, which model do you think will predict the best?

  6. Use diamond_test to calculate (out-of-sample) mean squared prediction error for each of the three models from 1-3. Which model performs the best out-of-sample? How does this compare to your answer from 5.

  7. Use the Lasso and Ridge regression on diamond_train data. Evaluate the predictions from each of these models by computing (out-of-sample) mean squared prediction error. How well did these models predict relative to each other and relative the models from 1-3.