The main material for the final exam comes from Sections 5.4-5.10, Sections 6.1-6.3, and Sections 8.1-6.5 in the Course Notes. One comment: Section 5.11 covers inference in the context of linear regression; you do not need to know all the mathematical details, but you still have to know how to interpret standard errors, p-values, etc. from a regression. The exam is cumulative, but I think that you should expect 50-75% of the material to come from these new sections. The more challenging material on the exam will also come from the new sections.

As I see it, some of the main topics from the first part of the course include:

Being able to write/interpret R code

Being able to work with/manipulate expressions involving expectations

Hypothesis testing, being able to interpret standard errors, p-values, confidence intervals, etc.

Being able to get predicted values from a regression
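As a quick refresher on the last two items, here is a minimal R sketch of reading standard errors, p-values, and confidence intervals off a regression and then computing a predicted value. It is only an illustration: the `mtcars` data set (built into R), the choice of variables, and the hypothetical car with `wt = 3` are assumptions made for the example, not course data or an exam question.

```r
# Illustrative only: mtcars ships with base R and is not course data.
# Regress miles per gallon on weight (wt is measured in 1000s of lbs).
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimate, standard error, t-statistic, p-value
coefs <- summary(fit)$coefficients
print(coefs)

# 95% confidence interval for each coefficient
ci <- confint(fit, level = 0.95)
print(ci)

# Predicted mpg for a hypothetical car with wt = 3
new_car <- data.frame(wt = 3)
pred <- predict(fit, newdata = new_car)
print(pred)

# The same prediction by hand: intercept + slope * 3
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * 3
print(by_hand)
```

The by-hand line is worth practicing: a predicted value is just the estimated intercept plus each estimated slope times the corresponding regressor value.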

Besides these, the topics about conditional expectations are a main source of motivation for linear regressions, and the tradeoffs between bias and variance that we talked about earlier in the semester are related to the prediction problems that we have talked about more recently.

My general advice for studying is to (i) study the notes that you have taken in class, (ii) study the Course Notes, and (iii) for topics where you still have any doubts, follow the cross-references from the Course Notes to the textbook for an additional reference.

R questions are fair game for the exam. As on previous exams, you should expect some coding question(s), but a larger portion of the exam (as well as the more challenging questions) will concern the prediction/causal inference parts of the class.

I anticipate that the exam will take 1.5-2 hours to complete, but you will have the full 3 hours to take the exam.

Finally, I have provided some extra questions here. My recommendation is to do some studying before you try to answer these questions. The solutions are provided here.