Due: At the start of class on Thursday, March 3. Please turn in a hard copy.

Textbook Questions 3.23, 3.24 (you do not have to answer the parts about \(R^2\) and sum of squared errors), 3.25, 4.1, 4.5, 4.6, 4.23

Extra Question Monte Carlo simulations are a very useful way to study properties of different estimators. Basically, the idea is to use simulated data (where we know what the true model is) and check how well a particular estimator performs.

For this problem, we’ll use Monte Carlo simulations to assess the bias and sampling variance of \(\hat{\beta}\). Because we have never done this before, I am going to provide some code that is useful for this problem (though you will have to fill in some details).

For a particular simulation, do the following for \(i=1,...,n\) (for now, set \(n=100\), but we will try different values of \(n\) later)

The above steps give you a data set with observations of \(Y\) and \(X\). Using this data, run a regression of \(Y\) on \(X\) (make sure to include an intercept) and recover estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\).

  1. Do the above steps 1000 times and calculate (i) the average values of \(\hat{\beta}_1\) across simulations, and (ii) the variance of \(\hat{\beta}_1\) across simulations. What do these results suggest to you? In particular, are they consistent with the theory that we have derived in class? Explain.

  2. Try the same calculations with \(n=2\), \(n=10\), \(n=50\), and \(n=500\). What do you notice?

# function to run a single simulation
sim <- function() {
  # draw X1
  X1 <- rexp(n)
  
  # draw the error term
  e <- mixtools::rnormmix(n, lambda=c(.5,.5), mu=c(-2,2), sigma=c(1,1))
  
  ## TODO: construct Y
  

  ## TODO: use X1 and Y to estimate bet0 and bet1
  

  # TODO: return estimated value of bet1
  
}

# function to run many simulations
# @param n_sims is the number of simulations to run
run_mc <- function(n_sims=1000) {
  
  # run n_sims simulations and store in a vector
  mc_res <- sapply(1:n_sims, function(s) {
    sim()
  })
  
  # print the mean of b1
  cat("mean b1  : ", mean(mc_res), "\n")
  
  # print the variance of b1
  cat("var b1   : ", var(mc_res), "\n")
}
# run the simulations
# set values of parameters and number of observations
b0 <- 0
b1 <- 1
n <- 100

run_mc()