Due: At the start of class on Thursday, March 3. Please turn in a hard copy.
Textbook Questions 3.23, 3.24 (you do not have to answer the parts about \(R^2\) and sum of squared errors), 3.25, 4.1, 4.5, 4.6, 4.23
Extra Question Monte Carlo simulations are a very useful way to study properties of different estimators. Basically, the idea is to use simulated data (where we know what the true model is) and check how well a particular estimator performs.
For this problem, we’ll use Monte Carlo simulations to assess the bias and sampling variance of \(\hat{\beta}\). Because we have never done this before, I am going to provide some code that is useful for this problem (though you will have to fill in some details).
For a particular simulation, do the following for \(i=1,...,n\) (for now, set \(n=100\), but we will try different values of \(n\) later)
rexp
function for this)mixtools::rnormmix
for this)The above steps give you a data set with observations of \(Y\) and \(X\). Using this data, run a regression of \(Y\) on \(X\) (make sure to include an intercept) and recover estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
Do the above steps 1000 times and calculate (i) the average values of \(\hat{\beta}_1\) across simulations, and (ii) the variance of \(\hat{\beta}_1\) across simulations. What do these results suggest to you? In particular, are they consistent with the theory that we have derived in class? Explain.
Try the same calculations with \(n=2\), \(n=10\), \(n=50\), and \(n=500\). What do you notice?
# function to run a single simulation
sim <- function() {
# draw X1
X1 <- rexp(n)
# draw the error term
e <- mixtools::rnormmix(n, lambda=c(.5,.5), mu=c(-2,2), sigma=c(1,1))
## TODO: construct Y
## TODO: use X1 and Y to estimate bet0 and bet1
# TODO: return estimated value of bet1
}
# function to run many simulations
# @param n_sims is the number of simulations to run
run_mc <- function(n_sims=1000) {
# run n_sims simulations and store in a vector
mc_res <- sapply(1:n_sims, function(s) {
sim()
})
# print the mean of b1
cat("mean b1 : ", mean(mc_res), "\n")
# print the variance of b1
cat("var b1 : ", var(mc_res), "\n")
}
# run the simulations
# set values of parameters and number of observations
b0 <- 0
b1 <- 1
n <- 100
run_mc()