Due: At the start of class on Thursday, March 3. Please turn in a hard copy.

Textbook Questions 3.23, 3.24 (you do not have to answer the parts about $$R^2$$ and sum of squared errors), 3.25, 4.1, 4.5, 4.6, 4.23

Extra Question Monte Carlo simulations are a very useful way to study properties of different estimators. Basically, the idea is to use simulated data (where we know what the true model is) and check how well a particular estimator performs.

For this problem, we’ll use Monte Carlo simulations to assess the bias and sampling variance of $$\hat{\beta}$$. Because we have never done this before, I am going to provide some code that is useful for this problem (though you will have to fill in some details).

For a particular simulation, do the following for $$i=1,...,n$$ (for now, set $$n=100$$, but we will try different values of $$n$$ later)

• draw $$X$$ from an exponential distribution with rate parameter equal 1 (you can use rexp function for this)
• draw $$e$$ from a mixture of two normals, with mixture probabilities both equal to 1/2, means equal to -2 and 2, and standard deviations both equal to 1 (you can use mixtools::rnormmix for this)
• Set $$Y=\beta_0 + \beta_1 X + e$$ where $$\beta_0=0$$ and $$\beta_1=1$$ and $$X$$ and $$e$$ are the draws from the first two steps.

The above steps give you a data set with observations of $$Y$$ and $$X$$. Using this data, run a regression of $$Y$$ on $$X$$ (make sure to include an intercept) and recover estimates $$\hat{\beta}_0$$ and $$\hat{\beta}_1$$.

1. Do the above steps 1000 times and calculate (i) the average values of $$\hat{\beta}_1$$ across simulations, and (ii) the variance of $$\hat{\beta}_1$$ across simulations. What do these results suggest to you? In particular, are they consistent with the theory that we have derived in class? Explain.

2. Try the same calculations with $$n=2$$, $$n=10$$, $$n=50$$, and $$n=500$$. What do you notice?

# function to run a single simulation
sim <- function() {
# draw X1
X1 <- rexp(n)

# draw the error term
e <- mixtools::rnormmix(n, lambda=c(.5,.5), mu=c(-2,2), sigma=c(1,1))

## TODO: construct Y

## TODO: use X1 and Y to estimate bet0 and bet1

# TODO: return estimated value of bet1

}

# function to run many simulations
# @param n_sims is the number of simulations to run
run_mc <- function(n_sims=1000) {

# run n_sims simulations and store in a vector
mc_res <- sapply(1:n_sims, function(s) {
sim()
})

# print the mean of b1
cat("mean b1  : ", mean(mc_res), "\n")

# print the variance of b1
cat("var b1   : ", var(mc_res), "\n")
}
# run the simulations
# set values of parameters and number of observations
b0 <- 0
b1 <- 1
n <- 100

run_mc()