**Due:** At the start of class on Thursday, March 3. Please turn in a hard copy.

**Textbook Questions** 3.23, 3.24 (you do not have to answer the parts about \(R^2\) and sum of squared errors), 3.25, 4.1, 4.5, 4.6, 4.23

**Extra Question** Monte Carlo simulations are a very useful way to study properties of different estimators. The idea is to use simulated data, where we know the true model, and check how well a particular estimator performs.

For this problem, we’ll use Monte Carlo simulations to assess the bias and sampling variance of \(\hat{\beta}\). Because we have never done this before, I am going to provide some code that is useful for this problem (though you will have to fill in some details).

For a particular simulation, do the following for \(i=1,\ldots,n\) (for now, set \(n=100\); we will try different values of \(n\) later):

- Draw \(X\) from an exponential distribution with rate parameter equal to 1 (you can use the `rexp` function for this).
- Draw \(e\) from a mixture of two normals, with mixture probabilities both equal to 1/2, means equal to -2 and 2, and standard deviations both equal to 1 (you can use `mixtools::rnormmix` for this).
- Set \(Y=\beta_0 + \beta_1 X + e\), where \(\beta_0=0\) and \(\beta_1=1\), and \(X\) and \(e\) are the draws from the first two steps.

The above steps give you a data set with observations of \(Y\) and \(X\). Using this data, run a regression of \(Y\) on \(X\) (make sure to include an intercept) and recover estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\).

Do the above steps 1000 times and calculate (i) the average value of \(\hat{\beta}_1\) across simulations, and (ii) the variance of \(\hat{\beta}_1\) across simulations. What do these results suggest to you? In particular, are they consistent with the theory that we have derived in class? Explain.

Try the same calculations with \(n=2\), \(n=10\), \(n=50\), and \(n=500\). What do you notice?

```
# function to run a single simulation
sim <- function() {
  # draw X from an exponential distribution with rate 1
  X <- rexp(n)
  # draw the error term from a mixture of two normals
  e <- mixtools::rnormmix(n, lambda=c(.5,.5), mu=c(-2,2), sigma=c(1,1))
  ## TODO: construct Y
  ## TODO: use X and Y to estimate beta0 and beta1
  ## TODO: return the estimated value of beta1
}

# function to run many simulations
# @param n_sims is the number of simulations to run
run_mc <- function(n_sims=1000) {
  # run n_sims simulations and store the estimates in a vector
  mc_res <- sapply(1:n_sims, function(s) {
    sim()
  })
  # print the mean of b1 across simulations
  cat("mean b1: ", mean(mc_res), "\n")
  # print the variance of b1 across simulations
  cat("var b1: ", var(mc_res), "\n")
}
```

```
# run the simulations
# set values of parameters and number of observations
b0 <- 0
b1 <- 1
n <- 100
run_mc()
```