Suppose you work for a social media company that is trying to predict the number of clicks that different types of advertisements will get on its website. You run the following regression to try to predict the number of clicks that a particular advertisement will get: \[\begin{align*} Clicks = \beta_0 + \beta_1 FontSize + \beta_2 Picture + U \end{align*}\] where \(Clicks\) is the number of clicks that an ad gets (in thousands), \(FontSize\) is the size of the font of the ad, and \(Picture\) is a binary variable that is equal to 1 if the ad contains a picture and 0 otherwise.

Suppose you estimate this model and obtain \(\hat{\beta}_0 = 40\), \(\hat{\beta}_1 = 2\), and \(\hat{\beta}_2 = 80\). What number of clicks would you predict for an ad that uses a 16-point font and contains a picture?
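As a minimal sketch of the arithmetic, the prediction plugs the ad's characteristics into the estimated regression. The function name `predict_clicks` is hypothetical and introduced only for illustration; the coefficient values are the estimates given in the question.

```python
def predict_clicks(font_size, picture, b0=40.0, b1=2.0, b2=80.0):
    """Predicted clicks (in thousands) from the estimated regression."""
    return b0 + b1 * font_size + b2 * picture

# Ad with a 16-point font that contains a picture (Picture = 1)
prediction = predict_clicks(font_size=16, picture=1)
print(prediction)  # 152.0, i.e. 152 thousand clicks
```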

Your boss is very happy with your work, but suggests making the model more complicated. Your boss suggests you run the following regression

\[\begin{align*} Clicks = \beta_0 &+ \beta_1 FontSize + \beta_2 Picture + \beta_3 Animated \\ &+ \beta_4 ColorfulFont + \beta_5 FontSize^2 + U \end{align*}\] (here \(Animated\) is a binary variable that is equal to 1 if the ad contains an animation and 0 otherwise, and \(ColorfulFont\) is a binary variable that is equal to 1 if the font in the ad is any color besides black and 0 otherwise). You estimate the model and notice that

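\[\begin{array}{lcc} & \text{model from part (a)} & \text{model from part (b)} \\ \hline R^2 & 0.11 & 0.37 \\ \text{Adj. } R^2 & 0.10 & 0.35 \\ \text{AIC} & 6789 & 4999 \\ \text{BIC} & 6536 & 4876 \end{array}\]

Based on the table, which model do you prefer for predicting ad clicks?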

An alternative approach to choosing between these two models is to use J-fold cross-validation. Explain how you could use J-fold cross-validation in this problem.
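One way to picture the procedure is the sketch below, which is illustrative rather than a definitive implementation. It splits the data into J folds, estimates each model on J-1 folds, predicts the held-out fold, and compares the average squared prediction errors. The data are synthetic, and the helper names `design_a`, `design_b`, and `cv_mse`, as well as the coefficients used to simulate clicks, are assumptions made for illustration.

```python
import numpy as np

def design_a(font, pic, anim, color):
    # Model (a): intercept, FontSize, Picture
    return np.column_stack([np.ones_like(font), font, pic])

def design_b(font, pic, anim, color):
    # Model (b): adds Animated, ColorfulFont, and FontSize^2
    return np.column_stack([np.ones_like(font), font, pic, anim, color, font**2])

def cv_mse(X, y, J, seed=0):
    """Average squared out-of-fold prediction error from J-fold cross-validation."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), J)  # random split into J folds
    sq_errors = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Estimate by OLS using every fold except the held-out one
        beta, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Predict the held-out fold and record the squared prediction errors
        sq_errors.append((y[test_idx] - X[test_idx] @ beta) ** 2)
    return np.concatenate(sq_errors).mean()

# Synthetic data, for illustration only (not the company's data)
rng = np.random.default_rng(123)
n = 500
font = rng.integers(8, 25, size=n).astype(float)
pic = rng.integers(0, 2, size=n).astype(float)
anim = rng.integers(0, 2, size=n).astype(float)
color = rng.integers(0, 2, size=n).astype(float)
clicks = 40 + 2 * font + 80 * pic + 30 * anim + rng.normal(0, 20, size=n)

mse_a = cv_mse(design_a(font, pic, anim, color), clicks, J=10)
mse_b = cv_mse(design_b(font, pic, anim, color), clicks, J=10)
print(f"CV MSE, model (a): {mse_a:.1f}")
print(f"CV MSE, model (b): {mse_b:.1f}")
```

The model with the smaller cross-validated mean squared error is the one preferred for prediction. Both models are evaluated on the same folds (same `seed`) so that the comparison is not driven by the random split.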

Questions about causal inference.

What does the condition \((Y(1), Y(0)) \perp D\) mean? When would you expect it to hold?

What does the condition \((Y(1), Y(0)) \perp D | (X_1, X_2)\) mean? How is this different from the previous condition?

Suppose you are interested in the effect of a state policy that decreases the minimum legal drinking age from 21 to 18 on the number of traffic fatalities in a state. Do you think that the condition in part (a) is likely to hold here? Explain. What variables would you need to include for the condition in part (b) to hold? Explain.

Suppose you are interested in the causal effect of \(D\) on \(Y\). If you could estimate the following model, you would be willing to interpret \(\alpha\) as the causal effect of \(D\) on \(Y\) \[\begin{align*} Y_i = \beta_0 + \alpha D_i + \beta_1 W_i + U_i \end{align*}\] where \(\mathbb{E}[U|D,W]=0\). However, you do not observe \(W_i\).

Since you do not observe \(W_i\), you are considering just running a regression of \(Y_i\) on \(D_i\). Will this strategy work? Explain.

Now suppose that you actually have access to panel data. Further, suppose that \(W\) does not vary over time, but that \(Y\) and \(D\) do vary over time. Therefore, you are considering the model \[\begin{align*} Y_{it} = \beta_0 + \alpha D_{it} + \beta_1 W_i + U_{it} \end{align*}\] Explain how you can use this setup to estimate the causal effect \(\alpha\) (be specific about exactly what regression you would run here).
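The simulation below is a hedged sketch of one possible approach, not the only valid answer. With two periods, taking first differences removes the time-invariant \(W_i\), so a regression of \(\Delta Y_{it}\) on \(\Delta D_{it}\) can recover \(\alpha\). The data-generating process, the parameter values, and the helper `ols_slope` are assumptions made for illustration, and the sketch assumes that \(U_{it}\) is unrelated to \(D\) in every period.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2000, 2
alpha, beta1 = 1.5, 2.0  # hypothetical true values for the simulation

W = rng.normal(size=n)  # time-invariant and treated as unobserved
# Treatment is more likely when W is high, so D and W are correlated
D = (rng.normal(size=(n, T)) + W[:, None] > 0).astype(float)
U = rng.normal(size=(n, T))
Y = 1.0 + alpha * D + beta1 * W[:, None] + U

def ols_slope(x, y):
    """Slope coefficient from a regression of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Pooled regression of Y on D ignores W and picks up omitted variable bias
pooled = ols_slope(D.ravel(), Y.ravel())

# First differences: beta_0 and beta_1 * W_i cancel, leaving
# Delta Y_i = alpha * Delta D_i + Delta U_i
# (the intercept kept in ols_slope is harmless; its true value is zero here)
dY = Y[:, 1] - Y[:, 0]
dD = D[:, 1] - D[:, 0]
first_diff = ols_slope(dD, dY)

print(f"pooled OLS: {pooled:.2f}, first differences: {first_diff:.2f}")
```

In this simulated setup the pooled estimate is well above the assumed \(\alpha = 1.5\), while the first-difference estimate is close to it. With more than two periods, a regression with unit fixed effects plays the same role.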

Now, suppose that actually the effect of \(W\) varies over time, so that the model from part (b) becomes \[\begin{align*} Y_{it} = \beta_0 + \alpha D_{it} + \beta_{1t}W_i + U_{it} \end{align*}\] (note: what's different here is that \(\beta_{1t}\) changes across time periods). Will your strategy from part (b) continue to work in this case? Explain.
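As a hedged worked step, assuming the part (b) strategy is to difference the model across adjacent time periods, subtracting the equation for period \(t-1\) from the equation for period \(t\) gives \[\begin{align*} Y_{it} - Y_{it-1} = \alpha (D_{it} - D_{it-1}) + (\beta_{1t} - \beta_{1t-1}) W_i + (U_{it} - U_{it-1}) \end{align*}\] The intercept \(\beta_0\) still cancels, but \(W_i\) remains in the differenced equation whenever \(\beta_{1t} \neq \beta_{1t-1}\), so the unobserved variable has not been eliminated.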
