4.13 Lab 3: Solutions
# load packages
library(haven)
library(modelsummary)
library(dplyr)
library(ggplot2)
# load data
Birthweight_Smoking <- read_dta("data/birthweight_smoking.dta")
reg1 <- lm(birthweight ~ smoker, data=Birthweight_Smoking)
summary(reg1)
#>
#> Call:
#> lm(formula = birthweight ~ smoker, data = Birthweight_Smoking)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3007.06 -313.06 26.94 366.94 2322.94
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3432.06 11.87 289.115 <2e-16 ***
#> smoker -253.23 26.95 -9.396 <2e-16 ***
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 583.7 on 2998 degrees of freedom
#> Multiple R-squared: 0.0286, Adjusted R-squared: 0.02828
#> F-statistic: 88.28 on 1 and 2998 DF, p-value: < 2.2e-16
We estimate that, on average, smoking reduces an infant’s birthweight by about 250 grams. The estimated effect is strongly statistically significant, and (I am not an expert but) that seems like a large effect of smoking to me.
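With a single binary regressor, the slope coefficient is just the difference in average outcomes between the two groups, which is why the same number shows up again as a difference in means. The sketch below illustrates this on a small made-up data set; the data frame `toy` and its columns `y` and `d` are hypothetical and are not part of the lab's data.

```r
# Toy data, made up for illustration (not the lab's data set)
toy <- data.frame(
  y = c(3400, 3500, 3300, 3100, 3200, 3000),  # outcome
  d = c(0, 0, 0, 1, 1, 1)                     # binary regressor
)

# Slope from regressing the outcome on the binary regressor
slope <- unname(coef(lm(y ~ d, data = toy))["d"])

# Difference in group means computed directly
diff_means <- mean(toy$y[toy$d == 1]) - mean(toy$y[toy$d == 0])

# The two numbers agree (up to floating point error)
all.equal(slope, diff_means)
```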
# create smoker factor --- just to make table look nicer
Birthweight_Smoking$smoker_factor <- as.factor(ifelse(Birthweight_Smoking$smoker==1, "smoker", "non-smoker"))
datasummary_balance(~smoker_factor,
                    data=dplyr::select(Birthweight_Smoking, -smoker),
                    fmt=2)
Variable | Mean (non-smoker) | Std. Dev. (non-smoker) | Mean (smoker) | Std. Dev. (smoker) | Diff. in Means | Std. Error |
---|---|---|---|---|---|---|
nprevist | 11.19 | 3.50 | 10.18 | 4.23 | -1.01 | 0.19 |
alcohol | 0.01 | 0.11 | 0.05 | 0.22 | 0.04 | 0.01 |
tripre1 | 0.83 | 0.38 | 0.70 | 0.46 | -0.13 | 0.02 |
tripre2 | 0.14 | 0.34 | 0.22 | 0.41 | 0.08 | 0.02 |
tripre3 | 0.03 | 0.16 | 0.06 | 0.24 | 0.04 | 0.01 |
tripre0 | 0.01 | 0.08 | 0.02 | 0.15 | 0.02 | 0.01 |
birthweight | 3432.06 | 584.62 | 3178.83 | 580.01 | -253.23 | 26.82 |
unmarried | 0.18 | 0.38 | 0.43 | 0.50 | 0.25 | 0.02 |
educ | 13.15 | 2.21 | 11.88 | 1.62 | -1.27 | 0.08 |
age | 27.27 | 5.37 | 25.32 | 5.06 | -1.95 | 0.24 |
drinks | 0.03 | 0.47 | 0.19 | 1.23 | 0.16 | 0.05 |
The things that stand out to me are:
Birthweight tends to be notably lower for smokers relative to non-smokers. The difference is about 7.4% lower birthweight for babies whose mothers smoked.
That said, smoking is also correlated with a number of other things that could be related to lower birthweights. Mothers who smoked went to fewer pre-natal visits on average, were more likely to be unmarried, were more likely to have drunk alcohol during their pregnancy, and had less education on average. They were also, on average, somewhat younger than mothers who did not smoke.
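As a quick arithmetic check, the percentage difference in birthweight can be reproduced from the two group means reported in the balance table. The variable names below are only illustrative; the two means are copied from the table.

```r
# Group means of birthweight, copied from the balance table
mean_nonsmoker <- 3432.06
mean_smoker    <- 3178.83

# Difference in grams and as a percentage of the non-smoker mean
diff_grams <- mean_smoker - mean_nonsmoker
pct_diff   <- 100 * diff_grams / mean_nonsmoker

round(c(diff_grams = diff_grams, pct_diff = pct_diff), 1)
```

The percentage comes out at roughly -7.4, in line with the figure quoted above.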
reg3 <- lm(birthweight ~ smoker + educ + nprevist + age + alcohol,
           data=Birthweight_Smoking)
summary(reg3)
#>
#> Call:
#> lm(formula = birthweight ~ smoker + educ + nprevist + age + alcohol,
#> data = Birthweight_Smoking)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2728.91 -305.26 24.69 359.63 2220.42
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2924.963 74.185 39.428 < 2e-16 ***
#> smoker -206.507 27.367 -7.546 5.93e-14 ***
#> educ 5.644 5.532 1.020 0.308
#> nprevist 32.979 2.914 11.318 < 2e-16 ***
#> age 2.360 2.178 1.083 0.279
#> alcohol -39.512 76.365 -0.517 0.605
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 570.3 on 2994 degrees of freedom
#> Multiple R-squared: 0.07402, Adjusted R-squared: 0.07247
#> F-statistic: 47.86 on 5 and 2994 DF, p-value: < 2.2e-16
Here we estimate that smoking reduces an infant’s birthweight by about 200 grams on average, holding education, number of pre-natal visits, age, and whether or not the mother consumed alcohol constant. The magnitude of the estimated effect is somewhat smaller than the previous estimate of about 250 grams. Given the discussion in #2 (particularly, that smoking was correlated with a number of other characteristics that are likely associated with lower birthweights), this decrease in magnitude is not surprising.
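One way to see why the coefficient shrinks once controls are added is the omitted variable bias identity: the coefficient from the short regression equals the coefficient from the long regression plus the effect of the omitted covariate times how much that covariate differs across groups. The sketch below checks this on simulated data. Everything in it (the variables `d`, `x`, `y` and the numbers used to generate them) is hypothetical and only loosely mimics the setting here, with a single omitted covariate.

```r
set.seed(123)  # simulated data; all names and numbers here are hypothetical
n <- 1000
d <- rbinom(n, 1, 0.2)                 # binary "treatment" indicator
x <- 11 - 1 * d + rnorm(n, sd = 3)     # covariate that is lower when d = 1
y <- 3000 - 200 * d + 30 * x + rnorm(n, sd = 500)

short <- lm(y ~ d)        # omits x
long  <- lm(y ~ d + x)    # controls for x
aux   <- lm(x ~ d)        # how x differs with d

# In-sample identity:
# short coefficient = long coefficient + (coefficient on x) * (gap in x across groups)
implied_short <- coef(long)["d"] + coef(long)["x"] * coef(aux)["d"]

c(short = unname(coef(short)["d"]), implied = unname(implied_short))
```

Because `x` is lower for the `d = 1` group and raises `y`, leaving it out pushes the short coefficient further from zero, which is the same direction as the change between the two regressions above.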
reg4 <- lm(birthweight ~ smoker + I(smoker*age) + educ + nprevist + age + alcohol,
           data=Birthweight_Smoking)
summary(reg4)
#>
#> Call:
#> lm(formula = birthweight ~ smoker + I(smoker * age) + educ +
#> nprevist + age + alcohol, data = Birthweight_Smoking)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2722.56 -305.12 23.93 363.43 2244.67
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2853.819 77.104 37.013 < 2e-16 ***
#> smoker 231.578 134.854 1.717 0.086036 .
#> I(smoker * age) -17.145 5.168 -3.317 0.000919 ***
#> educ 4.895 5.528 0.885 0.375968
#> nprevist 32.482 2.913 11.151 < 2e-16 ***
#> age 5.528 2.375 2.328 0.019999 *
#> alcohol -22.556 76.409 -0.295 0.767864
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 569.4 on 2993 degrees of freedom
#> Multiple R-squared: 0.07741, Adjusted R-squared: 0.07556
#> F-statistic: 41.85 on 6 and 2993 DF, p-value: < 2.2e-16
We should be careful about the interpretation here. We have estimated a model like
\[ \mathbb{E}[Birthweight|Smoker, Age, X] = \beta_0 + \beta_1 Smoker + \beta_2 Smoker \cdot Age + \cdots \] Therefore, the partial effect of smoking is given by
\[ \mathbb{E}[Birthweight | Smoker=1, Age, X] - \mathbb{E}[Birthweight | Smoker=0, Age, X] = \beta_1 + \beta_2 Age \] Therefore, the partial effect of smoking depends on \(Age\). For example, for \(Age=18\), the partial effect is \(\beta_1 + \beta_2 (18)\). For \(Age=25\), the partial effect is \(\beta_1 + \beta_2 (25)\), and for \(Age=35\), the partial effect is \(\beta_1 + \beta_2 (35)\). Let’s calculate the partial effect at each of those ages.
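# coefficients on smoker and on the smoker-age interaction
bet1 <- coef(reg4)[2]
bet2 <- coef(reg4)[3]

# partial effect of smoking at ages 18, 25, and 35
pe_18 <- bet1 + bet2*18
pe_25 <- bet1 + bet2*25
pe_35 <- bet1 + bet2*35

round(cbind.data.frame(pe_18, pe_25, pe_35),2)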
#> pe_18 pe_25 pe_35
#> smoker -77.04 -197.05 -368.51
This suggests substantially larger effects of smoking on birthweight for older mothers.
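The point estimates above do not come with standard errors. Since the partial effect is a linear combination of two coefficients, one possible way to get a standard error is through the estimated variance matrix of the coefficients from `vcov()`. The sketch below does this on simulated data; the helper `partial_effect()`, the data frame `sim`, and its columns `d`, `a`, and `y` are all hypothetical names introduced for illustration, and the interaction is written as `d:a` rather than with `I()`.

```r
set.seed(1)  # simulated data; names and numbers are hypothetical
n <- 500
sim <- data.frame(d = rbinom(n, 1, 0.3),
                  a = round(runif(n, 18, 40)))
sim$y <- 3000 + 200 * sim$d - 15 * sim$d * sim$a + 5 * sim$a + rnorm(n, sd = 500)

fit <- lm(y ~ d + d:a + a, data = sim)

# Partial effect of d at a given value of a, with its standard error.
# The effect is w' b, and its variance is w' V w, where w holds the weights
# on the coefficients of d and d:a.
partial_effect <- function(fit, age) {
  w <- c(d = 1, "d:a" = age)
  b <- coef(fit)[names(w)]
  V <- vcov(fit)[names(w), names(w)]
  c(estimate = sum(w * b), std_error = sqrt(drop(t(w) %*% V %*% w)))
}

pe <- partial_effect(fit, age = 25)
pe
```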
reg5 <- lm(birthweight ~ smoker + I(smoker*alcohol) + educ + nprevist + age + alcohol,
           data=Birthweight_Smoking)
summary(reg5)
#>
#> Call:
#> lm(formula = birthweight ~ smoker + I(smoker * alcohol) + educ +
#> nprevist + age + alcohol, data = Birthweight_Smoking)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2728.99 -304.16 24.54 359.92 2222.10
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2924.844 74.185 39.426 < 2e-16
#> smoker -201.852 27.765 -7.270 4.57e-13
#> I(smoker * alcohol) -151.860 152.717 -0.994 0.320
#> educ 5.612 5.532 1.014 0.310
#> nprevist 32.844 2.917 11.260 < 2e-16
#> age 2.403 2.178 1.103 0.270
#> alcohol 39.824 110.440 0.361 0.718
#>
#> (Intercept) ***
#> smoker ***
#> I(smoker * alcohol)
#> educ
#> nprevist ***
#> age
#> alcohol
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 570.3 on 2993 degrees of freedom
#> Multiple R-squared: 0.07432, Adjusted R-squared: 0.07247
#> F-statistic: 40.05 on 6 and 2993 DF, p-value: < 2.2e-16
The point estimate suggests that the effect of smoking is larger for women who both smoke and consume alcohol than for women who smoke but do not drink alcohol. This seems plausible, but our evidence is not very strong here: the coefficient on the interaction term is not statistically significant at any conventional significance level (the p-value is equal to 0.32).
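To make the comparison concrete, the implied effect of smoking for each group can be computed directly from the reported coefficients. The variable names below are only illustrative; the numbers are copied from the regression output.

```r
# Coefficients and standard error copied from the summary of reg5
b_smoker       <- -201.852   # coefficient on smoker
b_interaction  <- -151.860   # coefficient on I(smoker * alcohol)
se_interaction <- 152.717    # standard error of the interaction coefficient

# Implied effect of smoking when alcohol = 0 and when alcohol = 1
effect_no_alcohol <- b_smoker
effect_alcohol    <- b_smoker + b_interaction

# t-statistic on the interaction term
t_interaction <- b_interaction / se_interaction

round(c(no_alcohol = effect_no_alcohol,
        alcohol    = effect_alcohol,
        t          = t_interaction), 2)
```

The t-statistic is well inside the usual critical values of plus or minus 1.96, consistent with the p-value of 0.32.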
reg6 <- lm(birthweight ~ age + I(age^2), data=Birthweight_Smoking)
summary(reg6)
#>
#> Call:
#> lm(formula = birthweight ~ age + I(age^2), data = Birthweight_Smoking)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2949.81 -312.81 30.43 371.03 2452.72
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2502.8949 225.6016 11.094 < 2e-16 ***
#> age 58.1670 16.9212 3.438 0.000595 ***
#> I(age^2) -0.9099 0.3099 -2.936 0.003353 **
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 589.6 on 2997 degrees of freedom
#> Multiple R-squared: 0.009261, Adjusted R-squared: 0.0086
#> F-statistic: 14.01 on 2 and 2997 DF, p-value: 8.813e-07
preds <- predict(reg6, newdata=data.frame(age=seq(18,40)))
ggplot(data.frame(preds=preds, age=seq(18,40)), aes(x=age, y=preds)) +
  geom_line() +
  geom_point(size=3) +
  theme_bw() +
  ylab("predicted values")
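The figure suggests that predicted birthweight is increasing in mother’s age up until about age 32 and then decreasing after that. This matches the estimated coefficients, which put the peak of the fitted quadratic at \(-58.17 / (2 \times -0.91) \approx 32\).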
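The age at which the fitted quadratic turns can also be checked without the plot: for a fit of the form \(b_0 + b_1 Age + b_2 Age^2\) with \(b_2 < 0\), the maximum is at \(-b_1 / (2 b_2)\). A minimal sketch using the coefficients reported in the output above (the variable names are only illustrative):

```r
# Coefficients copied from the summary of reg6
b_age  <- 58.1670    # coefficient on age
b_age2 <- -0.9099    # coefficient on I(age^2)

# Age at which predicted birthweight peaks
turning_point <- -b_age / (2 * b_age2)
round(turning_point, 1)   # about 32
```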
reg7 <- lm(I(log(birthweight)) ~ smoker + educ + nprevist + age + alcohol,
           data=Birthweight_Smoking)
summary(reg7)
#>
#> Call:
#> lm(formula = I(log(birthweight)) ~ smoker + educ + nprevist +
#> age + alcohol, data = Birthweight_Smoking)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.96324 -0.07696 0.02435 0.12092 0.50070
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 7.9402678 0.0270480 293.562 < 2e-16 ***
#> smoker -0.0635764 0.0099782 -6.372 2.16e-10 ***
#> educ 0.0022169 0.0020171 1.099 0.272
#> nprevist 0.0129662 0.0010624 12.205 < 2e-16 ***
#> age 0.0003059 0.0007941 0.385 0.700
#> alcohol -0.0181053 0.0278428 -0.650 0.516
#> ---
#> Signif. codes:
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.2079 on 2994 degrees of freedom
#> Multiple R-squared: 0.07322, Adjusted R-squared: 0.07167
#> F-statistic: 47.31 on 5 and 2994 DF, p-value: < 2.2e-16
The estimated coefficient on \(smoker\) says that smoking during pregnancy decreases a baby’s birthweight by about 6.4%, on average, holding education, number of pre-natal visits, age of the mother, and whether or not the mother consumed alcohol during the pregnancy constant.
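Reading a coefficient in a log-outcome regression as a percentage change is an approximation that works well for small coefficients. For a binary regressor, the exact implied percentage change is \(100 \times (\exp(\hat{\beta}) - 1)\). The sketch below compares the two using the reported coefficient; the variable names are only illustrative.

```r
# Coefficient on smoker copied from the summary of reg7
b_smoker <- -0.0635764

# Approximate and exact percentage change in birthweight
approx_pct <- 100 * b_smoker
exact_pct  <- 100 * (exp(b_smoker) - 1)

round(c(approx = approx_pct, exact = exact_pct), 2)
```

The two numbers are close here (roughly -6.4 versus -6.2), so the approximate reading is reasonable for a coefficient of this size.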