# load data
library(Ecdat)
data("Airq", package="Ecdat")
# a) estimate mean rainfall
ybar <- mean(Airq$rain)
ybar
## [1] 36.078
# b) standard error
V <- var(Airq$rain)
n <- nrow(Airq)
se <- sqrt(V)/sqrt(n)
se
## [1] 2.462628
# c) t-statistic
h0 <- 25
t <- (ybar-h0)/se
t
## [1] 4.498446
Since \(|t| > 1.96\), we would reject \(H_0\) at the 5% significance level.
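The 1.96 cutoff is the two-sided 5% critical value of the standard normal distribution. As a minimal sketch, it can be recovered with `qnorm` instead of being typed by hand. The values of `ybar` and `se` are copied from the output in parts (a) and (b); the names `alpha`, `t_stat`, `crit`, and `reject` are illustrative and not part of the original solution.

```r
# Values copied from the printed output in parts (a) and (b)
ybar <- 36.078
se   <- 2.462628
h0   <- 25

alpha  <- 0.05                    # significance level
t_stat <- (ybar - h0) / se        # same formula as in part (c)
crit   <- qnorm(1 - alpha / 2)    # two-sided critical value, about 1.96
reject <- abs(t_stat) > crit      # TRUE means we reject H0

crit
reject
```

Changing `alpha` to 0.01 gives a critical value of about 2.58, and the conclusion for this test stays the same.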
# d) p-value
pval <- 2*pnorm(-abs(t))
pval
## [1] 6.845183e-06
There is virtually a 0 percent chance of getting a t-statistic this large in absolute value if the null hypothesis were true.
# e) confidence interval
ciL <- ybar - 1.96*se
ciU <- ybar + 1.96*se
paste0("[",round(ciL,3),", ", round(ciU,3), "]")
## [1] "[31.251, 40.905]"
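The same interval can be built for any confidence level by taking the critical value from `qnorm`. The sketch below assumes the normal approximation used throughout this solution; `normal_ci` is a hypothetical helper name, and the estimate and standard error are copied from parts (a) and (b).

```r
# Hypothetical helper: normal-approximation confidence interval
normal_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  c(lower = est - z * se, upper = est + z * se)
}

# Estimate and standard error copied from parts (a) and (b)
ci95 <- normal_ci(36.078, 2.462628)
ci99 <- normal_ci(36.078, 2.462628, level = 0.99)

round(ci95, 3)
round(ci99, 3)
```

The 95% interval agrees with the one reported in part (e) to three decimals, and the 99% interval is wider, as expected.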
# f) summary statistics
library(modelsummary)
datasummary_balance(~coas, Airq)
Variable | Mean (coas = no) | Std. Dev. (coas = no) | Mean (coas = yes) | Std. Dev. (coas = yes) | Diff. in Means | Std. Error |
---|---|---|---|---|---|---|
airq | 125.3 | 10.5 | 95.9 | 28.7 | -29.5 | 7.2 |
vala | 4118.2 | 5909.8 | 4218.6 | 4136.7 | 100.4 | 2166.9 |
rain | 32.3 | 7.6 | 37.7 | 15.2 | 5.4 | 4.2 |
dens | 1706.4 | 3014.6 | 1738.1 | 2821.2 | 31.7 | 1178.5 |
medi | 6290.3 | 10065.4 | 10842.2 | 13396.8 | 4551.9 | 4450.1 |
# a)
data("Caschool", package="Ecdat")
reg <- lm(testscr ~ str + avginc + elpct, data=Caschool)
summary(reg)
##
## Call:
## lm(formula = testscr ~ str + avginc + elpct, data = Caschool)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.800 -6.862 0.275 6.586 31.199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 640.31550 5.77489 110.879 <2e-16 ***
## str -0.06878 0.27691 -0.248 0.804
## avginc 1.49452 0.07483 19.971 <2e-16 ***
## elpct -0.48827 0.02928 -16.674 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.35 on 416 degrees of freedom
## Multiple R-squared: 0.7072, Adjusted R-squared: 0.7051
## F-statistic: 334.9 on 3 and 416 DF, p-value: < 2.2e-16
`avginc` and `elpct` are statistically different from 0, while `str` is
not statistically different from 0. We can tell by comparing the absolute
value of each t-statistic in the column labeled “t value” to 1.96. The
coefficients whose t-statistics are larger than 1.96 in magnitude are
statistically different from 0.
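The comparison of each t-statistic against 1.96 can also be done programmatically. The sketch below re-enters the estimates and standard errors from the printed summary so that it runs on its own; the data frame `coefs` and its column names are illustrative. With the fitted object available, `coef(summary(reg))` returns the same coefficient table as a matrix.

```r
# Estimates and standard errors copied from the summary(reg) output
coefs <- data.frame(
  term      = c("(Intercept)", "str", "avginc", "elpct"),
  estimate  = c(640.31550, -0.06878, 1.49452, -0.48827),
  std_error = c(5.77489, 0.27691, 0.07483, 0.02928)
)

# t-statistic for H0: coefficient = 0
coefs$t_value <- coefs$estimate / coefs$std_error

# Flag coefficients whose |t| exceeds the two-sided 5% critical value
coefs$significant <- abs(coefs$t_value) > qnorm(0.975)

coefs
```

Only the row for `str` is flagged as not significant, which matches the reading of the table.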
# b)
mean(Caschool$testscr)
## [1] 654.1565
The average test score in the data is a little over 654.
# c)
predict(reg, newdata=data.frame(str=20, avginc=30, elpct=10))
## 1
## 678.8928
The predicted value here is about 24.7 points higher than the overall sample average from part (b).
# d)
predict(reg, newdata=data.frame(str=15, avginc=30, elpct=10))
## 1
## 679.2367
The predicted value here is almost the same as (slightly larger than) the
one in part (c). The reason is that the estimated coefficient on `str`
from the original regression is very small: lowering the student-teacher
ratio by 5 changes the predicted value by only
\((-0.06878)(-5) \approx 0.34\).
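Because the model is linear, the gap between the two predictions equals the coefficient on `str` times the change in `str`, with the other regressors held fixed. A minimal check using numbers copied from the output (the variable names are illustrative):

```r
# Coefficient on str, copied from the regression output
b_str <- -0.06878

# Change in the student-teacher ratio from part (c) to part (d)
delta_str <- 15 - 20

# Implied change in the predicted test score
pred_diff <- b_str * delta_str

# Predictions copied from parts (c) and (d)
pred_c <- 678.8928
pred_d <- 679.2367

pred_diff
pred_d - pred_c
```

Both quantities are roughly 0.34, up to rounding in the printed coefficient.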
Part (a)
\[\begin{align*} t &= \frac{\sqrt{n}(\bar{Y} - \mu_0)}{\sqrt{\widehat{\mathrm{var}}(Y)}} \\ &= \frac{\sqrt{100}(63 - 50)}{\sqrt{225}} \\ &= \frac{(10)(13)}{15} \\ &= 8.67 \end{align*}\]
We reject \(H_0\) here since \(|t| > 1.96\).
Part (b)
\[\begin{align*} \textrm{s.e.}(\bar{Y}) &= \frac{\sqrt{\widehat{\mathrm{var}}(Y)}}{\sqrt{n}} \\ &= \frac{\sqrt{225}}{\sqrt{100}} \\ &= \frac{15}{10} = 1.5 \end{align*}\]
Part (c)
\[\begin{align*} \textrm{p-value} &= 2 \Phi(-|t|) = 2\Phi(-8.67) \approx 0 \end{align*}\]
The p-value is essentially 0 here. It says that, if \(H_0\) were true, the probability of obtaining a t-statistic at least as extreme as 8.67 in absolute value would be essentially 0. In other words, we have very strong evidence against the hypothesis that \(\mathbb{E}[Y] = 50\).
Part (d)
\[\begin{align*} CI_{95\%} &= [\bar{Y} - 1.96 \textrm{s.e.}(\bar{Y})\, , \bar{Y} + 1.96 \textrm{s.e.}(\bar{Y})] \\ &= [63 - (1.96)(1.5)\, , 63 + (1.96)(1.5)] \\ &= [60.06\, , 65.94] \end{align*}\]
There is a 95% chance that the interval \([60.06\, , 65.94]\) contains \(\mathbb{E}[Y]\).
Part (e)
For part (a), changing the significance level to 1% does not change the t-statistic, but it does change the critical value. Instead of using the critical value 1.96, we should use the critical value of 2.58 here. In either case, we would continue to reject \(H_0\).
For parts (b) and (c), neither the standard error nor the p-value changes when we change the significance level.
For part (d), we calculate a 99% confidence interval by using 2.58 as the critical value. Thus, \[\begin{align*} CI_{99\%} &= [\bar{Y} - 2.58 \textrm{s.e.}(\bar{Y})\, , \bar{Y} + 2.58 \textrm{s.e.}(\bar{Y})] \\ &= [63 - (2.58)(1.5)\, , 63 + (2.58)(1.5)] \\ &= [59.13\, , 66.87] \end{align*}\] There is a 99% chance that the interval \([59.13\, , 66.87]\) contains \(\mathbb{E}[Y]\). Notice that the 99% confidence interval is wider than the 95% confidence interval that we reported earlier.
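The hand calculations in parts (a) through (e) can be checked numerically. The sketch below uses only the quantities given in the problem (\(n = 100\), \(\bar{Y} = 63\), \(\widehat{\mathrm{var}}(Y) = 225\), \(\mu_0 = 50\)); the variable names are illustrative. It takes critical values from `qnorm`, so the 99% interval differs slightly in the second decimal from the one computed with the rounded value 2.58.

```r
# Quantities given in the problem
n     <- 100
ybar  <- 63
v_hat <- 225
mu0   <- 50

se      <- sqrt(v_hat) / sqrt(n)       # part (b): 1.5
t_stat  <- (ybar - mu0) / se           # part (a): about 8.67
p_value <- 2 * pnorm(-abs(t_stat))     # part (c): essentially 0

# Parts (d) and (e): normal-approximation confidence intervals
z95  <- qnorm(0.975)                   # about 1.96
z99  <- qnorm(0.995)                   # about 2.58
ci95 <- c(ybar - z95 * se, ybar + z95 * se)   # about 60.06 to 65.94
ci99 <- c(ybar - z99 * se, ybar + z99 * se)

round(c(se = se, t = t_stat), 2)
p_value
round(ci95, 2)
round(ci99, 2)
```

The t-statistic, standard error, and p-value do not depend on the significance level; only the critical value, and therefore the width of the interval, changes.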