4.16 Coding

In this section, we’ll use the acs data to calculate an estimate of average wage/salary income among employed individuals in the United States. We’ll test the null hypothesis that the mean income in the United States is $50,000 as well as report the standard error of our estimate of mean income, as well as corresponding p-values, t-statistics, and 95% confidence interval. Finally, we’ll report a table of summary statistics using the modelsummary package separately by college graduates relative to non-college graduates.

load("data/acs.RData")

# estimate of mean income
ybar <- mean(acs$incwage)
ybar
#> [1] 59263.46

# calculate standard error
V <- var(acs$incwage)
n <- nrow(acs)
se <- sqrt(V) / sqrt(n)
se
#> [1] 713.8138

# calculate t-statistic
t_stat <- (ybar - 50000) / se
t_stat
#> [1] 12.97742

This clearly exceeds 1.96 (or any common critical value) which implies that we would reject the null hypothesis that mean income is equal to $50,000.

# calculate p-value
p_val <- 2*pnorm(-abs(t_stat))

The p-value is essentially equal to 0. This is expected given the value of the t-statistic that we calculated earlier.

# 95% confidence interval
ci_L <- ybar - 1.96*se
ci_U <- ybar + 1.96*se
paste0("[",round(ci_L,1),",",round(ci_U,1),"]")
#> [1] "[57864.4,60662.5]"

library(modelsummary)
library(dplyr)
# create a factor variable for going to college
acs$col <- ifelse(acs$educ >= 16, "college", "non-college")
acs$col <- as.factor(acs$col)
acs$female <- 1*(acs$sex==2)
acs$incwage <- acs$incwage/1000
datasummary_balance(~ col, data=dplyr::select(acs, incwage, female, age, col),
                    fmt=2)

	college (N=3871)		non-college (N=6129)
	Mean	Std. Dev.	Mean	Std. Dev.	Diff. in Means	Std. Error
incwage	89.69	96.15	40.05	39.01	-49.65	1.62
female	0.51	0.50	0.46	0.50	-0.04	0.01
age	44.38	13.43	42.80	15.71	-1.58	0.29