4.16 Coding

In this section, we’ll use the acs data to estimate average wage/salary income among employed individuals in the United States. We’ll test the null hypothesis that this mean income equals $50,000, reporting the standard error of our estimate along with the corresponding t-statistic, p-value, and 95% confidence interval. Finally, we’ll use the modelsummary package to report a table of summary statistics separately for college graduates and non-college graduates.

load("data/acs.RData")

# estimate of mean income
ybar <- mean(acs$incwage)
ybar
#> [1] 59263.46

# calculate standard error
V <- var(acs$incwage)
n <- nrow(acs)
se <- sqrt(V) / sqrt(n)
se
#> [1] 713.8138

# calculate t-statistic
t_stat <- (ybar - 50000) / se
t_stat
#> [1] 12.97742

The t-statistic of about 12.98 far exceeds 1.96 (or any other common critical value), so we reject the null hypothesis that mean income equals $50,000.
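As a cross-check on the standard error and t-statistic above, base R's t.test() computes the same one-sample statistic, (ybar - 50000) / (s / sqrt(n)). The sketch below runs on simulated, right-skewed data so that it is self-contained; the exponential draws, the seed, and the helper name se_mean are illustrative assumptions, not part of the acs analysis. With the actual data, the corresponding call would be t.test(acs$incwage, mu = 50000).

```r
# Simulated, right-skewed "income" data (illustrative assumption, not the acs data)
set.seed(1)
x <- rexp(500, rate = 1 / 55000)

# Standard error of the sample mean: s / sqrt(n), same as sqrt(var(x)) / sqrt(n)
se_mean <- function(x) sd(x) / sqrt(length(x))

# Manual t-statistic for H0: mean = 50000
t_manual <- (mean(x) - 50000) / se_mean(x)

# The built-in one-sample t-test reports the same statistic
tt <- t.test(x, mu = 50000)
t_builtin <- unname(tt$statistic)

all.equal(t_manual, t_builtin)
#> [1] TRUE
```

The object returned by t.test() also contains p.value and conf.int components. Those are based on the t distribution with n - 1 degrees of freedom rather than the normal distribution, so they differ very slightly from the normal-based values computed in this section.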

# calculate two-sided p-value using the normal approximation
p_val <- 2*pnorm(-abs(t_stat))

The p-value is essentially 0, which is expected given how large the t-statistic is.
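The p-value above uses the standard normal distribution, whereas t.test() uses the t distribution with n - 1 degrees of freedom. With the roughly 10,000 observations in the acs data (the group sizes in the summary table sum to 3,871 + 6,129 = 10,000), the two give nearly identical answers. A minimal comparison, where the helper names p_normal and p_student are hypothetical:

```r
# Two-sided p-value from the standard normal approximation (as in the text)
p_normal <- function(t) 2 * pnorm(-abs(t))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom,
# which is what t.test() uses for a one-sample test
p_student <- function(t, n) 2 * pt(-abs(t), df = n - 1)

# With n = 10000 the two are nearly indistinguishable
p_normal(2)
p_student(2, n = 10000)

# At the t-statistic computed above, the p-value is far below any usual cutoff
p_normal(12.97742)
```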

# 95% confidence interval
ci_L <- ybar - 1.96*se
ci_U <- ybar + 1.96*se
paste0("[",round(ci_L,1),",",round(ci_U,1),"]")
#> [1] "[57864.4,60662.5]"
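The interval above uses 1.96 as the critical value. A small helper (ci_mean is a hypothetical name) that computes the exact quantile with qnorm() reproduces the same interval to one decimal place. It also illustrates why the test and the interval agree: $50,000 falls outside the 95% confidence interval exactly when |t| exceeds the critical value.

```r
# Normal-approximation confidence interval for a mean, given the estimate and its SE
ci_mean <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.959964 for a 95% interval
  c(lower = est - z * se, upper = est + z * se)
}

# Estimate and standard error reported earlier in this section
ybar <- 59263.46
se <- 713.8138
ci <- ci_mean(ybar, se)
round(ci, 1)

# Duality with the test: 50000 lies outside the 95% CI exactly when |t| > z
outside_ci <- 50000 < ci[["lower"]] || 50000 > ci[["upper"]]
reject <- abs((ybar - 50000) / se) > qnorm(0.975)
c(outside_ci = outside_ci, reject = reject)
```

Changing level (for example, level = 0.99) widens the interval using the corresponding normal quantile.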
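
Next, we compute summary statistics separately for college graduates and non-college graduates.

library(modelsummary)
library(dplyr)

# create a factor variable for college graduates (16 or more years of education)
acs$col <- ifelse(acs$educ >= 16, "college", "non-college")
acs$col <- as.factor(acs$col)

# indicator for being female (sex is coded 2 for female)
acs$female <- 1*(acs$sex==2)

# measure wage/salary income in thousands of dollars
acs$incwage <- acs$incwage/1000

# summary statistics by college status, rounded to 2 decimal places
datasummary_balance(~ col, data=dplyr::select(acs, incwage, female, age, col),
                    fmt=2)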
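              college (N=3871)        non-college (N=6129)
           Mean     Std. Dev.        Mean     Std. Dev.     Diff. in Means   Std. Error
incwage    89.69      96.15          40.05      39.01           -49.65          1.62
female      0.51       0.50           0.46       0.50            -0.04          0.01
age        44.38      13.43          42.80      15.71            -1.58          0.29

Here incwage is measured in thousands of dollars, and Diff. in Means is the non-college mean minus the college mean.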
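The Std. Error column is consistent with the usual unequal-variance formula for a difference in means, sqrt(s1^2/n1 + s2^2/n2): plugging the rounded table values into that formula reproduces 1.62, 0.01, and 0.29 for the three rows. Whether datasummary_balance() uses exactly this formula internally is an assumption here. The helper names diff_in_means and diff_in_means_raw are hypothetical.

```r
# Difference in means (group 2 minus group 1) and its standard error,
# using the unequal-variance formula sqrt(s1^2/n1 + s2^2/n2)
diff_in_means <- function(m1, s1, n1, m2, s2, n2) {
  c(diff = m2 - m1, se = sqrt(s1^2 / n1 + s2^2 / n2))
}

# Rounded incwage summary statistics from the table
# (college = group 1, non-college = group 2)
res <- diff_in_means(89.69, 96.15, 3871, 40.05, 39.01, 6129)
round(res, 2)

# Same calculation from raw data: y is the outcome, g a two-level factor;
# the first factor level is treated as group 1
diff_in_means_raw <- function(y, g) {
  m <- tapply(y, g, mean)
  s <- tapply(y, g, sd)
  n <- tapply(y, g, length)
  diff_in_means(m[[1]], s[[1]], n[[1]], m[[2]], s[[2]], n[[2]])
}
```

Because the inputs are already rounded, the incwage difference comes out as -49.64 rather than the table's -49.65. With the full data, diff_in_means_raw(acs$incwage, acs$col) should avoid that rounding; since factor levels are sorted alphabetically by default, "college" is the first group.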