3.5 Coding Questions
Run the following code to create the data that we will use in the problem:
set.seed(1234) # setting the seed means that we will get the same results
x <- rexp(100) # make 100 draws from an exponential distribution
Use the ggplot2 package to plot a histogram of x.
For this question, we’ll use the data fertilizer_2000. A scatter plot is a useful way to visualize 2-dimensional data. Use the ggplot2 package to make a scatter plot with crop yield (avyield) on the y-axis and fertilizer (avfert) on the x-axis. Label the y-axis “Crop Yield” and the x-axis “Fertilizer”. Do you notice any pattern from the scatter plot?
For this question, we’ll use the data Airq. The variable rain contains the amount of rainfall in the county in a year (in inches). For this question, we’ll be interested in testing whether or not the mean rainfall across counties in California is 25 inches.
Estimate the mean rainfall across counties.
Calculate the standard error of your estimate of rainfall.
Calculate a t-statistic for \(H_0 : \mathbb{E}[Y] = 25\) where \(Y\) denotes rainfall. Do you reject \(H_0\) at a 5% significance level? Explain.
Calculate a p-value for \(H_0: \mathbb{E}[Y] = 25\). How should you interpret this?
Calculate a 95% confidence interval for average rainfall.
Use the datasummary_balance function from the modelsummary package to report average air quality, value added, rain, population density, and average income, separately by whether or not the county is located in a coastal area.
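As a reminder of how the inference steps in the rainfall question fit together, here is a minimal sketch in base R. It uses simulated data (the vector y and the values passed to rnorm are hypothetical and are not taken from Airq), and it assumes critical values from the normal approximation; a t distribution, as used by R's t.test function, would give slightly different p-values and intervals in small samples.

```r
# Hypothetical simulated sample; these are NOT the Airq rainfall values
set.seed(42)
y <- rnorm(30, mean = 27, sd = 10)

mu0 <- 25                      # hypothesized mean under H0
n <- length(y)
y_bar <- mean(y)               # point estimate of E[Y]
se <- sd(y) / sqrt(n)          # standard error of the sample mean
t_stat <- (y_bar - mu0) / se   # t-statistic for H0: E[Y] = mu0

# Two-sided p-value and 95% confidence interval (normal approximation, assumed here)
p_value <- 2 * pnorm(-abs(t_stat))
ci <- y_bar + c(-1, 1) * qnorm(0.975) * se

# Reject at the 5% level when |t| exceeds the critical value (about 1.96)
reject <- abs(t_stat) > qnorm(0.975)

print(c(estimate = y_bar, se = se, t = t_stat, p = p_value))
print(ci)
print(reject)
```

Under these assumptions, rejecting when the absolute t-statistic exceeds the critical value, rejecting when the p-value is below 0.05, and rejecting when 25 falls outside the 95% confidence interval are all the same decision.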