3.5 Coding Questions

  1. Run the following code to create the data used in this problem:

    set.seed(1234) # fixing the seed makes the random draws reproducible
    x <- rexp(100) # make 100 draws from an exponential distribution (default rate = 1)

    Use the ggplot2 package to plot a histogram of x.

  2. For this question, we’ll use the fertilizer_2000 data. A scatter plot is a useful way to visualize the relationship between two variables. Use the ggplot2 package to make a scatter plot with crop yield (avyield) on the y-axis and fertilizer (avfert) on the x-axis. Label the y-axis “Crop Yield” and the x-axis “Fertilizer”. Do you notice any pattern in the scatter plot?

  3. For this question, we’ll use the Airq data. The variable rain contains the amount of rainfall in the county in a year (in inches). We are interested in testing whether the mean rainfall across counties in California is 25 inches.

    1. Estimate the mean rainfall across counties.

    2. Calculate the standard error of your estimate of mean rainfall.

    3. Calculate a t-statistic for \(H_0 : \mathbb{E}[Y] = 25\) where \(Y\) denotes rainfall. Do you reject \(H_0\) at a 5% significance level? Explain.

    4. Calculate a p-value for \(H_0: \mathbb{E}[Y] = 25\). How should you interpret this?

    5. Calculate a 95% confidence interval for average rainfall.

    6. Use the datasummary_balance function from the modelsummary package to report average air quality, value added, rain, population density, and average income, separately by whether or not the county is located in a coastal area.
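
The calculations in parts 1 through 5 of question 3 follow a common pattern: estimate the mean, compute its standard error, form a t-statistic, and then convert it into a p-value and a confidence interval. The sketch below walks through that pattern on simulated data rather than the Airq data. The vector `y`, its sample size, and the hypothesized value `mu0` are made up for illustration. It also assumes critical values from the t distribution with n - 1 degrees of freedom; your course may instead use the normal approximation (for example, 1.96 for a 95% interval), which gives similar answers in moderately large samples.

```r
# Simulated stand-in for an outcome variable; this is NOT the Airq data
set.seed(1234)
y   <- rnorm(30, mean = 30, sd = 10) # hypothetical sample of 30 observations
mu0 <- 25                            # hypothesized mean under H0

n      <- length(y)
y_bar  <- mean(y)                    # estimate of E[Y]
se     <- sd(y) / sqrt(n)            # standard error of the sample mean
t_stat <- (y_bar - mu0) / se         # t-statistic for H0: E[Y] = mu0

# Two-sided p-value and 95% confidence interval, using the t distribution
# with n - 1 degrees of freedom (an assumption; see the note above)
p_val <- 2 * pt(-abs(t_stat), df = n - 1)
ci    <- y_bar + c(-1, 1) * qt(0.975, df = n - 1) * se

# Base R's t.test() performs the same calculations and serves as a check
check <- t.test(y, mu = mu0)

print(c(mean = y_bar, se = se, t = t_stat, p = p_val))
print(ci)
```

Computing each piece by hand and then comparing against `t.test()` is a useful habit: if the two disagree, the discrepancy usually points to a mistake in the standard error or in the choice of reference distribution.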