There are a number of useful resources for R programming. I pointed out quite a few in the course syllabus and in the Introduction to ECON 8080 slides. The notes for today’s class mainly come from Introduction to Data Science by Rafael Irizarry. I’ll cover the introductory topics that I think are most useful.

Most Important Readings: Chapters 1 (Introduction), 2 (R Basics), 3 (Programming Basics), and 5 (Importing Data)

Secondary Readings (please read as you have time): Chapters 4 (The tidyverse), 7 (Introduction to data visualization), and 8 (ggplot2)

The remaining chapters below are just in case you are particularly interested in some topic (these are likely more than you need to know for our course):

• Data visualization - Chapters 9-12
• Data wrangling - Chapters 21-27
• GitHub - Chapter 40
• Reproducible Research - Chapter 41

I think you can safely ignore all other chapters.

I’m not sure if it is helpful or not, but here are the notes to myself that I used to teach our two review sessions on R.

## List of useful R packages

• AER — package containing data from Applied Econometrics with R

• wooldridge — package containing data from Wooldridge’s textbook

• ggplot2 — package to produce sophisticated-looking plots

• dplyr — package containing tools to manipulate data

• haven — package for loading different types of data files

• plm — package for working with panel data

• fixest — another package for working with panel data

• ivreg — package for IV regressions, diagnostics, etc.

• estimatr — package that runs regressions but with standard errors that economists often like more than the default options in R

• modelsummary — package for producing nice output of more than one regression and summary statistics
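A minimal sketch of how these packages might be installed and loaded, assuming a working internet connection and that all of them are available from CRAN. The names `course_pkgs` and `missing_pkgs` are only illustrative, and the install line is commented out so that running the block does not change your library.

```r
# Packages from the list above (assumed to be distributed on CRAN)
course_pkgs <- c("AER", "wooldridge", "ggplot2", "dplyr", "haven",
                 "plm", "fixest", "ivreg", "estimatr", "modelsummary")

# Which of them are not yet installed on this machine?
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing (only needs to be done once)
# install.packages(missing_pkgs)

# Packages are loaded once per R session with library(), for example:
# library(ggplot2)
```

Installing happens once per machine, while `library()` has to be called in every new R session that uses the package.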

Version: [csv] [RData] [dta]

If, for some reason, this doesn’t work, you can use the following code to reproduce the data:

firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25))
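As a quick check that the data frame was built as intended, the sketch below repeats the definition and runs a few base R operations on it. The temporary file and the names `manufacturing`, `by_size`, and `firm_csv` are assumptions for illustration; the actual file names behind the download links are not given here.

```r
# Same data frame as above, repeated so this block runs on its own
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

dim(firm_data)      # number of rows and columns
names(firm_data)    # column names

# Keep only the manufacturing firms
manufacturing <- firm_data[firm_data$industry == "Manufacturing", ]

# Sort firms from most to fewest employees
by_size <- firm_data[order(firm_data$employees, decreasing = TRUE), ]

# Round trip through a temporary csv file (the path is hypothetical)
csv_path <- tempfile(fileext = ".csv")
write.csv(firm_data, csv_path, row.names = FALSE)
firm_csv <- read.csv(csv_path)
```

The RData and dta versions would typically be read with `load()` and `haven::read_dta()` respectively, though those calls are not run here because they need the downloaded files.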

# Practice Questions

Note: We’ll try to do these on our own, but if you get stuck, the solutions are here.

1. Create two vectors as follows

x <- seq(2,10,by=2)
y <- c(3,5,7,11,13)

Add x and y, subtract y from x, multiply x and y, and divide x by y, then report your results.

2. The geometric mean of a set of numbers is an alternative measure of central tendency to the more common “arithmetic mean” (this is the mean that we are used to). For a set of $J$ numbers, $x_1, x_2, \ldots, x_J$, the geometric mean is defined as

$$(x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J}$$

Write a function called geometric_mean that takes in a vector of numbers and computes their geometric mean. Compute the geometric mean of c(10,8,13).

3. Use the lubridate package to figure out how many days there were between Jan. 1, 1981 and Jan. 10, 2022.

4. mtcars is one of the data frames that comes packaged with base R.

1. How many observations does mtcars have?

2. How many columns does mtcars have?

3. What are the names of the columns of mtcars?

4. Print only the rows of mtcars for cars that get at least 20 mpg.

5. Print only the rows of mtcars for cars that get at least 20 mpg and have at least 100 horsepower (it is in the column called hp).

6. Print only the rows of mtcars for cars that have 6 or more cylinders (it is in the column labeled cyl) or at least 100 horsepower.

7. Recover the 10th row of mtcars.

8. Sort the rows of mtcars by mpg (from highest to lowest).
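
For anyone who gets stuck, here is a hedged sketch of the kinds of tools the first three questions call on, shown on different inputs so that it does not give the answers away. The names `a`, `b`, `harmonic_mean`, and `days_between` are made up for illustration, and the date example assumes the lubridate package is installed.

```r
library(lubridate)  # assumed to be installed; provides ymd()

# Arithmetic on vectors of equal length works element by element
a <- c(1, 2, 3)
b <- c(4, 5, 6)
a + b   # 5 7 9
a * b   # 4 10 18

# A user-written function: the harmonic mean (an illustrative example,
# not the geometric mean asked for in question 2)
harmonic_mean <- function(v) {
  length(v) / sum(1 / v)
}
harmonic_mean(c(1, 2, 4))   # 3 / (1 + 0.5 + 0.25), about 1.714

# Days between two dates: parse each with ymd(), then subtract
start_date <- ymd("2000-01-01")
end_date <- ymd("2000-03-01")
days_between <- as.numeric(end_date - start_date)   # 60, since 2000 is a leap year
```

Subtracting two dates returns a time difference measured in days; wrapping it in `as.numeric()` gives a plain number.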