Introduction to R Programming

There are a number of useful resources for R programming. I pointed out quite a few in the course syllabus. The notes for today’s class mainly come from Introduction to Data Science: Data Wrangling and Visualization with R by Rafael Irizarry. I’ll cover some introductory topics that I think are most useful.

Most Important Readings: Chapters 1 (Introduction), 2 (R Basics), 3 (Programming Basics), and 6 (Importing Data)

Secondary Readings: (please read as you have time) Chapter 4 (The tidyverse)

The remaining chapters of this book are all useful, but you can read them over the course of the semester as you have time.

I’m not sure whether they are helpful, but here are the notes to myself that I used for this material.

List of useful R packages

  • AER — package containing data from Applied Econometrics with R

  • wooldridge — package containing data from Wooldridge’s textbook

  • ggplot2 — package to produce sophisticated-looking plots

  • dplyr — package containing tools to manipulate data

  • haven — package for loading different types of data files

  • plm — package for working with panel data

  • fixest — another package for working with panel data

  • ivreg — package for IV regressions, diagnostics, etc.

  • estimatr — package that runs regressions but with standard errors that economists often like more than the default options in R

  • modelsummary — package for producing nice output of more than one regression and summary statistics
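
A package has to be installed once and then loaded in every session where it is used. Here is a minimal sketch of that workflow, assuming the packages listed above are all available on CRAN. The variable names pkgs and missing_pkgs are just illustrative, and the install line is commented out so that running the block does not download anything.

```r
# Packages from the list above
pkgs <- c("AER", "wooldridge", "ggplot2", "dplyr", "haven", "plm",
          "fixest", "ivreg", "estimatr", "modelsummary")

# Which of them are not installed on this machine yet?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Install whatever is missing (needs an internet connection; run once)
# install.packages(missing_pkgs)

# Load a package at the start of each session before using it, e.g.
# library(dplyr)
```

If missing_pkgs prints character(0), everything on the list is already installed.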

Practice loading data

Version: [csv] [RData] [dta]

If, for some reason, this doesn’t work, you can use the following code to reproduce this data:

firm_data <- data.frame(name=c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))
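
To practice the loading step itself, here is a rough sketch that writes the data above to each of the three file types and reads it back. The file names and the temporary directory are assumptions made so the example is self-contained; with the downloaded files you would point the read functions at wherever you saved them. The dta part only runs if the haven package is installed.

```r
# Recreate the data so the example is self-contained
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# Hypothetical file locations in a temporary directory
csv_file   <- file.path(tempdir(), "firm_data.csv")
rdata_file <- file.path(tempdir(), "firm_data.RData")

# csv: write the file, then read it back with base R
write.csv(firm_data, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

# RData: save() stores the object under its own name;
# load() restores it into the workspace and returns the object names
save(firm_data, file = rdata_file)
loaded_names <- load(rdata_file)

# dta (Stata format): uses the haven package if it is available
if (requireNamespace("haven", quietly = TRUE)) {
  dta_file <- file.path(tempdir(), "firm_data.dta")
  haven::write_dta(firm_data, dta_file)
  from_dta <- haven::read_dta(dta_file)
}

head(from_csv)
```

Note the difference in style: read.csv and read_dta return the data, so you assign the result to a name of your choosing, while load puts the saved objects back under the names they were saved with.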

Practice Questions

Note: We’ll try to do these on our own, but if you get stuck, the solutions are here

  1. Create two vectors as follows

    x <- seq(2,10,by=2)
    y <- c(3,5,7,11,13)

    Add x and y, subtract y from x, multiply x and y, and divide x by y, then report your results.

  2. The geometric mean of a set of numbers is an alternative measure of central tendency to the more common “arithmetic mean” (this is the mean that we are used to). For a set of \(J\) numbers, \(x_1,x_2,\ldots,x_J\), the geometric mean is defined as

    \[ (x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} \]

    Write a function called geometric_mean that takes in a vector of numbers and computes their geometric mean. Compute the geometric mean of c(10,8,13).

  3. Use the lubridate package to figure out how many days there were between Jan. 1, 1981 and Jan. 10, 2022.

  4. mtcars is one of the data frames that comes packaged with base R.

    1. How many observations does mtcars have?

    2. How many columns does mtcars have?

    3. What are the names of the columns of mtcars?

    4. Print only the rows of mtcars for cars that get at least 20 mpg

    5. Print only the rows of mtcars that get at least 20 mpg and have at least 100 horsepower (it is in the column called hp)

    6. Print only the rows of mtcars that have 6 or more cylinders (it is in the column labeled cyl) or at least 100 horsepower

    7. Recover the 10th row of mtcars

    8. Sort the rows of mtcars by mpg (from highest to lowest)
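
The questions above rely on a handful of base R tools: element-by-element arithmetic on vectors, writing your own functions, date differences, and subsetting and sorting data frames. As a warm-up (these are not the solutions), here is a minimal sketch of those tools applied to different inputs. The vectors a and b, the function spread, and the choice of the firm data as the example data frame are all illustrative.

```r
# Vectorized arithmetic: operations apply element by element
a <- c(1, 2, 3)
b <- c(10, 20, 30)
a + b
b / a

# Writing a function: gap between the largest and smallest value
spread <- function(v) {
  max(v) - min(v)
}
spread(c(4, 9, 1))

# Date differences in base R (lubridate adds friendlier helpers)
n_days <- as.numeric(as.Date("2020-03-01") - as.Date("2020-02-01"))

# A small data frame to practice on
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

nrow(firm_data)    # number of observations
ncol(firm_data)    # number of columns
names(firm_data)   # column names

# Rows that satisfy one condition
at_least_25 <- firm_data[firm_data$employees >= 25, ]

# Rows that satisfy both conditions (&)
big_clarke <- firm_data[firm_data$county == "Clarke" &
                          firm_data$employees >= 100, ]

# Rows that satisfy either condition (|)
food_or_big <- firm_data[firm_data$industry == "Food Services" |
                           firm_data$employees >= 500, ]

# A single row by position
second_row <- firm_data[2, ]

# Sort rows from most to fewest employees
sorted <- firm_data[order(firm_data$employees, decreasing = TRUE), ]
print(sorted)
```

In each subsetting call, the expression before the comma picks rows and the empty slot after the comma keeps all columns.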