2 Introduction to R

We will learn a lot more about statistical programming this semester, but we’ll start with a crash course on R with the idea of getting you up and running.

I listed a few references in the Introduction, but this section will mostly follow the discussion in Introduction to Data Science: Data Wrangling and Visualization with R, by Rafael Irizarry. I’ll abbreviate this reference as IDS throughout this section.

IDS is not specifically geared towards Econometrics, but I think it is a really fantastic book and resource. In this section, I cover what I think are the most important basics of R programming and additionally point you to the references for the material that I cover in class. But I would strongly recommend reading all of the first 5 chapters of IDS over the next couple of weeks. We will basically only cover the first 5 chapters in our class, but the course should set you up so that the remaining 35 chapters of the book can serve as helpful reference material throughout the rest of the semester.

2.1 Setting up R

This section covers how to set up R and RStudio and then what RStudio will look like when you open it up.

2.1.1 What is R?

2.1.2 Downloading R

We will use R (https://www.r-project.org/) to analyze data. R is free and runs on Windows, Mac, and Linux. You should go ahead and download R for your personal computer as soon as possible; this should be relatively straightforward. It is also available at most computer labs on campus.

2.1.3 RStudio

Base R comes with a lightweight development environment (i.e., a place to write and execute code), but most folks prefer RStudio as it has more features. You can download it here: https://www.rstudio.com/products/rstudio/download/#download; choose the free version based on your operating system (Linux, Windows, Mac, etc.).

2.1.4 RStudio Development Environment

2.2 Installing R Packages

2.2.1 A list of useful R packages

AER: package containing data from Applied Econometrics with R
wooldridge: package containing data from Wooldridge’s textbook
ggplot2: package to produce sophisticated-looking plots
dplyr: package containing tools to manipulate data
haven: package for loading different types of data files
plm: package for working with panel data
fixest: another package for working with panel data
ivreg: package for IV regressions, diagnostics, etc.
estimatr: package that runs regressions but with standard errors that economists often like more than the default options in R
modelsummary: package for producing nice output of more than one regression and summary statistics

As of this writing, there are 18,004 R packages available on CRAN (R’s main repository for contributed packages).
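Packages on CRAN are installed with install.packages() (once per computer) and then loaded with library() (in every new R session where you want to use them). Here is a minimal sketch for the packages listed above. The names course_pkgs and missing_pkgs are just labels I made up for this example, and the interactive() guard is my own precaution so that nothing gets downloaded when the code is run as part of a script.

```r
# Packages from the list above
course_pkgs <- c("AER", "wooldridge", "ggplot2", "dplyr", "haven",
                 "plm", "fixest", "ivreg", "estimatr", "modelsummary")

# Keep only the ones that are not installed on this computer yet
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))

# Install once per computer (needs an internet connection); the guard means
# this only runs in an interactive session such as the RStudio console
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load a package in each session where you want to use it
# (checked first so this line does not fail if dplyr is not installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
}
```

Installing everything at once is not required; you could also call install.packages("dplyr") for a single package when you first need it.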

2.3 R Basics

2.3.1 Objects

2.3.2 Workspace

2.3.3 Importing Data

To work with actual data in R, we will need to import it. I mentioned the “Import Data” button above, but let me mention a few other possibilities here, including how to import data by writing code.

On the course website, I posted three files: firm_data.csv, firm_data.RData, and firm_data.dta. All three of these contain exactly the same small, fictitious dataset, but are saved in different formats.

Probably the easiest way to import data in R is through the Files pane on the bottom right. But, in order to do this, you may need to change your working directory. We will do this using RStudio’s user interface in the following steps:

First, navigate to Session -> Set Working Directory -> Choose Directory. This will open a window that will allow you to choose the directory where you saved the data.

Next, use the menu to navigate to the place where you saved firm_data.csv. I created a folder ~/Dropbox/Courses/Georgia/Undergrad Econometrics/24 Fall/firm data/ and saved it there.

Now, we have set the working directory, and this is what RStudio looks like for me. Notice that the working directory is now set to the folder where I saved the data. You can see the difference in the Files pane.
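The menu steps above have a code equivalent: getwd() reports the current working directory and setwd() changes it. The sketch below uses R's temporary folder (tempdir()) as a stand-in so that it runs on any computer; data_dir and old_dir are names I made up for the example, and you would replace data_dir with the path to the folder where you saved the data.

```r
# Where is R currently looking for files?
getwd()

# Stand-in folder for this example; replace with your own path,
# e.g. the folder where you saved firm_data.csv
data_dir <- tempdir()

# setwd() changes the working directory and (invisibly) returns the old one
old_dir <- setwd(data_dir)

# Confirm the change and list the files R can now see directly
getwd()
list.files()

# Switch back to the original working directory
setwd(old_dir)
```

Once the working directory points at the data folder, file names like "firm_data.csv" can be used on their own, without a full path.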

Next, we will load the data, just by clicking it in the Files pane. I picked firm_data.csv, but any of the three files will work. R is quite good at recognizing different types of data files and importing them, so this same procedure will work for firm_data.RData and firm_data.dta even though they are different types of files. Once you click it, you will get a screen that should look like this

Click “Import” and the data should be imported. You can see that it is now in the Environment pane.

Next, let’s discuss how to import data by writing code. (By the way, this is actually what is happening behind the scenes when you import data through the user interface as described above.) “csv” stands for “comma-separated values”. A csv file is basically a plain text file (e.g., try opening it in Notepad or Text Editor) where the columns are separated by commas and each row is on its own line. Almost any computer program can read this type of file; for example, you could easily import this file into R, Excel, or Stata. You can import a .csv file in R using the following command:

firm_data <- read.csv("firm_data.csv")
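To see what read.csv is doing, it can help to try it on a tiny file that you build yourself. The sketch below writes a made-up three-row csv file to R's temporary folder and reads it back in. The file name example_firms.csv and the columns firm_id, employees, and profit are invented for illustration; they are not the contents of firm_data.csv.

```r
# Write a small, made-up csv file to a temporary location
csv_path <- file.path(tempdir(), "example_firms.csv")
writeLines(c("firm_id,employees,profit",
             "1,25,10.5",
             "2,140,-3.2",
             "3,8,1.7"),
           csv_path)

# A csv file is plain text: the first line holds the column names,
# and each later line is one row
readLines(csv_path)

# read.csv turns that text into a data.frame
example_firms <- read.csv(csv_path)

# Check the result: column names, column types, and the first few rows
str(example_firms)
head(example_firms)
```

Running str() and head() right after importing is a quick way to confirm that the columns were split correctly and that numbers were read as numbers.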

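An RData file is the native format for saving data in R. Unlike read.csv, load() restores objects under the names they were saved with and returns only those names, so its result should not be assigned to firm_data. Assuming the data were saved under the name firm_data, you can import an RData file using the following command:

load("firm_data.RData")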
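One detail worth checking for yourself is what load() gives back. The sketch below saves a small made-up data.frame to an RData file, removes it from the workspace, and loads it again. The names example_df, rdata_path, and loaded_names are invented for this example.

```r
# Create a small, made-up data.frame and save it in RData format
rdata_path <- file.path(tempdir(), "example.RData")
example_df <- data.frame(firm_id = 1:3, employees = c(25, 140, 8))
save(example_df, file = rdata_path)

# Remove it from the workspace so we can see load() bring it back
rm(example_df)

# load() restores the object under its saved name...
loaded_names <- load(rdata_path)

# ...and what it returns is only a character vector of object names
loaded_names          # "example_df"
exists("example_df")  # TRUE
head(example_df)
```

This is why the value returned by load() is not the dataset itself: the data come back under whatever name they had when they were saved.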

Similarly, a dta file is the native format for saving data in Stata. You can import a dta file using the following command:

library(haven) # external package for reading dta file
firm_data <- read_dta("firm_data.dta")
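If you want to try read_dta without the course file, here is a minimal sketch that writes a small made-up dataset in Stata format with haven's write_dta and reads it back. The file name and column names are invented for the example, and the whole block is wrapped in a check so that it is simply skipped if haven is not installed.

```r
# Only run this example if the haven package is installed
if (requireNamespace("haven", quietly = TRUE)) {
  # Write a small, made-up dataset in Stata's dta format
  dta_path <- file.path(tempdir(), "example_firms.dta")
  haven::write_dta(data.frame(firm_id = 1:3, employees = c(25, 140, 8)),
                   dta_path)

  # Read it back into R
  example_dta <- haven::read_dta(dta_path)

  # read_dta returns a "tibble", which is a kind of data.frame
  print(class(example_dta))
  print(is.data.frame(example_dta))
}
```

Writing haven::read_dta (with the package name and two colons) calls the function without needing library(haven) first; either style works.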

In all three cases above, what we have done is to create a new data.frame (a data.frame is a type of object that we’ll talk about in detail later on in this chapter) called firm_data that contains the data that we were trying to load.