There are many useful resources for R programming. I pointed out several in the course syllabus and in the introduction to ECON 8080 slides. The notes for today’s class mainly come from Introduction to Data Science by Rafael Irizarry. I’ll cover the introductory topics that I think are most useful.
Most Important Readings: Chapters 1 (Introduction), 2 (R Basics), 3 (Programming Basics), and 5 (Importing Data)
Secondary Readings: (please read as you have time) Chapters 4 (The tidyverse), 7 (Introduction to data visualization), 8 (ggplot2)
The remaining chapters below are just in case you are particularly interested in some topic (these are likely more than you need to know for our course):
I think you can safely ignore all other chapters.
I’m not sure if it is helpful or not, but here are the notes to myself that I used to teach our two review sessions on R.
AER: package containing data from Applied Econometrics with R
wooldridge: package containing data from Wooldridge’s textbook
ggplot2: package to produce sophisticated-looking plots
dplyr: package containing tools to manipulate data
haven: package for loading different types of data files
plm: package for working with panel data
fixest: another package for working with panel data
ivreg: package for IV regressions, diagnostics, etc.
estimatr: package that runs regressions, but with standard errors that economists often prefer to R's default options
modelsummary: package for producing nice output of more than one regression and summary statistics
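As a rough sketch of how these packages might be set up, the code below checks which of them are not yet installed and shows, commented out, how they could be installed and loaded. It assumes the packages are obtained from CRAN, which these notes do not state explicitly, and the names course_pkgs and missing_pkgs are purely illustrative.

```r
# Package names copied from the list above (installing from CRAN is an assumption)
course_pkgs <- c("AER", "wooldridge", "ggplot2", "dplyr", "haven",
                 "plm", "fixest", "ivreg", "estimatr", "modelsummary")

# Which of these are not yet installed on this machine?
missing_pkgs <- setdiff(course_pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install anything that is missing (needs an internet connection)
# install.packages(missing_pkgs)

# Once a package is installed, load it with library(), for example:
# library(ggplot2)
```

Installation only needs to happen once per machine, while library() has to be called in every new R session.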
If for some reason this doesn’t work, you can use the following code to reproduce the data:
# Build the firm data by hand: one row per firm
firm_data <- data.frame(name=c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))
Note: We’ll try to do these on our own, but if you get stuck, the solutions are here.
Create two vectors as follows:
x <- seq(2,10,by=2)
y <- c(3,5,7,11,13)
Add x and y, subtract y from x, multiply x and y, and divide x by y, and report your results.
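As a hint rather than a solution, here is a small sketch of how arithmetic operators behave on vectors in R. The vectors a and b are made up for illustration and are not the x and y from the exercise.

```r
# Hypothetical example vectors (not the x and y from the exercise)
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Arithmetic operators act element by element on vectors of equal length
a + b   # 5 7 9
a - b   # -3 -3 -3
a * b   # 4 10 18
a / b   # 0.25 0.40 0.50
```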
The geometric mean of a set of numbers is an alternative measure of central tendency to the more common “arithmetic mean” (this is the mean that we are used to). For a set of \(J\) numbers, \(x_1,x_2,\ldots,x_J\), the geometric mean is defined as
\[ (x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} \]
Write a function called geometric_mean that takes in a vector of numbers and computes their geometric mean. Compute the geometric mean of c(10,8,13).
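If you get stuck, the following is one possible way to translate the formula into R; it is only a sketch, and the linked solutions may take a different approach. It is checked on numbers other than the ones in the exercise.

```r
# One possible implementation of the formula above: the product of the
# elements raised to the power 1/J, where J is the number of elements
geometric_mean <- function(x) {
  prod(x)^(1 / length(x))
}

# Small check on numbers other than the ones in the exercise:
# (1 * 3 * 9)^(1/3) = 27^(1/3), which is 3 up to floating-point rounding
geometric_mean(c(1, 3, 9))
```

For long vectors the product can overflow or underflow, in which case exp(mean(log(x))) is a commonly used equivalent for strictly positive numbers.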
Use the lubridate package to figure out how many days there were between Jan. 1, 1981 and Jan. 10, 2022.
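As a hint, here is a minimal sketch of date arithmetic with lubridate, assuming the package is installed. The dates are hypothetical and deliberately differ from the ones in the exercise.

```r
library(lubridate)

# Hypothetical dates, chosen to differ from the ones in the exercise
start_date <- ymd("2000-01-01")
end_date   <- ymd("2000-03-01")

# Subtracting two dates gives a time difference in days;
# as.numeric() extracts the day count as a plain number
days_between <- as.numeric(end_date - start_date)
days_between   # 60: 31 days in January plus 29 in February (2000 was a leap year)
```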
mtcars is one of the data frames that comes packaged with base R.
How many observations does mtcars have?
How many columns does mtcars have?
What are the names of the columns of mtcars?
Print only the rows of mtcars for cars that get at least 20 mpg.
Print only the rows of mtcars for cars that get at least 20 mpg and have at least 100 horsepower (it is in the column called hp).
Print only the rows of mtcars for cars that have 6 or more cylinders (it is in the column labeled cyl) or at least 100 horsepower.
Recover the 10th row of mtcars.
Sort the rows of mtcars by mpg (from highest to lowest).
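The same kinds of operations can be sketched on the firm_data data frame from earlier in these notes, which leaves the mtcars questions for you to work out. This uses base R only; the names big_clarke, food_or_big, and sorted_firms are illustrative, and other approaches (for example with dplyr) would work as well.

```r
# Re-create the firm_data data frame defined earlier in these notes
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

nrow(firm_data)    # number of observations: 5
ncol(firm_data)    # number of columns: 4
names(firm_data)   # column names

# Rows satisfying a single condition
firm_data[firm_data$employees >= 100, ]

# Both conditions must hold (&) versus at least one must hold (|)
big_clarke  <- firm_data[firm_data$employees >= 100 & firm_data$county == "Clarke", ]
food_or_big <- firm_data[firm_data$industry == "Food Services" | firm_data$employees >= 100, ]

# A single row, selected by position
firm_data[3, ]

# Sort from most to fewest employees
sorted_firms <- firm_data[order(firm_data$employees, decreasing = TRUE), ]
sorted_firms
```

Note the comma inside the square brackets: the expression before it selects rows, and leaving the part after it empty keeps all columns.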