There are many useful resources for R programming; I pointed out quite a few in the course syllabus and in the introduction to Econ 8070 slides. The notes for today’s class come mainly from Introduction to Data Science by Rafael Irizarry. I’ll cover the introductory topics that I think are most useful.
Most Important Readings: Chapters 1 (Introduction), 2 (R Basics), 3 (Programming Basics), and 5 (Importing Data)
Secondary Readings: (please read as you have time) Chapters 4 (The tidyverse), 7 (Introduction to data visualization), 8 (ggplot2)
The remaining chapters below are just in case you are particularly interested in some topic (these are likely more than you need to know for our course):
I think you can safely ignore all other chapters.
I’m not sure whether they will be helpful, but here are the notes to myself that I used to teach our two review sessions on R.
AER: package containing data from Applied Econometrics with R
wooldridge: package containing data from Wooldridge’s textbook
ggplot2: package to produce sophisticated looking plots
dplyr: package containing tools to manipulate data
haven: package for loading different types of data files
plm: package for working with panel data
fixest: another package for working with panel data
ivreg: package for IV regressions, diagnostics, etc.
estimatr: package that runs regressions but with standard errors that economists often like more than the default options in R
modelsummary: package for producing nice output of more than one regression and summary statistics
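As a rough sketch of how you might check which of these packages you still need, the code below compares the list above against what is already installed. The object names pkgs and missing_pkgs are just illustrative, and the install step is left commented out because it needs an internet connection.

```r
# Package names copied from the list in these notes
pkgs <- c("AER", "wooldridge", "ggplot2", "dplyr", "haven",
          "plm", "fixest", "ivreg", "estimatr", "modelsummary")

# Which of them are not yet installed on this machine?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Install only the missing ones (uncomment to run; needs internet access)
# install.packages(missing_pkgs)

# Installing is done once; loading is done in every new R session, e.g.
# library(dplyr)
```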
If, for some reason, this doesn’t work, you can use the following code to reproduce this data:
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)
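Once the data frame exists, a few base R commands are enough to inspect it. The sketch below rebuilds firm_data only so that it runs on its own; clarke_firms and avg_emp are names I made up for illustration.

```r
# Rebuild the firm data from the notes so this block is self-contained
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# Overview of column types and the first few values
str(firm_data)

# Keep only the firms located in Clarke county
clarke_firms <- firm_data[firm_data$county == "Clarke", ]
print(clarke_firms)

# Average number of employees within each industry
avg_emp <- aggregate(employees ~ industry, data = firm_data, FUN = mean)
print(avg_emp)
```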
Note: We’ll try to do these on our own, but if you get stuck, the solutions are here
Create two vectors as follows:
x <- seq(2,10,by=2)
y <- c(3,5,7,11,13)
Add x and y, subtract y from x, multiply x and y, and divide x by y, and report your results.
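As a hint, R's arithmetic operators work element by element on vectors of the same length. Here is a minimal sketch with two different vectors (a and b are made up for illustration, so the exercise itself is still yours to do):

```r
# Two illustrative vectors of equal length (not the ones from the exercise)
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Each operator is applied position by position
sum_ab  <- a + b   # 5 7 9
diff_ab <- a - b   # -3 -3 -3
prod_ab <- a * b   # 4 10 18
quot_ab <- a / b   # 0.25 0.40 0.50

print(sum_ab)
print(diff_ab)
print(prod_ab)
print(quot_ab)
```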
The geometric mean of a set of numbers is an alternative measure of central tendency to the more common “arithmetic mean” (this is the mean that we are used to). For a set of \(J\) numbers, \(x_1,x_2,\ldots,x_J\), the geometric mean is defined as
\[ (x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} \]
Write a function called geometric_mean that takes in a vector of numbers and computes their geometric mean. Compute the geometric mean of c(10,8,13).
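Before wrapping this in a function, it may help to see the formula evaluated step by step. The sketch below uses two illustrative numbers rather than the ones in the exercise; gm_direct and gm_logs are names I chose, and the log version is an equivalent form that assumes all numbers are strictly positive.

```r
# Illustrative numbers (not the ones from the exercise)
x <- c(2, 8)
J <- length(x)

# Direct translation of the formula: (x_1 * x_2 * ... * x_J)^(1/J)
gm_direct <- prod(x)^(1 / J)

# Equivalent form on the log scale (requires strictly positive numbers)
gm_logs <- exp(mean(log(x)))

print(gm_direct)                      # 4, since (2 * 8)^(1/2) = sqrt(16)
print(all.equal(gm_direct, gm_logs))  # TRUE, up to floating-point tolerance
```

A function version would take x as its argument and return one of these two expressions.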
Use the lubridate package to figure out how many days there were between Jan. 1, 1981 and Jan. 10, 2022.
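One way to approach this, sketched with different dates so the exercise is left to you: parse each date with lubridate's ymd() and subtract. The sketch assumes lubridate is already installed; d1, d2, and n_days are illustrative names.

```r
library(lubridate)

# Parse year-month-day strings into Date objects
# (illustrative dates, not the ones from the exercise)
d1 <- ymd("2000-01-01")
d2 <- ymd("2000-03-01")

# Subtracting two dates gives a time difference; convert it to a number of days
n_days <- as.numeric(d2 - d1, units = "days")
print(n_days)  # 60: 31 days in January plus 29 in February (2000 was a leap year)
```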
mtcars is one of the data frames that comes packaged with base R.
How many observations does mtcars have?
How many columns does mtcars have?
What are the names of the columns of mtcars?
Print only the rows of mtcars for cars that get at least 20 mpg.
Print only the rows of mtcars that get at least 20 mpg and have at least 100 horsepower (it is in the column called hp).
Print only the rows of mtcars that have 6 or more cylinders (it is in the column labeled cyl) or at least 100 horsepower.
Recover the 10th row of mtcars.
Sort the rows of mtcars by mpg (from highest to lowest).
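The same kinds of questions can be answered for any data frame. As a rough sketch, here they are for iris, another data set that ships with R, so that the mtcars versions remain an exercise. The cutoffs (7 and 2) and the object names are arbitrary choices for illustration.

```r
# Size and column names
print(nrow(iris))    # number of observations (rows): 150
print(ncol(iris))    # number of columns: 5
print(names(iris))   # column names

# Rows that satisfy one condition
long_sepal <- iris[iris$Sepal.Length >= 7, ]

# Rows that satisfy both conditions (& means "and")
both <- iris[iris$Sepal.Length >= 7 & iris$Petal.Width >= 2, ]

# Rows that satisfy at least one condition (| means "or")
either <- iris[iris$Sepal.Length >= 7 | iris$Petal.Width >= 2, ]

# A single row, selected by position
row10 <- iris[10, ]

# Sort rows from highest to lowest value of one column
sorted <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]
print(head(sorted))
```

In each case the expression before the comma picks rows and the blank after the comma keeps all columns.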