firm_data <- data.frame(name=c("ABC Manufacturing",
                               "Martin's Muffins",
                               "Down Home Appliances",
                               "Classic City Widgets",
                               "Watkinsville Diner"),
                        industry=c("Manufacturing",
                                   "Food Services",
                                   "Manufacturing",
                                   "Manufacturing",
                                   "Food Services"),
                        county=c("Clarke",
                                 "Oconee",
                                 "Clarke",
                                 "Clarke",
                                 "Oconee"),
                        employees=c(531, 6, 15, 211, 25))
Topic 0: Introduction to R Programming
Material Covered in Class
There are a number of useful resources for R programming. I pointed out quite a few in the course syllabus. The material for this section mainly comes from Introduction to Data Science: Data Wrangling and Visualization with R by Rafael Irizarry. I’ll cover some introductory topics that I think are most useful.
Additional Material
Most Important Readings: Chapters 1 (Introduction), 2 (R Basics), 3 (Programming Basics), 6 (Importing Data), 20 (Reproducible Research)
Secondary Readings: (please read as you have time) Chapters 4-5 (The tidyverse and data.table), 7-10 (Data Visualization)
The remaining chapters of this book are all useful, but you can read them over the course of the semester as you have time.
List of useful R packages
AER: package containing data from Applied Econometrics with R
wooldridge: package containing data from Wooldridge’s textbook
ggplot2: package to produce sophisticated looking plots
dplyr: package containing tools to manipulate data
haven: package for loading different types of data files
plm: package for working with panel data
fixest: another package for working with panel data
ivreg: package for IV regressions, diagnostics, etc.
estimatr: package that runs regressions but with standard errors that economists often like more than the default options in R
modelsummary: package for producing nice output of more than one regression and summary statistics
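Before using these packages, it can help to check which ones are already installed. The following is a minimal sketch, not part of the original notes; the variable names `pkgs`, `installed`, and `missing_pkgs` are made up for illustration.

```r
# Package names from the list of useful R packages.
pkgs <- c("AER", "wooldridge", "ggplot2", "dplyr", "haven", "plm",
          "fixest", "ivreg", "estimatr", "modelsummary")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; it does not attach the package to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Packages that still need to be installed (possibly none).
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Uncomment to install whatever is missing (needs an internet connection).
# install.packages(missing_pkgs)

# Once a package is installed, attach it at the top of a script, e.g.
# library(ggplot2)
```

The `install.packages()` call is left commented out so that running the block never changes your library by accident.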
Practice loading data
If, for some reason, this doesn’t work, you can reproduce the data with the code that defines firm_data at the top of this page.
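The file format of the original data is not shown here, so the following is only a sketch that assumes a CSV file. It writes the firm data to a temporary file (a hypothetical location chosen so that nothing in your working directory is overwritten) and reads it back with `read.csv`, which is the same step you would use for a downloaded file.

```r
# The same firm data used on this page, written compactly so that this
# block runs on its own.
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# Hypothetical file path: a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(firm_data, csv_path, row.names = FALSE)

# Load the file back in, as you would with a data set saved on disk.
loaded <- read.csv(csv_path)

# Inspect the result: column types and the first few rows.
str(loaded)
head(loaded)
```

For other file types (for example Stata or SAS files), the haven package listed earlier provides analogous reading functions.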
Practice Questions
Note: We’ll try to do these on our own, but if you get stuck, the solutions are here
Create two vectors as follows
x <- seq(2,10,by=2)
y <- c(3,5,7,11,13)
Add x and y, subtract y from x, multiply x and y, and divide x by y, and report your results.
The geometric mean of a set of numbers is an alternative measure of central tendency to the more common “arithmetic mean” (this is the mean that we are used to). For a set of \(J\) numbers, \(x_1,x_2,\ldots,x_J\), the geometric mean is defined as
\[ (x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} \]
Write a function called geometric_mean that takes in a vector of numbers and computes their geometric mean. Compute the geometric mean of c(10,8,13).
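As a quick check on the definition, here is a worked example with two numbers that are not part of the exercise (chosen only for illustration). For \(x_1 = 2\) and \(x_2 = 8\), so that \(J = 2\), the arithmetic mean is \((2 + 8)/2 = 5\), while the geometric mean is

\[ (x_1 \cdot x_2)^{1/J} = (2 \cdot 8)^{1/2} = \sqrt{16} = 4 \]

When all of the numbers are positive, the geometric mean can also be written as the exponential of the arithmetic mean of the logs, which can be a useful way to check your function:

\[ (x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} = \exp\left( \frac{1}{J} \sum_{j=1}^{J} \log x_j \right) \]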
Use the lubridate package to figure out how many days there were between Jan. 1, 1981 and Jan. 10, 2022.
mtcars is one of the data frames that comes packaged with base R.
How many observations does mtcars have?
How many columns does mtcars have?
What are the names of the columns of mtcars?
Print only the rows of mtcars for cars that get at least 20 mpg
Print only the rows of mtcars that get at least 20 mpg and have at least 100 horsepower (it is in the column called hp)
Print only the rows of mtcars that have 6 or more cylinders (it is in the column labeled cyl) or at least 100 horsepower
Recover the 10th row of mtcars
Sort the rows of mtcars by mpg (from highest to lowest)
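If you get stuck on the data frame questions, the sketch below shows the same kinds of base R operations on a different built-in data frame, `iris`. It is meant as a hint rather than a solution; the choice of `iris`, the thresholds, and the variable names are only for illustration.

```r
# iris ships with base R (in the datasets package), just like mtcars.
nrow(iris)    # number of observations
ncol(iris)    # number of columns
names(iris)   # column names

# Keep only the rows that satisfy one condition.
long_sepals <- iris[iris$Sepal.Length >= 7, ]

# Both conditions must hold: combine them with &.
long_and_wide <- iris[iris$Sepal.Length >= 7 & iris$Sepal.Width >= 3, ]

# Either condition may hold: combine them with |.
long_or_wide <- iris[iris$Sepal.Length >= 7 | iris$Sepal.Width >= 4, ]

# Recover a single row by its position (here, the 3rd row).
iris[3, ]

# Sort the rows from the highest to the lowest value of a column.
sorted <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]
head(sorted)
```

Note the comma inside the square brackets: the part before the comma selects rows and the part after it selects columns, so leaving the second part empty keeps every column.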