RStudio Tour

Simple calculations

1+1
answer <- 1+1
answer
answer + 3

Installing packages

base R vs. loading external packages

lubridate, this is a package for working with dates

Side comment: dates are tricky because they are not really “strings”, but they are not really numbers either (that said, we can treat them numerically, for example as the number of days since a reference date such as 1900); they can also be stored in many different formats.

install.packages("lubridate")
library(lubridate)
today <- "January 18, 2022"
class(today)
today_date <- lubridate::mdy(today)
today_date
class(today_date)
today_date + 1 # adds a day
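
As a rough sketch of the side comment above: base R stores a Date as a count of days from a reference date (January 1, 1970 in R's case), which is why arithmetic like today_date + 1 works. The example uses base R's as.Date with an ISO-formatted string rather than lubridate so that it runs without installing anything; the variable names are just for illustration.

```r
# Base R dates: stored internally as the number of days since 1970-01-01
d <- as.Date("2022-01-18")
class(d)                            # "Date"
as.numeric(as.Date("1970-01-02"))   # one day after the reference date

next_day <- d + 1   # adding a number adds that many days
next_day

# Subtracting two dates gives a time difference in days
gap <- as.Date("2022-02-01") - d
gap
```

lubridate's mdy, used above, is mainly a convenience for parsing strings written in month-day-year order into this same Date class.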

Vectors

vec <- c(1,2,3,4,5)
class(vec)
vec
class(1)
vec + 5
vec2 <- c(3,4,9,6,7)
vec+vec2
seq(2, 20, by=2)
sort(vec)
order(vec)
rev(vec)
vec %in% c(5,9)
vec3 <- vec %in% c(5,9)
vec3
any(vec3)
all(vec3)

vec == 3
vec != 3
vec < 3
vec >= 3
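
The comparisons above return logical vectors, and one common use for them is picking out elements of a vector. A short sketch using the same vec and vec2 (it also shows how order relates to sort):

```r
vec  <- c(1, 2, 3, 4, 5)
vec2 <- c(3, 4, 9, 6, 7)

# Use a logical vector inside [ ] to keep only the TRUE positions
big <- vec[vec >= 3]
big

# which() gives the positions that are TRUE; sum() counts them
which(vec >= 3)
sum(vec >= 3)

# order() returns positions, so indexing by it reproduces sort()
order(vec2)
vec2[order(vec2)]
```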

Data Frames

read.csv
load
haven::read_dta
firm_data <- read.csv("firm_data.csv")

Accessor, basic functions, mean, log, length

firm_data$employees

mean(firm_data$employees)

log(firm_data$employees)

firm_data[4,] # access by index, like a matrix

# other useful functions
nrow
head
ncol
colnames
rownames
subset(firm_data, industry=="Manufacturing")
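
If you do not have firm_data.csv on hand, a small data frame with the same columns can be typed in directly. The values below are copied from the printout of firm_data that appears later in these notes (minus the row-number column X); treat it as a stand-in for the file, not as the file itself.

```r
# Stand-in for read.csv("firm_data.csv"), built by hand
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

nrow(firm_data)      # number of rows (firms)
ncol(firm_data)      # number of columns (variables)
colnames(firm_data)  # variable names
head(firm_data, 2)   # first two rows

# Keep only the manufacturing firms, then average their employment
manuf <- subset(firm_data, industry == "Manufacturing")
mean(manuf$employees)
```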

Lists

Lists are very generic in the sense that they can carry around complicated data. If you are familiar with an object-oriented programming language like Java or C++, they have the flavor of an “object” in the object-oriented sense.

vec <- c(1,2,3,4,5)
unusual_list <- list(numbers=vec, df=firm_data)

You can access the elements of a list in a few different ways. Sometimes it is convenient to access them by name via the $ operator:

unusual_list$numbers
## [1] 1 2 3 4 5

Other times, it is convenient to access them via their position in the list

unusual_list[[2]] # notice the double brackets
##   X                 name      industry county employees
## 1 1    ABC Manufacturing Manufacturing Clarke       531
## 2 2     Martin's Muffins Food Services Oconee         6
## 3 3 Down Home Appliances Manufacturing Clarke        15
## 4 4 Classic City Widgets Manufacturing Clarke       211
## 5 5   Watkinsville Diner Food Services Oconee        25
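
A few other ways of working with list elements, as a sketch. To keep it self-contained, the list here holds a small made-up data frame (toy_df is a hypothetical stand-in for firm_data).

```r
vec <- c(1, 2, 3, 4, 5)
toy_df <- data.frame(id = 1:2, employees = c(531, 6))  # hypothetical stand-in
unusual_list <- list(numbers = vec, df = toy_df)

names(unusual_list)          # the names of the elements
unusual_list[["numbers"]]    # double brackets also work with a name

# Single brackets return a (shorter) list; double brackets return the element
class(unusual_list["numbers"])
class(unusual_list[["numbers"]])

# Accessors can be chained to reach inside an element
unusual_list$df$employees
```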

Matrices

Matrices are very similar to data frames, but the data should all be of the same type. These are useful for a number of the calculations that we will do this semester.

A <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)
A
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
B <- matrix(c(5,6,7,8), nrow=2, byrow=TRUE)

You can access elements of a matrix by their position in the matrix, just like for the data frame above.

# first row, second column
A[1,2]
## [1] 2
# all rows in second column
A[,2] 
## [1] 2 4
A%*%B # matrix multiplication
##      [,1] [,2]
## [1,]   19   22
## [2,]   43   50
A*c(1,-1)
##      [,1] [,2]
## [1,]    1    2
## [2,]   -3   -4

cbind, rbind, as.matrix, solve, t, diag, rowsum
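
A brief sketch of a few of the functions just listed, using the same A and B as above:

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)

t(A)          # transpose: rows become columns
diag(2)       # 2 x 2 identity matrix
cbind(A, B)   # glue side by side (2 x 4)
rbind(A, B)   # stack on top of each other (4 x 2)

A_inv <- solve(A)   # matrix inverse
A_inv %*% A         # should be the identity, up to rounding error
```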

Writing functions

if/else

Let’s write a function that takes in the number of employees that are in a firm and prints “large” if the firm has more than 100 employees and “small” otherwise.

large_or_small <- function(employees) {
  if (employees > 100) {
    print("large")
  } else {
    print("small")
  }
}

I think, at this point, this code should make sense to you. The only new thing is the if/else. The following is not code that will actually run but is just to help understand the logic of if/else.

if (condition) {
  # do something
} else {
  # do something else
}

All that happens with if/else is that we check whether condition evaluates to TRUE or FALSE. If it is TRUE, the code will do whatever is inside the first set of brackets; if it is FALSE, the code will do whatever is in the set of brackets following else.
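
Two variations that are often useful, sketched under some assumptions: returning the label instead of printing it (so the result can be stored), and chaining conditions with else if. The function name firm_size and the 20-employee cutoff for “medium” are made up for illustration.

```r
# Hypothetical variant of large_or_small: returns a label instead of printing
firm_size <- function(employees) {
  if (employees > 100) {
    "large"
  } else if (employees > 20) {   # made-up cutoff, for illustration only
    "medium"
  } else {
    "small"
  }
}

size <- firm_size(211)  # the result can be saved because it is returned
size
firm_size(25)
firm_size(6)
```

A function returns the value of the last expression it evaluates, which is why no explicit return is needed here.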

For loops

Often, we need to run the same code over and over again. A for loop is the main programming tool for this case (for loops show up in pretty much all programming languages).

out <- c()
for (i in 1:10) {
  out[i] <- i*3
}
out
##  [1]  3  6  9 12 15 18 21 24 27 30

The above code starts with \(i=1\), calculates \(i*3\) (which is 3), and stores that result in the first element of the vector out. Then \(i\) increases to 2, the code calculates \(i*3\) (which is now 6) and stores this result in the second element of out, and so on through \(i=10\).
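
A slightly more defensive version of the same loop, as a sketch: it creates the output vector at its final length up front (rather than growing it one element at a time) and uses seq_along, which also behaves sensibly if the input happens to be empty. The name x is just for illustration.

```r
x <- 1:10
out <- numeric(length(x))    # pre-allocate a vector of zeros
for (i in seq_along(x)) {
  out[i] <- x[i] * 3
}
out
```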

Vectorization

Vectorizing functions is a relatively advanced topic in R programming, but it is an important one, so I am including it here.

Because we will often be working with data, we will often be performing the same operation on all of the observations in the data. For example, suppose that you wanted to take the logarithm of the number of employees for all the firms in firm_data. One way to do this is to use a for loop, but this code would be a bit of a mess. Instead, the function log is vectorized — this means that if we apply it to a vector, it will calculate the logarithm of each element in the vector. Besides this, vectorized functions are often faster than for loops.
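
To make the comparison concrete, here is a sketch of both approaches. The employee counts are typed in from the firm_data printout so that the example runs on its own.

```r
employees <- c(531, 6, 15, 211, 25)

# for-loop version: compute the log one element at a time
log_loop <- numeric(length(employees))
for (i in seq_along(employees)) {
  log_loop[i] <- log(employees[i])
}

# vectorized version: log() handles the whole vector at once
log_vec <- log(employees)

all.equal(log_loop, log_vec)  # the two approaches agree
```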

Not all functions are vectorized though. Let’s go back to our function earlier called large_or_small. This took in the number of employees at a firm and then printed “large” if the firm had more than 100 employees and “small” otherwise. Let’s see what happens if we call this function on a vector of employees (ideally, we’d like the function to be applied to each element in the vector).

employees <- firm_data$employees
employees
## [1] 531   6  15 211  25
large_or_small(employees)
## Error in if (employees > 100) {: the condition has length > 1

This is not what we wanted to have happen. Instead of determining whether each firm is large or small, we get an error saying that the condition inside if has more than one element. What’s going on here is that the function large_or_small is not vectorized: if expects a single TRUE or FALSE, but employees > 100 produces one value per firm.

In order to vectorize a function, we can use one of a number of “apply” functions in R, including sapply, lapply, vapply, apply, and mapply.

Let’s use sapply to vectorize large_or_small.

large_or_small_vectorized <- function(employees_vec) {
  sapply(employees_vec, FUN = large_or_small)
}

All that this will do is call the function large_or_small for each element in the vector employees_vec. Let’s see it in action:

large_or_small_vectorized(employees)
## [1] "large"
## [1] "small"
## [1] "small"
## [1] "large"
## [1] "small"
## [1] "large" "small" "small" "large" "small"

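This is what we were hoping for. Note that each label appears twice in the output: the first five lines come from the print calls inside large_or_small, and the last line is the vector that sapply returns (print hands back its argument invisibly, and sapply collects those values).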
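
A sketch of a tidier variant: if the function returns its label instead of printing it, only the final vector is shown. This version also swaps sapply for vapply, a stricter relative where you state the type of result you expect from each call. The names size_label and size_label_vectorized are hypothetical.

```r
# Hypothetical non-printing version of large_or_small
size_label <- function(employees) {
  if (employees > 100) "large" else "small"
}

size_label_vectorized <- function(employees_vec) {
  # FUN.VALUE says each call should return a single character string
  vapply(employees_vec, FUN = size_label, FUN.VALUE = character(1))
}

labels <- size_label_vectorized(c(531, 6, 15, 211, 25))
labels
```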

Earlier we wrote a for loop that took the numbers from 1 to 10 and multiplied each of them by 3. Here’s how you could do this using sapply:

sapply(1:10, function(i) i*3)
##  [1]  3  6  9 12 15 18 21 24 27 30

which is considerably shorter.

One last thing worth pointing out though is that multiplication is already vectorized, so you don’t actually need to do sapply or the for loop; a better way is just

(1:10)*3
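##  [1]  3  6  9 12 15 18 21 24 27 30

Similarly, R has a built-in vectorized version of if/else called ifelse, so large_or_small can be vectorized without sapply:

large_or_small_vectorized2 <- function(employees_vec) {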
  ifelse(employees_vec > 100, "large", "small")
}
large_or_small_vectorized2(firm_data$employees)
## [1] "large" "small" "small" "large" "small"

Here you can see that ifelse makes every comparison in its first argument, and then returns the second argument for every TRUE coming from the first argument and the third argument for every FALSE coming from the first argument.

ifelse also works with vectors in the second and third arguments. For example:

ifelse(c(1,3,5) < 4, yes=c(1,2,3), no=c(4,5,6))
## [1] 1 2 6

which picks up 1 and 2 from the second (yes) argument and 6 from the third (no) argument.
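
One more behavior worth knowing, shown as a small sketch: when the test is missing (NA), ifelse returns NA for that position rather than picking either branch. The input vector here is made up.

```r
emp <- c(531, NA, 15)   # made-up values with one missing entry
sizes <- ifelse(emp > 100, "large", "small")
sizes
```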

Tidyverse

Related Reading: IDS Chapter 4 — strongly recommend that you read this

Data Visualization

Related Reading: IDS Ch. 7-12 — R has very good data visualization tools; strongly recommend that you read this

Reproducible Research

Related Reading: IDS Ch. 41

Technical Writing Tools

A lot of mathematical/academic writing is done in LaTeX. LaTeX is a markup language: you write “marked up” text that is processed into a nice-looking document. For example, \textbf{bold text} becomes bold text, or

\begin{align*}
  \hat{\beta} = (X'X)^{-1} X'Y
\end{align*}

becomes

\[ \hat{\beta} = (X'X)^{-1} X'Y \]

An easy way to get started here is to use the website Overleaf. This is also closely related to markdown/R-markdown discussed above (LaTeX tends to be somewhat more complicated, which comes with some associated advantages and disadvantages).
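
For reference, a minimal complete document that could be pasted into Overleaf to produce the two examples above might look like the following. This is only a sketch; the one assumption beyond the snippets above is loading the amsmath package, which provides the align* environment.

```latex
\documentclass{article}
\usepackage{amsmath}  % provides the align* environment

\begin{document}

Here is some \textbf{bold text} and a displayed equation:
\begin{align*}
  \hat{\beta} = (X'X)^{-1} X'Y
\end{align*}

\end{document}
```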