- Source pane - where you write and save code
- Console - where code executes
- Environment pane - holds the variables and functions currently in memory
- File pane - shows files; also where you can set the working directory

```
1+1
answer <- 1+1
answer
answer + 3
```

base R vs. loading external packages

lubridate, this is a package for working with dates

side-comment: dates are sort of tricky because they are not really “strings”, but they are not really numbers either (that said, we can think of numeric things like how many days it has been since 1900); they can also be stored in all types of formats

```
install.packages("lubridate")
library(lubridate)
today <- "January 18, 2022"
class(today)
today_date <- lubridate::mdy(today)
today_date
class(today_date)
today_date + 1 # adds a day
```
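As a rough illustration of the side comment above, base R stores a `Date` as a count of days from a reference date (in R that reference is January 1, 1970, rather than 1900). The sketch below uses only base R, and the object name `d` is just for illustration.

```r
# A Date prints like text, but underneath it is a count of days
d <- as.Date("2022-01-18")
class(d)                            # "Date", not "character" or "numeric"
as.numeric(d)                       # number of days since 1970-01-01
as.numeric(as.Date("1970-01-01"))   # the reference date itself is day 0

# Because of this, arithmetic on dates works in units of days
d + 1                                   # the next day
as.numeric(as.Date("2022-02-01") - d)   # days between the two dates
```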

- R’s primitive data type is a vector.
- The one we’ll use the most is a numeric vector.
- Let’s create a numeric vector:

```
vec <- c(1,2,3,4,5)
class(vec)
vec
class(1)
```

You can do all kinds of numeric operations here, both relative to a constant and relative to another vector:

```
vec + 5
vec2 <- c(3,4,9,6,7)
vec + vec2
```

- other useful functions for vectors

```
seq(2, 20, by=2)
sort(vec)
order(vec)
rev(vec)
```
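The difference between `sort` and `order` is easy to miss when the vector is already sorted, so here is a small sketch with a made-up, unsorted vector `x` (not part of the notes' data).

```r
# Hypothetical example vector
x <- c(30, 10, 20)

sort(x)       # the values in increasing order: 10 20 30
order(x)      # the positions that would sort x: 2 3 1
x[order(x)]   # indexing by order(x) gives the same result as sort(x)
rev(sort(x))  # reverse the sorted vector: 30 20 10
```

`order` is mainly useful when you want to sort one object (for example, the rows of a data frame) by the values of another.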

- other common types of vectors: character, logical (i.e., TRUE or FALSE), and factor

```
vec %in% c(5,9)
vec3 <- vec %in% c(5,9)
vec3
any(vec3)
all(vec3)
vec == 3
vec != 3
vec < 3
vec >= 3
```
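For the other vector types mentioned above, a short sketch may help. The object names `chr`, `lgl`, and `fct` are made up for illustration.

```r
chr <- c("Clarke", "Oconee", "Clarke")  # character vector
lgl <- c(TRUE, FALSE, TRUE)             # logical vector
fct <- factor(chr)                      # factor: categories stored as levels

class(chr)   # "character"
class(lgl)   # "logical"
class(fct)   # "factor"
levels(fct)  # the distinct categories: "Clarke" "Oconee"

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
sum(lgl)     # 2
```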

Practice loading data: you can either click through the RStudio menus or use functions such as the ones below.

A data frame is basically a matrix, but it can store more complicated data types

Its columns have names

```
read.csv
load
haven::read_dta
```

`firm_data <- read.csv("firm_data.csv")`

```
firm_data$employees
mean(firm_data$employees)
log(firm_data$employees)
firm_data[4,] # access by index, like a matrix
# other useful functions
nrow(firm_data)
head(firm_data)
ncol(firm_data)
colnames(firm_data)
rownames(firm_data)
subset(firm_data, industry=="Manufacturing")
```
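If you do not have `firm_data.csv` on hand, you can follow along with a data frame built by hand. The sketch below copies the values from the printed output of `firm_data` that appears later in these notes; the name `firm_data_example` is made up, and the row-number column `X` is left out.

```r
firm_data_example <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

nrow(firm_data_example)                 # number of rows (firms)
ncol(firm_data_example)                 # number of columns (variables)
mean(firm_data_example$employees)       # average number of employees
firm_data_example[4, ]                  # fourth row, all columns
firm_data_example[4, "employees"]       # fourth row, one column by name
manufacturing <- subset(firm_data_example, industry == "Manufacturing")
nrow(manufacturing)                     # number of manufacturing firms
```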

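Many functions in R have default arguments, like the base in `log`. You can check a function's arguments and their defaults by typing `?` followed by the name of the function (for example, `?log`), which opens its help page.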
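A minimal sketch of how a default argument behaves, using `log`: when `base` is not supplied, `log` computes the natural logarithm, and supplying `base` overrides that default.

```r
log(100)             # natural log, because the default base is exp(1)
log(100, base = 10)  # base-10 log: 2
log10(100)           # shortcut for base 10: 2
log(8, base = 2)     # base-2 log: 3
```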

Lists are very generic in the sense that they can carry around complicated data. If you are familiar with any object-oriented programming language like Java or C++, they have the flavor of an “object”, in the object-oriented sense.

```
vec <- c(1,2,3,4,5)
unusual_list <- list(numbers=vec, df=firm_data)
```

You can access the elements of a list in a few different ways. Sometimes it is convenient to access them via the `$` operator

`unusual_list$numbers`

`## [1] 1 2 3 4 5`

Other times, it is convenient to access them via their position in the list

`unusual_list[[2]] # notice the double brackets`

```
##   X                 name      industry county employees
## 1 1    ABC Manufacturing Manufacturing Clarke       531
## 2 2     Martin's Muffins Food Services Oconee         6
## 3 3 Down Home Appliances Manufacturing Clarke        15
## 4 4 Classic City Widgets Manufacturing Clarke       211
## 5 5   Watkinsville Diner Food Services Oconee        25
```
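The difference between single and double brackets is a common source of confusion, so here is a small self-contained sketch. The list `example_list` and its contents are made up for illustration.

```r
example_list <- list(numbers = c(1, 2, 3), label = "firms")

example_list$numbers        # access by name with $
example_list[["numbers"]]   # access by name with double brackets
example_list[[1]]           # access by position with double brackets

# Single brackets keep the result wrapped in a list of length 1
class(example_list[1])      # "list"
class(example_list[[1]])    # "numeric"
names(example_list)         # "numbers" "label"
```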

Matrices are very similar to data frames, but the data should all be of the same type. These are useful for a number of the calculations that we will do this semester.

```
A <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)
A
```

```
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
```

`B <- matrix(c(5,6,7,8),nrow=2,byrow=TRUE)`

You can access elements of a matrix by their position in the matrix, just like for the data frame above.

```
# first row, second column
A[1,2]
```

`## [1] 2`

```
# all rows in second column
A[,2]
```

`## [1] 2 4`

- matrix multiplication, element-wise multiplication

`A %*% B # matrix multiplication`

```
##      [,1] [,2]
## [1,]   19   22
## [2,]   43   50
```

`A * c(1,-1) # element-wise multiplication`

```
##      [,1] [,2]
## [1,]    1    2
## [2,]   -3   -4
```

- some other useful matrix functions

cbind, rbind, as.matrix, solve, t, diag, rowsum
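As a quick sketch of a few of these functions, the code below rebuilds the same 2 x 2 matrix `A` used earlier so that it runs on its own. The name `A_inv` is made up for illustration.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

t(A)                # transpose: rows become columns
diag(2)             # 2 x 2 identity matrix
A_inv <- solve(A)   # matrix inverse
A %*% A_inv         # the identity matrix, up to rounding error

wide <- cbind(A, c(9, 9))   # add a column: result is 2 x 3
tall <- rbind(A, c(9, 9))   # add a row: result is 3 x 2
dim(wide)
dim(tall)
```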

Write a function that takes in a vector and returns the second smallest element.

Write a function that takes in a vector and returns the nth smallest element.

Write a function that takes in a vector and returns the nth smallest element, where the default is n=2.
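One possible sketch for the last of these exercises is below. The function name `nth_smallest` and the example vector are made up for illustration, and there are other reasonable ways to write this.

```r
# Sort the vector in increasing order and pick out position n;
# n = 2 in the signature makes "second smallest" the default
nth_smallest <- function(x, n = 2) {
  sort(x)[n]
}

example_vec <- c(3, 4, 9, 6, 7)
nth_smallest(example_vec)          # second smallest: 4
nth_smallest(example_vec, n = 4)   # fourth smallest: 7
```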

Let’s write a function that takes in the number of employees that are in a firm and prints “large” if the firm has more than 100 employees and “small” otherwise.

```
large_or_small <- function(employees) {
  if (employees > 100) {
    print("large")
  } else {
    print("small")
  }
}
```

I think, at this point, this code should make sense to you. The only new thing is the if/else. The following is not code that will actually run but is just to help understand the logic of if/else.

```
if (condition) {
  # do something
} else {
  # do something else
}
```

All that happens with if/else is that we check whether `condition` evaluates to `TRUE` or `FALSE`. If it is `TRUE`, the code will do whatever is inside the first set of brackets; if it is `FALSE`, the code will do whatever is in the set of brackets following `else`.

Often, we need to run the same code over and over again. A `for` loop is a main programming tool for this case (`for` loops show up in pretty much all programming languages).

```
out <- c()
for (i in 1:10) {
  out[i] <- i*3
}
out
```

`## [1] 3 6 9 12 15 18 21 24 27 30`

The above code starts with \(i=1\), calculates \(i*3\) (which is 3), and then stores that result in the first element of the vector `out`; then \(i\) increases to 2, the code calculates \(i*3\) (which is now 6) and stores this result in the second element of `out`, and so on through \(i=10\).
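A common variation on this loop, sketched below, creates the output vector at its full length before the loop starts rather than growing it one element at a time. The variable `n` is introduced here only for illustration; the result is the same as above.

```r
n <- 10
out <- numeric(n)          # a vector of n zeros, filled in below
for (i in seq_len(n)) {    # seq_len(n) is 1, 2, ..., n
  out[i] <- i * 3
}
out
```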

Vectorizing functions is a relatively advanced topic in R programming, but it is an important one, so I am including it here.

Because we will often be working with data, we will often be performing the same operation on all of the observations in the data. For example, suppose that you wanted to take the logarithm of the number of employees for all the firms in `firm_data`. One way to do this is to use a `for` loop, but this code would be a bit of a mess. Instead, the function `log` is **vectorized**: this means that if we apply it to a vector, it will calculate the logarithm of each element in the vector. Besides this, vectorized functions are often faster than `for` loops.
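To make the comparison concrete, here is a sketch of both approaches. The employee counts are typed in by hand (they match the values printed for `firm_data` elsewhere in these notes) so that the code runs on its own; the names `log_loop` and `log_vec` are made up.

```r
employees <- c(531, 6, 15, 211, 25)

# Approach 1: a for loop that fills in one element at a time
log_loop <- numeric(length(employees))
for (i in seq_along(employees)) {
  log_loop[i] <- log(employees[i])
}

# Approach 2: rely on log being vectorized
log_vec <- log(employees)

all.equal(log_loop, log_vec)   # TRUE: both give the same answer
```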

Not all functions are vectorized though. Let’s go back to our function earlier called `large_or_small`. This took in the number of employees at a firm and then printed “large” if the firm had more than 100 employees and “small” otherwise. Let’s see what happens if we call this function on a vector of employees (ideally, we’d like the function to be applied to each element in the vector).

```
employees <- firm_data$employees
employees
```

`## [1] 531 6 15 211 25`

`large_or_small(employees)`

```
## Warning in if (employees > 100) {: the condition has length > 1 and only the
## first element will be used
```

`## [1] "large"`

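This is not what we wanted to have happen. It just reported whether or not the first firm was large or small, and then issued a warning that basically said that something may be going wrong here. (In R 4.2.0 and later, a condition of length greater than one inside `if` is an error rather than a warning, so the call stops without printing anything.) What’s going on here is that the function `large_or_small` is not vectorized.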

In order to vectorize a function, we can use one of a number of “apply” functions in R. I’ll list them here

- `sapply`: this stands for “simplify” apply; it “applies” the function to all the elements in the vector or list that you pass in and then tries to “simplify” the result
- `lapply`: stands for “list” apply; applies a function to all elements in a vector or list and then returns a list
- `vapply`: stands for “vector” apply; applies a function to all elements in a vector or list and then returns a vector
- `apply`: applies a function to either the rows or columns of a matrix-like object (i.e., a matrix or a data frame) depending on the value of the argument `MARGIN`
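Here is a small sketch of what each of these returns, using made-up inputs `x` and `m`. Note that `vapply` additionally asks you to state, through its `FUN.VALUE` argument, what each individual result should look like.

```r
x <- c(1, 4, 9)

s <- sapply(x, sqrt)                           # simplified to a numeric vector
l <- lapply(x, sqrt)                           # a list with three elements
v <- vapply(x, sqrt, FUN.VALUE = numeric(1))   # numeric vector, type checked

m <- matrix(1:4, nrow = 2)                     # columns are (1, 2) and (3, 4)
row_sums <- apply(m, MARGIN = 1, FUN = sum)    # MARGIN = 1: over rows
col_sums <- apply(m, MARGIN = 2, FUN = sum)    # MARGIN = 2: over columns

s          # 1 2 3
class(l)   # "list"
row_sums   # 4 6
col_sums   # 3 7
```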

Let’s use `sapply` to vectorize `large_or_small`.

```
large_or_small_vectorized <- function(employees_vec) {
  sapply(employees_vec, FUN = large_or_small)
}
```

All that this will do is call the function `large_or_small` for each element in the vector `employees`. Let’s see it in action

`large_or_small_vectorized(employees)`

```
## [1] "large"
## [1] "small"
## [1] "small"
## [1] "large"
## [1] "small"
```

`## [1] "large" "small" "small" "large" "small"`

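This is what we were hoping for. The first five lines of output come from the `print` calls inside `large_or_small`, and the last line is the vector that `sapply` returns.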

- I also typically replace almost all `for` loops with an `apply` function. In most cases, I don’t think there is much of a performance gain, but the code seems easier to read (or at least more concise).

Earlier we wrote a `for` loop that took the numbers from 1 to 10 and multiplied each of them by 3. Here’s how you could do this using `sapply`:

`sapply(1:10, function(i) i*3)`

`## [1] 3 6 9 12 15 18 21 24 27 30`

which is considerably shorter.

One last thing worth pointing out though is that multiplication is already vectorized, so you don’t actually need to do `sapply` or the `for` loop; a better way is just

`(1:10)*3`

`## [1] 3 6 9 12 15 18 21 24 27 30`

- It’s often helpful to have a vectorized version of if/else. In `R`, this is available in the function `ifelse`. Here is an alternative way to vectorize the function `large_or_small`:

```
large_or_small_vectorized2 <- function(employees_vec) {
  ifelse(employees_vec > 100, "large", "small")
}
large_or_small_vectorized2(firm_data$employees)
```

`## [1] "large" "small" "small" "large" "small"`

Here you can see that `ifelse` makes every comparison in its first argument, and then returns the second argument for every `TRUE` coming from the first argument, and returns the third argument for every `FALSE` coming from the first argument.

`ifelse` also works with vectors in the second and third arguments. For example:

`ifelse(c(1,3,5) < 4, yes=c(1,2,3), no=c(4,5,6)) `

`## [1] 1 2 6`

which picks up 1 and 2 from the second (`yes`) argument and 6 from the third (`no`) argument.

Related Reading: IDS Chapter 4 — strongly recommend that you read this

R has very good data cleaning / manipulating tools

Many of them are in the “tidyverse”

Mostly this semester, I’ll just give you a data set that is ready to be worked with. But as you move to doing your own research projects, you will realize that a major step in analyzing data is organizing (“cleaning”) the data in a way that you can analyze it

Main packages

- `ggplot2`: see below
- `dplyr`: package to manipulate data
- `tidyr`: more ways to manipulate data
- `readr`: read in data
- `purrr`: alternative versions of `apply` functions and `for` loops
- `tibble`: alternative versions of `data.frame`
- `stringr`: tools for working with strings
- `forcats`: tools for working with factors

I won’t emphasize these too much as they are somewhat advanced topics, but if you are interested, these are good (and marketable) skills to have

Related Reading: IDS Ch. 7-12; `R` has very good data visualization tools, and I strongly recommend that you read this

Another very strong point of `R`

Base `R` comes with the `plot` command, but the `ggplot2` package provides cutting edge plotting tools. These tools will be somewhat harder to learn, but we’ll use `ggplot2` this semester as I think it is worth it.

538’s graphs produced with ggplot

Related Reading: IDS Ch. 41

Rmarkdown is a very useful way to mix code and content

These notes are written in Rmarkdown, and I usually write homework solutions in Rmarkdown

If you are interested, GitHub is a very useful version control tool (i.e., it keeps track of the versions of your project and is useful for merging projects and for sharing or co-authoring code), and Dropbox is also useful for sharing code. I use both of these extensively; in general, I use GitHub relatively more for bigger and more public projects, and Dropbox more for smaller projects and early versions of projects.

A lot of mathematical/academic writing is done in LaTeX. LaTeX is a markup language: basically you write “marked up” text that is processed into a nice looking document. For example `\textbf{bold text}` becomes **bold text** or

```
\begin{align*}
\hat{\beta} = (X'X)^{-1} X'Y
\end{align*}
```

becomes

\[\hat{\beta} = (X'X)^{-1} X'Y\]

An easy way to get started here is to use the website Overleaf. This is also closely related to markdown/R-markdown discussed above (LaTeX tends to be somewhat more complicated, which comes with some associated advantages and disadvantages).
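For reference, the two snippets above need to sit inside a complete document before they can be compiled. A minimal sketch is below; the surrounding sentence is made up, and the `amsmath` package is loaded because it is what provides the `align*` environment.

```latex
\documentclass{article}
\usepackage{amsmath} % provides the align* environment

\begin{document}

Some \textbf{bold text}, followed by a displayed equation:
\begin{align*}
  \hat{\beta} = (X'X)^{-1} X'Y
\end{align*}

\end{document}
```

Pasting this into a new, blank project on Overleaf should be enough to see the compiled output.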