## 2.5 Data types

Related Reading: IDS 2.4

### 2.5.1 Numeric Vectors

The most basic data type in `R` is the vector. In fact, when we created variables above that were just a single number, they were actually stored as numeric vectors.
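As a quick sketch of that last point (the variable name `x` is just something I made up for this illustration), you can check that a single number behaves like a vector with one element:

```r
# x is a hypothetical variable name used only for this illustration
x <- 3

# a single number is already a vector...
is.vector(x)
#> [1] TRUE

# ...that happens to have exactly one element
length(x)
#> [1] 1
```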

To create a vector more explicitly, you can use the `c` function in `R`. For example, let’s create a vector called `five` that contains the numbers 1 through 5.

`five <- c(1,2,3,4,5)`

We can print the contents of the vector `five` just by typing its name

```
five
#> [1] 1 2 3 4 5
```

Another common operation on vectors is to get a particular element of a vector. Let me give an example

```
five[3]
#> [1] 3
```

This code takes the vector `five` and returns the third element in the vector. Notice that the above line contains square brackets, `[` and `]`, rather than parentheses.

If you want several different elements from a vector, you can do the following

```
five[c(1,4)]
#> [1] 1 4
```

This code takes the vector `five` and returns the first and fourth elements in the vector.

One more useful function for vectors is the function `length`. This tells you the number of elements in a vector. For example,

```
length(five)
#> [1] 5
```

which means that there are five total elements in the vector `five`.
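Square brackets and `length` can also be combined. Here is a small sketch (not something you have to memorize) that picks out the last element of a vector without having to know in advance how long it is:

```r
five <- c(1,2,3,4,5)

# the last element: its position is the same as the length of the vector
five[length(five)]
#> [1] 5

# the first and last elements together
five[c(1, length(five))]
#> [1] 1 5
```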

### 2.5.2 Vector arithmetic

Related Reading: IDS 2.11

The main operations on numeric vectors are `+`, `-`, `*`, and `/`, which correspond to addition, subtraction, multiplication, and division. Often, we would like to carry out these operations on vectors.

There are two main cases. The first case is when you try to add a single number (i.e., a scalar) to all the elements in a vector. In this setup, the operation will happen element-wise which means the same number will be added to all numbers in the vector. This will be clear with some examples.

```
five <- c(1,2,3,4,5)

# adds one to each element in vector
five + 1
#> [1] 2 3 4 5 6
# also adds one to each element in vector
1 + five
#> [1] 2 3 4 5 6
```

Similar things will happen with the other mathematical operations above. Here are some more examples:

```
five * 3
#> [1]  3  6  9 12 15
five - 3
#> [1] -2 -1  0  1  2
five / 3
#> [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667
```

The other interesting case is what happens when you try to apply any of the same mathematical operators to two different vectors.

```
# just some random numbers
vec2 <- c(8,-3,4,1,7)

five + vec2
#> [1]  9 -1  7  5 12
five - vec2
#> [1] -7  5 -1  3 -2
five * vec2
#> [1]  8 -6 12  4 35
five / vec2
#> [1]  0.1250000 -0.6666667  0.7500000  4.0000000  0.7142857
```

You can immediately see what happens here. For example, for `five + vec2`, the first element of `five` is added to the first element of `vec2`, the second element of `five` is added to the second element of `vec2`, and so on. Similar things happen for each of the other mathematical operations too.
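If you want to convince yourself of this, here is a small check. The name `total` is just a made-up variable for this illustration; the code compares one element of the sum to what you get from adding the matching elements by hand.

```r
five <- c(1,2,3,4,5)
vec2 <- c(8,-3,4,1,7)

# total is a hypothetical name for the element-by-element sum
total <- five + vec2

# second element of the sum...
total[2]
#> [1] -1

# ...is the second element of five plus the second element of vec2
five[2] + vec2[2]
#> [1] -1
```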

There’s one other case that might be interesting to consider too. What happens if you try to apply these mathematical operations to two vectors of different lengths? Let’s find out

```
vec3 <- c(2,6)
five + vec3
#> Warning in five + vec3: longer object length is not a
#> multiple of shorter object length
#> [1] 3 8 5 10 7
```

You’ll notice that this computes *something* but it also issues a warning. What happens here is that the result is equal to the first element of `five` plus the first element of `vec3`, the second element of `five` plus the second element of `vec3`, the third element of `five` plus *the first element of `vec3`*, the fourth element of `five` plus *the second element of `vec3`*, and the fifth element of `five` plus *the first element of `vec3`*. What’s happening here is that, since `vec3` contains fewer elements than `five`, the elements of `vec3` are getting *recycled*. In my experience, this warning often indicates a coding mistake. There are many cases where I want to add the same number to all elements in a vector, and many other cases where I want to add two vectors that have the same length, but I cannot think of any cases where I would want to add two vectors the way that is being carried out here.
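To make the recycling concrete, here is a sketch that spells it out by hand. It uses the `rep` function with its `length.out` argument to repeat `vec3` until it is as long as `five`; the name `vec3_recycled` is made up for this illustration.

```r
five <- c(1,2,3,4,5)
vec3 <- c(2,6)

# repeat the elements of vec3 until there are as many as in five
vec3_recycled <- rep(vec3, length.out = length(five))
vec3_recycled
#> [1] 2 6 2 6 2

# same numbers as five + vec3, but no warning because the lengths now match
five + vec3_recycled
#> [1]  3  8  5 10  7
```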

The same sort of things will happen with subtraction, multiplication, and division (feel free to try it out).

### 2.5.3 More helpful functions in R

This is definitely an incomplete list, but I’ll point you here to some more functions in R that are often helpful along with quick examples of them.

- `seq` function: creates a “sequence” of numbers. For example, `seq(2,7)` returns `2 3 4 5 6 7`.
- `sum` function: computes the sum of a vector of numbers. For example, `sum(c(1,5,8))` returns `14`.
- `sort`, `order`, and `rev` functions: functions for understanding the order or changing the order of a vector. For example, `sort(c(3,1,5))` returns `1 3 5`, `order(c(3,1,5))` returns `2 1 3`, and `rev(c(3,1,5))` returns `5 1 3`.
- `%%`: modulo function (i.e., returns the remainder from dividing one number by another). For example, `8 %% 3` returns `2` and `1 %% 3` returns `1`.
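These functions can also be combined with each other and with the square brackets from earlier. Here is a rough sketch (the variable name `x` is made up for this illustration) that shows how `order` relates to `sort`, along with a couple of other combinations:

```r
# x is a hypothetical vector used only for this illustration
x <- c(3,1,5)

# order(x) gives positions; using those positions as an index sorts x
x[order(x)]
#> [1] 1 3 5

# reversing a sorted vector sorts it from largest to smallest
rev(sort(x))
#> [1] 5 3 1

# remainder from dividing each of 2, 3, ..., 7 by 2
seq(2,7) %% 2
#> [1] 0 1 0 1 0 1

# sum of the numbers 1 through 5
sum(seq(1,5))
#> [1] 15
```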

Practice: The function `seq` contains an optional argument `length.out`. Try running the following code and seeing if you can figure out what `length.out` does.

```
seq(1,10,length.out=5)
seq(1,10,length.out=10)
seq(1,10,length.out=20)
```

### 2.5.4 Other types of vectors

There are other types of vectors in R too. Probably the main two other types of vectors are **character vectors** and **logical vectors**. We’ll talk about character vectors here and defer logical vectors until later. Character vectors are often referred to as **strings**.

We can create a character vector as follows

```
string1 <- "econometrics"
string2 <- "class"
string1
#> [1] "econometrics"
```

The above code creates two character vectors and then prints the first one.
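Some of the functions from earlier work on character vectors too. As a small sketch, here is `length` applied to a string, together with `class` (which reports the type of an object) and `nchar` (which counts the characters in a string):

```r
string1 <- "econometrics"

# the type of the object
class(string1)
#> [1] "character"

# a single string is a character vector with one element
length(string1)
#> [1] 1

# the number of characters in the string is a different thing
nchar(string1)
#> [1] 12
```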

### 2.5.5 Data Frames

Another very important type of object in R is the **data frame**. I think it is helpful to think of a data frame as being very similar to an Excel spreadsheet — sort of like a matrix or a two-dimensional array. Each row typically corresponds to a particular observation, and each column typically provides the value of a particular variable for that observation.

Just to give a simple example, suppose that we had firm-level data about the name of the firm, what industry a firm was in, what county they were located in, and their number of employees. I created a data frame like this (it is totally made up, BTW) and show it to you next

`firm_data`

| name | industry | county | employees |
|---|---|---|---|
| ABC Manufacturing | Manufacturing | Clarke | 531 |
| Martin’s Muffins | Food Services | Oconee | 6 |
| Down Home Appliances | Manufacturing | Clarke | 15 |
| Classic City Widgets | Manufacturing | Clarke | 211 |
| Watkinsville Diner | Food Services | Oconee | 25 |

Side Comment: If you are following along in R, I created this data frame using the following code

```
firm_data <- data.frame(name=c("ABC Manufacturing", "Martin\'s Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))
```

This is also the same data that we loaded earlier in Section 2.3.

Often, we’d like to access a particular column in a data frame. For example, you might want to calculate the average number of employees across all the firms in our data.

Typically, the easiest way to do this is to use the **accessor** symbol, which is `$` in R. This will make more sense with an example:

```
firm_data$employees
#> [1] 531   6  15 211  25
```

`firm_data$employees` just provides the column called “employees” in the data frame called “firm_data”. You can also notice that `firm_data$employees` is just a numeric vector. This means that you can apply any of the functions that we have been covering to it

```
mean(firm_data$employees)
#> [1] 157.6
log(firm_data$employees)
#> [1] 6.274762 1.791759 2.708050 5.351858 3.218876
```

Side Comment: Notice that the functions `mean` and `log` behave differently. `mean` calculates the average over all the elements in the vector `firm_data$employees` and therefore returns a single number. `log` calculates the logarithm of each element in the vector `firm_data$employees` and therefore returns a numeric vector with five elements.
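One way to see this difference is to check the length of what each function returns. In the sketch below, `employees` is a stand-in vector that I typed in with the same values as `firm_data$employees`, so that the code runs on its own.

```r
# stand-in vector with the same values as firm_data$employees
employees <- c(531, 6, 15, 211, 25)

# mean summarizes the whole vector into one number
length(mean(employees))
#> [1] 1

# log is applied to each element, so you get five numbers back
length(log(employees))
#> [1] 5

# the mean is just the sum divided by the number of elements
sum(employees) / length(employees)
#> [1] 157.6
```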

Side Comment: The `$` is not the only way to access the elements in a data frame. You can also access them by their position. For example, if you want whatever is in the third row and second column of the data frame, you can get it by

```
firm_data[3,2]
#> [1] "Manufacturing"
```

Sometimes it is also convenient to recover a particular row or column by its position in the data frame. Here is an example of recovering the entire fourth row

```
firm_data[4,]
#>                   name      industry county employees
#> 4 Classic City Widgets Manufacturing Clarke       211
```

Notice that you just leave the “column index” (which is the second one) blank.
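The same idea works in the other direction: leaving the “row index” blank gives you an entire column. As a sketch, the code below re-creates `firm_data` so that it runs on its own, and also shows that a column name in quotes can be used in place of a column position.

```r
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# the entire fourth column (the row index is left blank)
firm_data[,4]
#> [1] 531   6  15 211  25

# third row of the column named "industry"
firm_data[3,"industry"]
#> [1] "Manufacturing"
```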

Side Comment: One other thing that sometimes takes some getting used to is that, for programming in general, you have to be very precise. Suppose you were to make a very small typo. R is not going to understand what you mean. See if you can spot the typo in the next line of code.

```
firm_data$employes
#> NULL
```
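If you are ever unsure whether you have spelled a column name correctly, one possible check is to compare it against the actual column names. The sketch below re-creates `firm_data` so that it runs on its own, and uses a different made-up misspelling, `conty`, as the example.

```r
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# "conty" is a deliberately misspelled, made-up column name
"conty" %in% colnames(firm_data)
#> [1] FALSE

# the correctly spelled name is there
"county" %in% colnames(firm_data)
#> [1] TRUE

# asking for a column that does not exist quietly returns NULL
is.null(firm_data$conty)
#> [1] TRUE
```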

A few more useful functions for working with data frames are:

- `nrow` and `ncol`: return the number of rows or columns in the data frame
- `colnames` and `rownames`: return the names of the columns or rows
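Here is a quick sketch of these four functions applied to the firm data. The data frame is re-created inside the block so that the code runs on its own.

```r
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

nrow(firm_data)
#> [1] 5
ncol(firm_data)
#> [1] 4
colnames(firm_data)
#> [1] "name"      "industry"  "county"    "employees"
# we never set row names, so R just numbers the rows
rownames(firm_data)
#> [1] "1" "2" "3" "4" "5"
```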

### 2.5.6 Lists

Vectors and data frames are the main two types of objects that we’ll use this semester, but let me give you a quick overview of a few other types of objects. Let’s start with **lists**. Lists are very generic in the sense that they can carry around complicated data. If you are familiar with any object oriented programming language like Java or C++, they have the flavor of an “object”, in the object-oriented sense.

I’m not sure if we will see any examples this semester where you *have* to use a list. But here is an example. Suppose that we wanted to put the vector that we created earlier, `five`, and the data frame that we created earlier, `firm_data`, into the same object. We could do it as follows

`unusual_list <- list(numbers=five, df=firm_data)`

You can access the elements of a list in a few different ways. Sometimes it is convenient to access them via the `$`

```
unusual_list$numbers
#> [1] 1 2 3 4 5
```

Other times, it is convenient to access them via their position in the list

```
unusual_list[[2]] # notice the double brackets
#>                   name      industry county employees
#> 1    ABC Manufacturing Manufacturing Clarke       531
#> 2     Martin's Muffins Food Services Oconee         6
#> 3 Down Home Appliances Manufacturing Clarke        15
#> 4 Classic City Widgets Manufacturing Clarke       211
#> 5   Watkinsville Diner Food Services Oconee        25
```
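To see why the double brackets matter, here is a small sketch. To keep it short, it uses a tiny made-up data frame called `small_df` in place of `firm_data`; everything else follows the example above.

```r
five <- c(1,2,3,4,5)
# small_df is a hypothetical stand-in for firm_data, just to keep this short
small_df <- data.frame(x = c(1,2), y = c("a","b"))
unusual_list <- list(numbers=five, df=small_df)

# the names and number of elements in the list
names(unusual_list)
#> [1] "numbers" "df"
length(unusual_list)
#> [1] 2

# double brackets give you the element itself
class(unusual_list[[1]])
#> [1] "numeric"

# single brackets give you a (shorter) list that contains the element
class(unusual_list[1])
#> [1] "list"

# double brackets also work with names
unusual_list[["numbers"]]
#> [1] 1 2 3 4 5
```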

### 2.5.7 Matrices

Matrices are very similar to data frames, but the data should all be of the same type. Matrices are very useful in some numerical calculations that are beyond the scope of this class. Here is an example of a matrix.

```
mat <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)
mat
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
```

You can access elements of a matrix by their position in the matrix, just like for the data frame above.

```
# first row, second column
mat[1,2]
#> [1] 2
# all rows in second column
mat[,2]
#> [1] 2 4
```
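A couple more things you can try with matrices are sketched below. The `dim` function reports the number of rows and columns, and the second matrix (which I have called `mat_bycol` just for this illustration) shows what changes if you leave out `byrow=TRUE`.

```r
mat <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)

# all columns in first row
mat[1,]
#> [1] 1 2

# number of rows and number of columns
dim(mat)
#> [1] 2 2

# without byrow=TRUE, the numbers are filled in column by column
mat_bycol <- matrix(c(1,2,3,4), nrow=2)
mat_bycol[1,]
#> [1] 1 3
```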

### 2.5.8 Factors

Sometimes variables in economics are **categorical**. This sort of variable is somewhat between a numeric variable and a string. In `R`, categorical variables are called **factors**.

A good example of a categorical variable is `firm_data$industry`. It tells you the “category” of the industry that a firm is in.

Oftentimes, we may have to tell R that a variable is a “factor” rather than just a string. Let’s create a variable called `industry` that contains the industry from `firm_data` but as a factor.

```
industry <- as.factor(firm_data$industry)
industry
#> [1] Manufacturing Food Services Manufacturing Manufacturing
#> [5] Food Services
#> Levels: Food Services Manufacturing
```
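Here is a short sketch of a few things you can do once a variable is a factor. So that the code runs on its own, the industries are typed in directly with the same values as `firm_data$industry`.

```r
# same values as firm_data$industry, typed in directly
industry <- as.factor(c("Manufacturing", "Food Services", "Manufacturing",
                        "Manufacturing", "Food Services"))

# the distinct categories
levels(industry)
#> [1] "Food Services" "Manufacturing"

# counts the number of firms in each category
# (here: 2 in Food Services and 3 in Manufacturing)
table(industry)

# behind the scenes, each category is stored as an integer code
as.integer(industry)
#> [1] 2 1 2 2 1
```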

A useful package for working with factor variables is the `forcats` package.

### 2.5.9 Understanding an object in R

Sometimes you may come across a variable and not know exactly what it contains. Some functions that are helpful in this case are

- `class`: tells you, err, the class of an object (i.e., its “type”)
- `head`: shows you the “beginning” of an object; this is especially helpful for large objects (like some data frames)
- `str`: stands for “structure” of an object

Let’s try these out

```
class(firm_data)
#> [1] "data.frame"
# typically would show the first six rows of a data frame,
# but that is the whole data frame here
head(firm_data)
#>                   name      industry county employees
#> 1    ABC Manufacturing Manufacturing Clarke       531
#> 2     Martin's Muffins Food Services Oconee         6
#> 3 Down Home Appliances Manufacturing Clarke        15
#> 4 Classic City Widgets Manufacturing Clarke       211
#> 5   Watkinsville Diner Food Services Oconee        25
str(firm_data)
#> 'data.frame': 5 obs. of 4 variables:
#> $ name : chr "ABC Manufacturing" "Martin's Muffins" "Down Home Appliances" "Classic City Widgets" ...
#> $ industry : chr "Manufacturing" "Food Services" "Manufacturing" "Manufacturing" ...
#> $ county : chr "Clarke" "Oconee" "Clarke" "Clarke" ...
#> $ employees: num 531 6 15 211 25
```
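These functions can also be pointed at one piece of a data frame, and `head` has an optional argument `n` that controls how many rows it shows. Here is a sketch; the data frame is re-created inside the block so that the code runs on its own.

```r
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# only show the first two rows
head(firm_data, n = 2)

# that result is itself a data frame with two rows
nrow(head(firm_data, n = 2))
#> [1] 2

# class also works on individual columns
class(firm_data$employees)
#> [1] "numeric"
class(firm_data$name)
#> [1] "character"
```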

Practice: Try running `class`, `head`, and `str` on the vector `five` that we created earlier.

Side Comment: `c` stands for “concatenate”. Concatenate is a computer science word that means to combine two vectors. Probably the most well known version of this is “string concatenation” that combines two vectors of characters. For example, `c(string1, string2)` combines our two strings into a single character vector with two elements.

Sometimes string concatenation means to put two (or more) strings into the same string. This can be done using the `paste` command in R, as in `paste(string1, string2)`. Notice that `paste` puts in a space between `string1` and `string2`. For practice, see if you can find an argument to the `paste` function that allows you to remove the space between the two strings.
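The two kinds of concatenation described in this side comment can be compared directly. This is a sketch that re-creates `string1` and `string2` from earlier so that it runs on its own.

```r
string1 <- "econometrics"
string2 <- "class"

# c keeps the two strings separate, as two elements of one vector
c(string1, string2)
#> [1] "econometrics" "class"
length(c(string1, string2))
#> [1] 2

# paste glues them together into a single string
paste(string1, string2)
#> [1] "econometrics class"
length(paste(string1, string2))
#> [1] 1
```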