2.5 Data types
Related Reading: IDS 2.4
2.5.1 Numeric Vectors
The most basic data type in
R is the vector. In fact, above when we created variables that were just a single number, they are actually stored as a numeric vector.
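A quick way to convince yourself of this is sketched below. The variable name x is just made up for illustration.

```r
x <- 5          # looks like "just a number"
is.vector(x)    # TRUE: x is stored as a vector
class(x)        # "numeric"
length(x)       # 1: it is a vector with a single element
x[1]            # 5: its first (and only) element
```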
To more explicitly create a vector, you can use the
c function in
R. For example, let’s create a vector called
five that contains the numbers 1 through 5.
five <- c(1,2,3,4,5)
We can print the contents of the vector
five just by typing its name
five
#> [1] 1 2 3 4 5
Another common operation on vectors is to get a particular element of a vector. Let me give an example
five[3]
#> [1] 3
This code takes the vector
five and returns the third element in the vector. Notice that the above line contains square brackets, [ and ], rather than parentheses.
If you want several different elements from a vector, you can do the following
five[c(1,4)]
#> [1] 1 4
This code takes the vector
five and returns the first and fourth element in the vector.
One more useful function for vectors is the function
length. This tells you the number of elements in a vector. For example,
length(five)
#> [1] 5
which means that there are five total elements in the vector.
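Indexing and length can be combined. Here is a small sketch that uses length to pick out the last element of a vector; the variable name n is my own choice for illustration.

```r
five <- c(1,2,3,4,5)
n <- length(five)   # number of elements, here 5
five[n]             # the last element: 5
five[c(1, n)]       # the first and last elements: 1 5
```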
2.5.2 Vector arithmetic
Related Reading: IDS 2.11
The main operations on numeric vectors are
+, -, *, and /, which correspond to addition, subtraction, multiplication, and division. Often, we would like to carry out these operations on vectors.
There are two main cases. The first case is when you try to add a single number (i.e., a scalar) to all the elements in a vector. In this setup, the operation will happen element-wise which means the same number will be added to all numbers in the vector. This will be clear with some examples.
five <- c(1,2,3,4,5)
# adds one to each element in vector
five + 1
#> [1] 2 3 4 5 6
# also adds one to each element in vector
1 + five
#> [1] 2 3 4 5 6
Similar things will happen with the other mathematical operations above. Here are some more examples:
five * 3
#> [1]  3  6  9 12 15
five - 3
#> [1] -2 -1  0  1  2
five / 3
#> [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667
The other interesting case is what happens when you try to apply any of the same mathematical operators to two different vectors.
# just some random numbers
vec2 <- c(8,-3,4,1,7)
five + vec2
#> [1]  9 -1  7  5 12
five - vec2
#> [1] -7  5 -1  3 -2
five * vec2
#> [1]  8 -6 12  4 35
five / vec2
#> [1]  0.1250000 -0.6666667  0.7500000  4.0000000  0.7142857
You can immediately see what happens here. For example, for
five + vec2, the first element of
five is added to the first element of
vec2, the second element of
five is added to the second element of
vec2 and so on. Similar things happen for each of the other mathematical operations too.
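If you want to check the element-by-element description above, one way is to build the sum "by hand" and compare it to what R returns. This is only a sketch; by_hand is a name I made up for illustration.

```r
five <- c(1,2,3,4,5)
vec2 <- c(8,-3,4,1,7)

# add the elements one position at a time
by_hand <- c(five[1] + vec2[1],
             five[2] + vec2[2],
             five[3] + vec2[3],
             five[4] + vec2[4],
             five[5] + vec2[5])

by_hand                          # 9 -1 7 5 12
identical(by_hand, five + vec2)  # TRUE: same result as five + vec2
```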
There’s one other case that might be interesting to consider too. What happens if you try to apply these mathematical operations to two vectors of different lengths? Let’s find out
vec3 <- c(2,6)
five + vec3
#> Warning in five + vec3: longer object length is not a
#> multiple of shorter object length
#> [1]  3  8  5 10  7
You’ll notice that this computes something but it also issues a warning. What happens here is that the result is equal to the first element of
five plus the first element of
vec3, the second of
five plus the second element of
vec3, the third element of
five plus the first element of
vec3, the fourth element of
five plus the second element of
vec3, and the fifth element of
five plus the first element of
vec3. What’s happening here is that, since
vec3 contains fewer elements than
five, the elements of
vec3 are getting recycled. In my experience, this warning often indicates a coding mistake. There are many cases where I want to add the same number to all elements in a vector, and many other cases where I want to add two vectors that have the same length, but I cannot think of any cases where I would want to add two vectors the way that is being carried out here.
The same sort of things will happen with subtraction, multiplication, and division (feel free to try it out).
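To see the recycling more directly, here is a sketch that writes out the recycled version of vec3 explicitly using rep with its length.out argument. The name recycled is just for illustration.

```r
five <- c(1,2,3,4,5)
vec3 <- c(2,6)

# repeat the elements of vec3 until there are as many as in five
recycled <- rep(vec3, length.out = length(five))
recycled          # 2 6 2 6 2

# same numbers as five + vec3, but without the warning
five + recycled   # 3 8 5 10 7
```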
2.5.3 More helpful functions in R
This is definitely an incomplete list, but I’ll point you here to some more functions in R that are often helpful along with quick examples of them.
seq function: creates a “sequence” of numbers
seq(2,7)
#> [1] 2 3 4 5 6 7
sum function: computes the sum of a vector of numbers
sum(c(1,5,8))
#> [1] 14
sort, order, and rev functions: functions for understanding the order or changing the order of a vector
sort(c(3,1,5))
#> [1] 1 3 5
order(c(3,1,5))
#> [1] 2 1 3
rev(c(3,1,5))
#> [1] 5 1 3
%%: modulo operator (i.e., returns the remainder from dividing one number by another)
8 %% 3
#> [1] 2
1 %% 3
#> [1] 1
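These functions are often used together. Here is a small sketch (the vector x is made up) showing how order relates to sort, and how a few of the functions above can be chained.

```r
x <- c(3,1,5)

# order gives the positions that would sort x, so indexing by it sorts x
x[order(x)]      # 1 3 5, same as sort(x)

# sort and then reverse to get largest-to-smallest
rev(sort(x))     # 5 3 1

# sum of the numbers 1 through 10
sum(seq(1,10))   # 55

# remainders after dividing each of 1 through 6 by 2
seq(1,6) %% 2    # 1 0 1 0 1 0
```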
Practice: The function
seq contains an optional argument
length.out. Try running the following code and seeing if you can figure out what it does.
seq(1,10,length.out=5)
seq(1,10,length.out=10)
seq(1,10,length.out=20)
2.5.4 Other types of vectors
There are other types of vectors in R too. Probably the main two other types of vectors are character vectors and logical vectors. We’ll talk about character vectors here and defer logical vectors until later. Character vectors are often referred to as strings.
We can create a character vector as follows
string1 <- "econometrics"
string2 <- "class"
string1
#> [1] "econometrics"
The above code creates two character vectors and then prints the first one.
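Character vectors can also hold more than one element, and the indexing and length functions from above work on them too. Here is a sketch; the name words is made up, and nchar is a base R function that counts the characters in each element.

```r
words <- c("econometrics", "class")
length(words)   # 2: two elements in the vector
words[2]        # "class"
nchar(words)    # 12 5: number of characters in each element
```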
2.5.5 Data Frames
Another very important type of object in R is the data frame. I think it is helpful to think of a data frame as being very similar to an Excel spreadsheet — sort of like a matrix or a two-dimensional array. Each row typically corresponds to a particular observation, and each column typically provides the value of a particular variable for that observation.
Just to give a simple example, suppose that we had firm-level data about the name of the firm, what industry a firm was in, what county they were located in, and their number of employees. I created a data frame like this (it is totally made up, BTW) and show it to you next
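| name | industry | county | employees |
|---|---|---|---|
| ABC Manufacturing | Manufacturing | Clarke | 531 |
| Martin’s Muffins | Food Services | Oconee | 6 |
| Down Home Appliances | Manufacturing | Clarke | 15 |
| Classic City Widgets | Manufacturing | Clarke | 211 |
| Watkinsville Diner | Food Services | Oconee | 25 |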
Side Comment: If you are following along in R, I created this data frame using the following code
firm_data <- data.frame(name=c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))
This is also the same data that we loaded earlier in Section 2.3.
Often, we would like to access a particular column in a data frame. For example, you might want to calculate the average number of employees across all the firms in our data.
Typically, the easiest way to do this is to use the accessor symbol, which is
$ in R. This will make more sense with an example:
firm_data$employees
#> [1] 531   6  15 211  25
firm_data$employees just provides the column called “employees” in the data frame called “firm_data”. You can also notice that
firm_data$employees is just a numeric vector. This means that you can apply any of the functions that we have been covering on it
mean(firm_data$employees)
#> [1] 157.6
log(firm_data$employees)
#> [1] 6.274762 1.791759 2.708050 5.351858 3.218876
Side Comment: Notice that the functions mean and log behave differently.
mean calculates the average over all the elements in the vector
firm_data$employees and therefore returns a single number.
log calculates the logarithm of each element in the vector
firm_data$employees and therefore returns a numeric vector with five elements.
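One way to see this difference is to check the length of what each function returns. In this sketch, I just copy the employee numbers into a standalone vector called employees so that the code runs on its own.

```r
employees <- c(531, 6, 15, 211, 25)

length(mean(employees))   # 1: mean collapses the vector to one number
length(log(employees))    # 5: log returns one value per element

# mean is the sum divided by the number of elements
sum(employees) / length(employees)   # 157.6
```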
$ is not the only way to access the elements in a data frame. You can also access them by their position. For example, if you want whatever is in the third row and second column of the data frame, you can get it by
firm_data[3,2]
#> [1] "Manufacturing"
Sometimes it is also convenient to recover a particular row or column by its position in the data frame. Here is an example of recovering the entire fourth row
firm_data[4,]
#>                   name      industry county employees
#> 4 Classic City Widgets Manufacturing Clarke       211
Notice that you just leave the “column index” (which is the second one) blank.
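The same idea works in the other direction: leaving the row index blank returns an entire column. Here is a sketch of a few variations, which re-creates the made-up firm_data so that it runs on its own.

```r
firm_data <- data.frame(name=c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))

# leave the row index blank to get the entire fourth column
firm_data[,4]              # 531 6 15 211 25

# a column can also be picked out by its name instead of its position
firm_data[2,"employees"]   # 6

# several rows at once: the first and third rows
firm_data[c(1,3),]
```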
Side Comment: One other thing that sometimes takes some getting used to is that, for programming in general, you have to be very precise. Suppose you were to make a very small typo. R is not going to understand what you mean. See if you can spot the typo in the next line of code.
firm_data$employes
#> NULL
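When you get an unexpected NULL like this, one thing that can help is to check the exact column names. Here is a sketch using names and the %in% operator (both are in base R), with the made-up firm_data re-created so the code runs on its own.

```r
firm_data <- data.frame(name=c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))

names(firm_data)                     # "name" "industry" "county" "employees"
"employes" %in% names(firm_data)     # FALSE: the misspelled name is not a column
"employees" %in% names(firm_data)    # TRUE
```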
A few more useful functions for working with data frames are:
nrow and ncol: return the number of rows or columns in the data frame
colnames and rownames: return the names of the columns or rows
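Here is a quick sketch of these functions applied to the made-up firm_data (re-created here so the code runs on its own).

```r
firm_data <- data.frame(name=c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))

nrow(firm_data)       # 5
ncol(firm_data)       # 4
colnames(firm_data)   # "name" "industry" "county" "employees"
rownames(firm_data)   # "1" "2" "3" "4" "5"
```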
Vectors and data frames are the main two types of objects that we’ll use this semester, but let me give you a quick overview of a few other types of objects. Let’s start with lists. Lists are very generic in the sense that they can carry around complicated data. If you are familiar with any object oriented programming language like Java or C++, they have the flavor of an “object”, in the object-oriented sense.
I’m not sure if we will see any examples this semester where you have to use a list. But here is an example. Suppose that we wanted to put the vector that we created earlier
five and the data frame that we created earlier
firm_data into the same object. We could do it as follows
unusual_list <- list(numbers=five, df=firm_data)
You can access the elements of a list in a few different ways. Sometimes it is convenient to access them via the $ accessor symbol
unusual_list$numbers
#> [1] 1 2 3 4 5
Other times, it is convenient to access them via their position in the list
unusual_list[[2]] # notice the double brackets
#>                   name      industry county employees
#> 1    ABC Manufacturing Manufacturing Clarke       531
#> 2     Martin's Muffins Food Services Oconee         6
#> 3 Down Home Appliances Manufacturing Clarke        15
#> 4 Classic City Widgets Manufacturing Clarke       211
#> 5   Watkinsville Diner Food Services Oconee        25
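A few other things that you can do with a list are sketched below. To keep the code short, I use a smaller made-up list called small_list (it is not the unusual_list from above).

```r
five <- c(1,2,3,4,5)
small_list <- list(numbers=five, word="class")

names(small_list)           # "numbers" "word"
length(small_list)          # 2: the list has two elements

# double brackets also work with the name of an element
small_list[["numbers"]]     # 1 2 3 4 5

# once you have pulled out an element, you can index into it as usual
small_list[[1]][3]          # 3
```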
Matrices are very similar to data frames, but the data should all be of the same type. Matrices are very useful in some numerical calculations that are beyond the scope of this class. Here is an example of a matrix.
mat <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)
mat
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
You can access elements of a matrix by their position in the matrix, just like for the data frame above.
# first row, second column
mat[1,2]
#> [1] 2
# all rows in second column
mat[,2]
#> [1] 2 4
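Here is a short sketch of a few more matrix operations, including what happens if you try to mix types in a matrix. The name mixed is made up for illustration.

```r
mat <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)

dim(mat)     # 2 2: number of rows, then number of columns
mat[1,]      # 1 2: the entire first row

# if you mix numbers and strings, everything gets converted to a string
mixed <- matrix(c(1,"a",2,"b"), nrow=2)
class(mixed[1,1])   # "character"
```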
Sometimes variables in economics are categorical. This sort of variable is somewhat between a numeric variable and a string. In
R, categorical variables are called factors.
A good example of a categorical variable is
firm_data$industry. It tells you the “category” of the industry that a firm is in.
Oftentimes, we may have to tell R that a variable is a “factor” rather than just a string. Let’s create a variable called
industry that contains the industry from
firm_data but as a factor.
industry <- as.factor(firm_data$industry)
industry
#> [1] Manufacturing Food Services Manufacturing Manufacturing
#> [5] Food Services
#> Levels: Food Services Manufacturing
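Here is a sketch of a few base R functions that are handy once a variable is a factor. To make the code run on its own, I type in the industries directly rather than pulling them from firm_data.

```r
industry <- as.factor(c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"))

levels(industry)       # "Food Services" "Manufacturing": the distinct categories
nlevels(industry)      # 2: the number of categories
table(industry)        # counts per category: 2 Food Services, 3 Manufacturing

# behind the scenes, each category is stored as an integer code
as.integer(industry)   # 2 1 2 2 1
```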
A useful package for working with factor variables is the
2.5.9 Understanding an object in R
Sometimes you will come across a variable without knowing exactly what it contains. Some functions that are helpful in this case are
class: tells you, err, the class of an object (i.e., its “type”)
head: shows you the “beginning” of an object; this is especially helpful for large objects (like some data frames)
str: stands for “structure” of an object
Let’s try these out
class(firm_data)
#> [1] "data.frame"
# typically would show the first six rows of a data frame,
# but that is the whole data frame here
head(firm_data)
#>                   name      industry county employees
#> 1    ABC Manufacturing Manufacturing Clarke       531
#> 2     Martin's Muffins Food Services Oconee         6
#> 3 Down Home Appliances Manufacturing Clarke        15
#> 4 Classic City Widgets Manufacturing Clarke       211
#> 5   Watkinsville Diner Food Services Oconee        25
str(firm_data)
#> 'data.frame': 5 obs. of 4 variables:
#> $ name : chr "ABC Manufacturing" "Martin's Muffins" "Down Home Appliances" "Classic City Widgets" ...
#> $ industry : chr "Manufacturing" "Food Services" "Manufacturing" "Manufacturing" ...
#> $ county : chr "Clarke" "Oconee" "Clarke" "Clarke" ...
#> $ employees: num 531 6 15 211 25
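class works on any object, not just data frames. Here is a sketch applying it to small made-up versions of the other types of objects from this section, and using the n argument of head to limit how much is shown.

```r
class(c(1,2,3,4,5))             # "numeric"
class("econometrics")           # "character"
class(list(numbers=c(1,2)))     # "list"
class(as.factor(c("a","b")))    # "factor"

# head also works on vectors; n controls how many elements to show
head(seq(1,100), n=3)           # 1 2 3
```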
Practice: Try running
str on the vector
five that we created earlier.
c stands for “concatenate”. Concatenate is a computer science word that means to combine two vectors. Probably the most well-known version of this is “string concatenation”, which combines two vectors of characters. Here is an example of string concatenation.
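As a minimal sketch, using the two strings created earlier (re-created here so the code runs on its own; the name combined is made up), combining them with c gives a character vector with two elements.

```r
string1 <- "econometrics"
string2 <- "class"

combined <- c(string1, string2)
combined           # "econometrics" "class"
length(combined)   # 2: still two separate strings
```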
Sometimes string concatenation means to put two (or more) strings into the same string. This can be done using the
paste command in R.
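As a minimal sketch, here is paste with its default settings applied to the two strings created earlier (re-created here so the code runs on its own; the name pasted is made up).

```r
string1 <- "econometrics"
string2 <- "class"

pasted <- paste(string1, string2)
pasted           # "econometrics class"
length(pasted)   # 1: a single string
```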
paste puts in a space between string1 and
string2. For practice, see if you can find an argument to the
paste function that allows you to remove the space between the two strings.