3 Programming in R


3.1 Functions in R

3.2 Data types

3.2.1 Numeric Vectors

The most basic data type in R is the vector. In fact, the variables that we created above that held just a single number are actually stored as numeric vectors of length one.

To more explicitly create a vector, you can use the c function in R. For example, let’s create a vector called five that contains the numbers 1 through 5.

  five <- c(1,2,3,4,5)

We can print the contents of the vector five just by typing its name

five

[1] 1 2 3 4 5

Another common operation on vectors is to get a particular element of a vector. Let me give an example

five[3]

[1] 3

This code takes the vector five and returns the third element in the vector. Notice that the above line contains square brackets, [ and ], rather than parentheses.

If you want several different elements from a vector, you can do the following

five[c(1,4)]

[1] 1 4

This code takes the vector five and returns the first and fourth element in the vector.
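Two other indexing patterns come up often. The short sketch below recreates five so that it runs on its own; the names middle and without_first are just illustrative. It selects a range of positions with the : operator and then uses a negative index, which drops an element rather than selecting it.

```r
# recreate the vector from above so this block runs on its own
five <- c(1, 2, 3, 4, 5)

# the colon operator builds the positions 2, 3, 4
middle <- five[2:4]
middle

# a negative index drops that position instead of selecting it
without_first <- five[-1]
without_first
```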

One more useful function for vectors is the function length. This tells you the number of elements in a vector. For example,

length(five)

[1] 5

which means that there are five total elements in the vector five.

3.2.2 Vector arithmetic

3.2.3 More helpful functions in R

This is definitely an incomplete list, but I’ll point you here to some more functions in R that are often helpful along with quick examples of them.

seq function — creates a “sequence” of numbers
```
seq(2,7)
```
```
[1] 2 3 4 5 6 7
```
sum function — computes the sum of a vector of numbers
```
sum(c(1,5,8))
```
```
[1] 14
```
sort, order, and rev functions — functions for understanding the order or changing the order of a vector
```
sort(c(3,1,5))
```
```
[1] 1 3 5
```
```
order(c(3,1,5))
```
```
[1] 2 1 3
```
```
rev(c(3,1,5))
```
```
[1] 5 1 3
```
%% — modulo function (i.e., returns the remainder from dividing one number by another)
```
8 %% 3
```
```
[1] 2
```
```
1 %% 3
```
```
[1] 1
```
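The output of order can be puzzling at first: it returns positions rather than values. Here is a small sketch (the names idx and sorted_by_index are just for illustration) showing that indexing a vector by the result of order gives the same thing as sort.

```r
x <- c(3, 1, 5)

# order() returns positions: the smallest value sits in position 2,
# the next smallest in position 1, and the largest in position 3
idx <- order(x)
idx

# indexing x by those positions reproduces sort(x)
sorted_by_index <- x[idx]
sorted_by_index
identical(sorted_by_index, sort(x))
```

This is why order, rather than sort, is the function to reach for when you want to rearrange one object according to the values of another.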

Practice: The function seq contains an optional argument length.out. Try running the following code and seeing if you can figure out what length.out does.

seq(1,10,length.out=5)
seq(1,10,length.out=10)
seq(1,10,length.out=20)

3.2.4 Other types of vectors

There are other types of vectors in R too. Probably the main two other types of vectors are character vectors and logical vectors. We’ll talk about character vectors here and defer logical vectors until later. Character vectors are often referred to as strings.

We can create a character vector as follows

string1 <- "econometrics"
string2 <- "class"
string1

[1] "econometrics"

The above code creates two character vectors and then prints the first one.

Side Comment: c stands for “concatenate” (R’s own help page describes it as “combine”). Concatenate is a computer science word that means to join things end to end, here two vectors. Probably the most well-known version of this is “string concatenation”, which combines strings. Here is an example of concatenating two character vectors.

c(string1, string2)

[1] "econometrics" "class"

Sometimes string concatenation means to put two (or more) strings into the same string. This can be done using the paste command in R.

paste(string1, string2)

[1] "econometrics class"

Notice that paste puts in a space between string1 and string2. For practice, see if you can find an argument to the paste function that allows you to remove the space between the two strings.

3.2.5 Data Frames

Another very important type of object in R is the data frame. I think it is helpful to think of a data frame as being very similar to an Excel spreadsheet — sort of like a matrix or a two-dimensional array. Each row typically corresponds to a particular observation, and each column typically provides the value of a particular variable for that observation.

Just to give a simple example, suppose that we had firm-level data about the name of the firm, what industry a firm was in, what county they were located in, and their number of employees. I created a data frame like this (it is totally made up, BTW) and show it to you next

firm_data

name	industry	county	employees
ABC Manufacturing	Manufacturing	Clarke	531
Martin’s Muffins	Food Services	Oconee	6
Down Home Appliances	Manufacturing	Clarke	15
Classic City Widgets	Manufacturing	Clarke	211
Watkinsville Diner	Food Services	Oconee	25

Side Comment: If you are following along on R, I created this data frame using the following code

firm_data <- data.frame(name=c("ABC Manufacturing", "Martin\'s Muffins", "Down Home Appliances", "Classic City Widgets", "Watkinsville Diner"),
                        industry=c("Manufacturing", "Food Services", "Manufacturing", "Manufacturing", "Food Services"),
                        county=c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
                        employees=c(531, 6, 15, 211, 25))

This is also the same data that we loaded earlier in Section 2.3.

Often, we’d like to access a particular column in a data frame. For example, you might want to calculate the average number of employees across all the firms in our data.

Typically, the easiest way to do this is to use the accessor symbol, which is $ in R. This will make more sense with an example:

firm_data$employees

[1] 531   6  15 211  25

firm_data$employees just provides the column called “employees” in the data frame called “firm_data”. You can also notice that firm_data$employees is just a numeric vector. This means that you can apply any of the functions that we have been covering on it

mean(firm_data$employees)

[1] 157.6

log(firm_data$employees)

[1] 6.274762 1.791759 2.708050 5.351858 3.218876

Side Comment: Notice that the functions mean and log behave differently. mean calculates the average over all the elements in the vector firm_data$employees and therefore returns a single number. log calculates the logarithm of each element in the vector firm_data$employees and therefore returns a numeric vector with five elements.
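One quick way to see this difference is to check the length of each result. The sketch below uses a standalone vector called employees (a copy of the column above, so that the block runs on its own) and also shows that the two kinds of functions can be combined.

```r
# a standalone copy of the employees column
employees <- c(531, 6, 15, 211, 25)

# mean() summarizes the whole vector into one number
length(mean(employees))

# log() is applied element by element, so the result has five elements
length(log(employees))

# the two can be combined, e.g., the average of log employment
mean(log(employees))
```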

Side Comment:

The $ is not the only way to access the elements in a data frame. You can also access them by their position. For example, if you want whatever is in the third row and second column of the data frame, you can get it by

firm_data[3,2]

[1] "Manufacturing"

Sometimes it is also convenient to recover a particular row or column by its position in the data frame. Here is an example of recovering the entire fourth row

firm_data[4,]

                  name      industry county employees
4 Classic City Widgets Manufacturing Clarke       211

Notice that you just leave the “column index” (which is the second one) blank.
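The same idea works in the other direction. As a sketch (the data frame is recreated here so the block runs on its own, and fourth_col and by_name are just illustrative names), leaving the row index blank returns an entire column, and a column can be selected by its name as well as by its position.

```r
firm_data <- data.frame(
  name = c("ABC Manufacturing", "Martin's Muffins", "Down Home Appliances",
           "Classic City Widgets", "Watkinsville Diner"),
  industry = c("Manufacturing", "Food Services", "Manufacturing",
               "Manufacturing", "Food Services"),
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

# entire fourth column by position: leave the row index blank
fourth_col <- firm_data[, 4]
fourth_col

# the same column selected by its name instead of its position
by_name <- firm_data[, "employees"]
by_name

# several rows at once: rows 1 and 5, all columns
firm_data[c(1, 5), ]
```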

Side Comment: One other thing that sometimes takes some getting used to is that, for programming in general, you have to be very precise. Suppose you were to make a very small typo. R is not going to understand what you mean. See if you can spot the typo in the next line of code.

firm_data$employes

NULL
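The tricky part here is that R did not complain; it quietly returned NULL. As a rough illustration (using a one-column stand-in for firm_data, and a made-up message string inside tryCatch), selecting the misspelled column by name with square brackets stops with an error instead, which can make typos easier to catch.

```r
# a one-column stand-in for the data frame above
firm_data <- data.frame(employees = c(531, 6, 15, 211, 25))

# $ with a misspelled column name quietly returns NULL
is.null(firm_data$employes)

# selecting the misspelled column by name raises an error;
# tryCatch() turns that error into a message so the script keeps running
result <- tryCatch(
  firm_data[, "employes"],
  error = function(e) "error: no such column"
)
result
```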

A few more useful functions for working with data frames are:

nrow and ncol — returns the number of rows or columns in the data frame
colnames and rownames — returns the names of the columns or rows
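Here is a quick sketch of these functions, using a two-column version of the firm data so that the block runs on its own. It also uses dim, which is not listed above but returns the number of rows and columns together.

```r
# a two-column version of the firm data
firm_data <- data.frame(
  county = c("Clarke", "Oconee", "Clarke", "Clarke", "Oconee"),
  employees = c(531, 6, 15, 211, 25)
)

nrow(firm_data)      # number of rows
ncol(firm_data)      # number of columns
dim(firm_data)       # rows and columns together
colnames(firm_data)  # column names
rownames(firm_data)  # row names (by default, just "1", "2", ...)
```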

3.2.6 Lists

Vectors and data frames are the main two types of objects that we’ll use this semester, but let me give you a quick overview of a few other types of objects. Let’s start with lists. Lists are very generic in the sense that they can carry around complicated data. If you are familiar with any object oriented programming language like Java or C++, they have the flavor of an “object”, in the object-oriented sense.

I’m not sure if we will see any examples this semester where you have to use a list. But here is an example. Suppose that we wanted to put the vector that we created earlier five and the data frame that we created earlier firm_data into the same object. We could do it as follows

unusual_list <- list(numbers=five, df=firm_data)

You can access the elements of a list in a few different ways. Sometimes it is convenient to access them via the $

unusual_list$numbers

[1] 1 2 3 4 5

Other times, it is convenient to access them via their position in the list

unusual_list[[2]] # notice the double brackets

                  name      industry county employees
1    ABC Manufacturing Manufacturing Clarke       531
2     Martin's Muffins Food Services Oconee         6
3 Down Home Appliances Manufacturing Clarke        15
4 Classic City Widgets Manufacturing Clarke       211
5   Watkinsville Diner Food Services Oconee        25
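To see why the double brackets matter, here is a small sketch with a hypothetical list called small_list (not one used elsewhere in the notes). Double brackets pull out the element itself, while single brackets return a shorter list that still wraps the element.

```r
five <- c(1, 2, 3, 4, 5)

# a hypothetical list, just for illustration
small_list <- list(numbers = five, label = "example")

# double brackets return the element itself (here, a numeric vector)
class(small_list[["numbers"]])

# single brackets return a list containing that element
class(small_list["numbers"])

# so functions like sum() need the double-bracket (or $) version
sum(small_list[[1]])
```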

3.2.7 Matrices

Matrices are very similar to data frames, but the data should all be of the same type. Matrices are very useful in some numerical calculations that are beyond the scope of this class. Here is an example of a matrix.

mat <- matrix(c(1,2,3,4), nrow=2, byrow=TRUE)
mat

     [,1] [,2]
[1,]    1    2
[2,]    3    4

You can access elements of a matrix by their position in the matrix, just like for the data frame above.

# first row, second column
mat[1,2]

[1] 2

# all rows in second column
mat[,2]

[1] 2 4
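A few more matrix operations, sketched below, may help make sense of the byrow argument and of what "numerical calculations" can look like. The name mat_by_col is just illustrative. By default, matrix fills in values column by column, t transposes a matrix, * multiplies element by element, and %*% is matrix multiplication.

```r
mat <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

# without byrow = TRUE, the values are filled in column by column
mat_by_col <- matrix(c(1, 2, 3, 4), nrow = 2)
mat_by_col

# t() transposes a matrix (rows become columns)
t(mat)

# * multiplies element by element
mat * mat

# %*% is matrix multiplication
mat %*% mat
```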

3.2.8 Factors

Sometimes variables in economics are categorical. This sort of variable is somewhat between a numeric variable and a string. In R, categorical variables are called factors.

A good example of a categorical variable is firm_data$industry. It tells you the “category” of the industry that a firm is in.

Oftentimes, we may have to tell R that a variable is a “factor” rather than just a string. Let’s create a variable called industry that contains the industry from firm_data but as a factor.

industry <- as.factor(firm_data$industry)
industry

[1] Manufacturing Food Services Manufacturing Manufacturing Food Services
Levels: Food Services Manufacturing

A useful package for working with factor variables is the forcats package.
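Base R already has a few functions that show what a factor adds on top of a plain string. In the sketch below, the industry values are typed in directly so that the block runs on its own. levels lists the categories, table counts the observations in each category, and as.integer shows the integer codes that R stores behind the scenes.

```r
# the industry column from firm_data, typed in directly
industry <- as.factor(c("Manufacturing", "Food Services", "Manufacturing",
                        "Manufacturing", "Food Services"))

# the distinct categories (sorted alphabetically by default)
levels(industry)

# number of firms in each category
table(industry)

# the underlying integer codes: 1 = first level, 2 = second level
as.integer(industry)
```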

3.2.9 Understanding an object in R

Sometimes you will come across a variable without knowing exactly what it contains. Some functions that are helpful in this case are

class — tells you, err, the class of an object (i.e., its “type”)
head — shows you the “beginning” of an object; this is especially helpful for large objects (like some data frames)
str — stands for “structure” of an object

Let’s try these out

class(firm_data)

[1] "data.frame"

# typically would show the first six rows of a data frame,
# but that is the whole data frame here
head(firm_data)

                  name      industry county employees
1    ABC Manufacturing Manufacturing Clarke       531
2     Martin's Muffins Food Services Oconee         6
3 Down Home Appliances Manufacturing Clarke        15
4 Classic City Widgets Manufacturing Clarke       211
5   Watkinsville Diner Food Services Oconee        25

str(firm_data)

'data.frame':   5 obs. of  4 variables:
 $ name     : chr  "ABC Manufacturing" "Martin's Muffins" "Down Home Appliances" "Classic City Widgets" ...
 $ industry : chr  "Manufacturing" "Food Services" "Manufacturing" "Manufacturing" ...
 $ county   : chr  "Clarke" "Oconee" "Clarke" "Clarke" ...
 $ employees: num  531 6 15 211 25

Practice: Try running class, head, and str on the vector five that we created earlier.

3.3 Logicals

3.3.1 Additional Logical Operators

3.4 Programming basics

3.4.1 Writing functions

3.4.2 if/else

3.4.3 for loops

3.4.4 Vectorization

3.5 Reproducible Research

3.6 Advanced Topics

To conclude this section, I want to briefly point you towards some advanced material. We will probably brush up against some of this material this semester. That being said, R has some very advanced capabilities related to data science, data manipulation, and data visualization. If you have time/interest you might push further in all of these directions. By the end of the semester, we may not have mastered these topics, but they should at least be accessible to you.

3.6.1 Tidyverse

Related Reading: IDS Chapter 4 — strongly recommend that you read this

• R has very good data cleaning / manipulating tools

Many of them are in the “tidyverse”
Mostly this semester, I’ll just give you a data set that is ready to be worked with. But as you move to your own research projects or do work for a company one day, you will realize that a major step in analyzing data is organizing (“cleaning”) the data in a way that you can analyze it

• Main packages

ggplot2 – see below
dplyr — package to manipulate data
tidyr — more ways to manipulate data
readr — read in data
purrr — alternative versions of apply functions and for loops
tibble — alternative versions of data.frame
stringr — tools for working with strings
forcats — tools for working with factors

• If you see code that uses the pipe operator %>%, it is tidyverse-style code. [You need to load a package to get access to the pipe function. I think this was introduced in the magrittr package, but you can also load it with the dplyr package, which is one of the main tidyverse packages.] This is unusual syntax for most programming languages, but it is (arguably) easier to read. Basically the pipe operator takes the result from one line of code and “pipes” it into the first argument of the next function. Here is an example

library(dplyr) # or library(magrittr)
firm_data %>%
  subset(employees > 100) %>%
  nrow()

[1] 2

What the above code does is it takes the data frame firm_data, subsets it to firms that have more than 100 employees, and calculates the number of rows in this subset (i.e., the number of large firms).

It is equivalent to the following, more traditional-looking code:

large_firms <- subset(firm_data, employees > 100)
nrow(large_firms)

[1] 2
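As a side note, recent versions of R (4.1 and newer) also include a built-in pipe, written |>, that does not require loading any package. The sketch below assumes you have such a version of R. It uses a one-column stand-in for firm_data, and the name n_large is just illustrative.

```r
# a one-column stand-in for the data frame above
firm_data <- data.frame(employees = c(531, 6, 15, 211, 25))

# the built-in pipe |> (assumes R 4.1 or newer); no library() call needed
n_large <- firm_data |>
  subset(employees > 100) |>
  nrow()
n_large
```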

• I won’t emphasize the tidyverse too much as I prefer (at least to some extent) writing code with a more traditional syntax. That said, tidyverse packages are really quite useful for data cleaning / wrangling. And, if you are interested, these are good (and marketable) skills to have.

3.6.2 Data Visualization

Related Reading: IDS Ch. 6-10 — R has very good data visualization tools. I strongly recommend that you read this.

Another very strong point of R
Base R comes with the plot command, but the ggplot2 package provides cutting edge plotting tools. These tools will be somewhat harder to learn, but we’ll use ggplot2 this semester as I think it is worth it.
You can produce professional quality plots in R that are publication ready

We will use ggplot2 this semester, but I will save a longer discussion for later.

3.6.3 Version Control

3.6.4 RStudio Projects

3.6.5 Technical Writing Tools

This is starting to get beyond the scope of the course, but, especially for students in ECON 6750, I recommend that you look up LaTeX. This is a markup language mainly for technical, academic writing. The big payoff is in writing mathematical equations. The equations in the Course Notes are written in LaTeX. For example, the LaTeX code for the solution to the quadratic equation written above is

$$
  x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}
$$

where the $$ is a delimiter that tells LaTeX to render the text between the delimiters as an equation, \frac is a command that tells LaTeX to render the text as a fraction, and \pm is a command that tells LaTeX to render the text as a plus or minus sign.

As I mentioned above, the course notes are written in Quarto, but it is possible to write entire documents in LaTeX. For example, all of my academic papers are written in pure LaTeX. An easy way to get started is to use the website Overleaf. This is a website that allows you to write LaTeX documents in your web browser. Writing homework solutions fully in LaTeX would be overkill for this course, but (especially if you are thinking about doing a Ph.D. in economics), it would be a good thing to poke around with as you have time.

3.7 Lab 1: Introduction to R Programming

For this lab, we will do several practice problems related to programming in R.

Create two vectors as follows
```
x <- seq(2,10,by=2)
y <- c(3,5,7,11,13)
```
Add x and y, subtract y from x, multiply x and y, and divide x by y and report your results.
The geometric mean of a set of numbers is an alternative measure of central tendency to the more common “arithmetic mean” (this is the mean that we are used to). For a set of $J$ numbers, $x_1,x_2,\ldots,x_J$, the geometric mean is defined as

\[ (x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} \]

Write a function called geometric_mean that takes in a vector of numbers and computes their geometric mean. Compute the geometric mean of c(10,8,13)
Use the lubridate package to figure out how many days elapsed between Jan. 1, 1981 and Jan. 10, 2022.
mtcars is one of the data frames that comes packaged with base R.
1. How many observations does mtcars have?
2. How many columns does mtcars have?
3. What are the names of the columns of mtcars?
4. Print only the rows of mtcars for cars that get at least 20 mpg
5. Print only the rows of mtcars that get at least 20 mpg and have at least 100 horsepower (it is in the column called hp)
6. Print only the rows of mtcars that have 6 or more cylinders (it is in the column labeled cyl) or at least 100 horsepower
7. Recover the 10th row of mtcars
8. Sort the rows of mtcars by mpg (from highest to lowest)

3.8 Lab 1: Solutions

x <- seq(2,10,by=2)
y <- c(3,5,7,11,13)

x+y

[1]  5  9 13 19 23

x-y

[1] -1 -1 -1 -3 -3

x*y

[1]   6  20  42  88 130

x/y

[1] 0.6666667 0.8000000 0.8571429 0.7272727 0.7692308

geometric_mean <- function(x) {
  J <- length(x)
  res <- prod(x)^(1/J)
  res
}

geometric_mean(c(10,8,13))

[1] 10.13159
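An alternative way to write this function, sketched below under the assumption that all the numbers are positive, uses the fact that $(x_1 \cdot x_2 \cdot \cdots \cdot x_J)^{1/J} = \exp\left(\frac{1}{J}\sum_{j=1}^J \log(x_j)\right)$. The name geometric_mean_log is just illustrative. This version avoids multiplying all the numbers together, which can produce extremely large values for long vectors.

```r
# geometric mean via logs; assumes every element of x is positive
geometric_mean_log <- function(x) {
  exp(mean(log(x)))
}

geometric_mean_log(c(10, 8, 13))
```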

first_date <- lubridate::mdy("01-01-1981")
second_date <- lubridate::mdy("01-10-2022")
second_date - first_date

Time difference of 14984 days
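The question asks for lubridate, but as a cross-check, the same calculation can be sketched in base R with as.Date, which expects dates written as year-month-day by default. The name days_elapsed is just illustrative.

```r
# base R dates; as.Date() reads "YYYY-MM-DD" by default
first_date <- as.Date("1981-01-01")
second_date <- as.Date("2022-01-10")

# subtracting two dates gives a time difference in days
days_elapsed <- second_date - first_date
days_elapsed

# convert to a plain number if you want to do further arithmetic with it
as.numeric(days_elapsed)
```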

nrow(mtcars)

[1] 32

ncol(mtcars)

[1] 11

colnames(mtcars)

 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"

subset(mtcars, mpg >= 20)

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

subset(mtcars, (mpg >= 20) & (hp >= 100))

                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

subset(mtcars, (cyl >= 6) | (hp >= 100))

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

mtcars[10,]

          mpg cyl  disp  hp drat   wt qsec vs am gear carb
Merc 280 19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4

# without reversing the order, we would order from lowest to highest
mtcars[rev(order(mtcars$mpg)),]

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
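Another way to get the same sorting, sketched below, is to use the decreasing argument of order instead of wrapping it in rev. The name sorted_desc is just illustrative. The mpg values come out in the same order either way, though cars that are tied on mpg may be listed in a different order relative to each other.

```r
# sort from highest to lowest mpg using the decreasing argument
sorted_desc <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]

# the first few mpg values, from highest down
head(sorted_desc$mpg)
```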

3.9 Coding Exercises

The stringr package contains a number of functions for working with strings. For this problem create the following character vector in R
```
x <- c("economics", "econometrics", "ECON 4750")
```
Install the stringr package and use the str_length function in the package in order to calculate the length (number of characters) in each element of x.
For this problem, we are going to write a function to calculate the sum of the numbers from 1 to $n$ where $n$ is some positive integer. There are actually a lot of different ways to do this.
- Approach 1: write a function called sum_one_to_n_1 that uses the R function seq to create a list of numbers from 1 to $n$ and then the function sum to sum over that list.
- Approach 2: The sum of numbers from 1 to $n$ is equal to $n(n+1)/2$. Use this expression to write a function called sum_one_to_n_2 to calculate the sum from 1 to $n$.
- Approach 3: A more brute force approach is to create a list of numbers from 1 to $n$ (you can use seq here) and add them up using a for loop — basically, just keep track of what the current total is and add the next number to the total in each iteration of the for loop. Write a function called sum_one_to_n_3 that does this.
Hint: All of the functions should look like
```
sum_one_to_n <- function(n) {
  # do something
}
```
Try out all three approaches that you came up with above for $n=100$. What is the answer? Do you get the same answer using all three approaches?
The Fibonacci sequence is the sequence of numbers $0,1,1,2,3,5,8,13,21,34,55,\ldots$ that comes from starting with $0$ and $1$ and where each subsequent number is the sum of the previous two. For example, the 5 in the sequence comes from adding 2 and 3; the 55 in the sequence comes from adding 21 and 34.
1. Write a function called fibonacci that takes in a number n and computes the nth element in the Fibonacci sequence. For example fibonacci(5) should return 3 and fibonacci(8) should return 13.
2. Consider an alternative sequence where, starting with the third element, each element is computed as the sum of the previous two elements (the same as with the Fibonacci sequence) but where the first two elements can be arbitrary. Write a function alt_seq(a,b,n) where a is the first element in the sequence, b is the second element in the sequence, and n is which element in the sequence to return. For example, if $a=3$ and $b=7$, then the sequence would be $3,7,10,17,27,44,71,\ldots$ and alt_seq(a=3,b=7,n=4) = 17.
This problem involves writing functions related to computing prime numbers. Recall that a prime number is a positive integer whose only (integer) factors are 1 and itself (e.g., $6$ is not prime because it factors into $2\times 3$, but $5$ is a prime number because its only factors are $1$ and $5$).

For this problem, you cannot use any built-in functions in R for computing prime numbers or checking whether or not a number is a prime number. However, a helpful function for this problem is the modulo function, %%, discussed earlier in the notes. Hint: Notice that 6 %% 2 = 0 indicates that 2 is a factor of 6; on the other hand, if you divide $5$ by any integer smaller than itself (except for $1$), the remainder will always be non-zero.
1. Write a function is_prime that takes x as an argument and returns TRUE if x is a prime number and returns FALSE if x is not a prime number.
2. Write a function prime that takes n as an argument and returns a vector of all the prime numbers from $1$ to $n$. If it is helpful, prime can call the function is_prime that you wrote for part 1.
Base R includes a data frame called iris. This is data about iris flowers (you can read the details by running ?iris).
1. How many observations are there in the entire data frame?
2. Calculate the average Sepal.Length across all observations in iris.
3. Calculate the average Sepal.Width among the setosa iris species.
4. Sort iris by Petal.Length and print the first 10 rows.
One of the examples that we gave above was about writing a function to solve quadratic equations, but, in the code presented above, we only returned one solution to the quadratic equation. Write a function quadratic_solver that takes in a, b, and c as arguments and returns both solutions to the quadratic equation in a list. For example, quadratic_solver(1,4,3) should return a list with two elements, -1 and -3.