## 2.7 Programming basics

### 2.7.1 Writing functions

Related Reading: IDS 3.2

It is often helpful to write your own functions in R. If you ever find yourself repeating the same code over and over, this suggests that you should write this code as a function and repeatedly call the function.

Suppose we are interested in solving the quadratic equation \[ ax^2 + bx + c = 0 \] If you remember the quadratic formula, the solution to this equation is \[ x = \frac{-b \pm \sqrt{b^2-4ac}}{2a} \] It would be tedious to calculate this by hand (especially if we wanted to calculate it for many different values of \(a\), \(b\), and \(c\)), so let’s write a function to do it.

```
quadratic_solver <- function(a, b, c) {
  # the "+" root of the quadratic formula; note the parentheses around 2*a
  root1 <- ( -b + sqrt(b^2 - 4*a*c) ) / (2*a)
  root1
}
```

Before we try this out, let’s notice a few things. First, while this particular function is for solving the quadratic equation, this is quite representative of what a function looks like in R.

- `quadratic_solver`: This is the name of the function. It’s good to give your function a descriptive name related to what it does, but you could call it anything you want. If you wanted to call this function `uga`, it would still work.
- the part `<- function`: This finishes off assigning the function the name `quadratic_solver` and indicates that we are writing down a function rather than a `vector` or `data.frame` or something else. This part will show up in all function definitions.
- the part `(a, b, c)`: `a`, `b`, and `c` are the names of the *arguments* to the function. In a minute, when we call the function, we need to tell the function the particular values of `a`, `b`, and `c` for which to solve the quadratic equation. We could name these whatever we want, but, again, it is good to have descriptive names. When you write a different function, it can have as many arguments as you want it to have.
- the part `{ ... }`: Everything that the function does should go between the curly brackets.
- the line `root1 <- ( -b + sqrt(b^2 - 4*a*c) ) / (2*a)`: This contains the main thing that is calculated by our function. The parentheses around `2*a` matter: `/` and `*` are evaluated left to right, so without them R would compute `(( -b + sqrt(b^2 - 4*a*c) ) / 2) * a`. Notice that we only calculate one of the “roots” (i.e., solutions to the quadratic equation) because of the \(+\) in this expression.
- the line `root1`: R returns the value of the last expression evaluated in the function, which here is `root1`. It might be somewhat clearer to write `return(root1)`. The behavior of the code would be exactly the same; it is just the more common “style” in R programming to leave out the explicit `return`.
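To see that the explicit `return` makes no difference, here is a minimal sketch of the same calculation written with `return()`. The name `quadratic_solver_explicit` is hypothetical and used only for this comparison.

```r
# Same calculation as quadratic_solver, but with an explicit return()
quadratic_solver_explicit <- function(a, b, c) {
  root1 <- ( -b + sqrt(b^2 - 4*a*c) ) / (2*a)
  return(root1)
}

quadratic_solver_explicit(1, 4, 3)
#> [1] -1
```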

Now let’s try out our function:

```
# solves quadratic equation for a=1, b=4, c=3
quadratic_solver(1,4,3)
#> [1] -1
# solves quadratic equation for a=-1, b=5, c=10
quadratic_solver(-1,5,10)
#> [1] -1.531129
```
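Because of the \(+\), `quadratic_solver` only reports one root. As a sketch, a hypothetical variant `quadratic_solver_both` could compute the \(\pm\) versions together and return them as a length-2 vector. (R still finds the built-in function `c()` even though one of the arguments is also named `c`, because when R looks up a name that is being called as a function, it skips objects that are not functions.)

```r
# Hypothetical variant that returns both roots of ax^2 + bx + c = 0
quadratic_solver_both <- function(a, b, c) {
  discriminant <- b^2 - 4*a*c
  # the "+" root followed by the "-" root
  c(( -b + sqrt(discriminant) ) / (2*a),
    ( -b - sqrt(discriminant) ) / (2*a))
}

quadratic_solver_both(1, 4, 3)
#> [1] -1 -3
```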

Two last things that are worth pointing out about functions:

1. Functions in R can be set up to take default values for some of their arguments.
2. Because the arguments have names, if you are explicit about the name of each argument, then the order of the arguments does not matter.

To give examples, let’s write a slightly modified version of our function to solve quadratic equations.

```
quadratic_solver2 <- function(a=1, b, c) {
  root1 <- ( -b + sqrt(b^2 - 4*a*c) ) / (2*a)
  root1
}
```

The only thing different here is that `a` takes the default value of 1. Now let’s try some different calls to `quadratic_solver` and `quadratic_solver2`:

```
# solve again for a=1,b=4,c=3
quadratic_solver2(b=4,c=3)
#> [1] -1
# replace default and change order
quadratic_solver2(c=10,b=5,a=-1)
#> [1] -1.531129
# no default set for quadratic_solver so it will crash if a not provided
quadratic_solver(b=4,c=3)
#> Error in quadratic_solver(b = 4, c = 3): argument "a" is missing, with no default
```

### 2.7.2 if/else

Related Reading: IDS 3.1

Often when writing code, you will want to do different things depending on some condition. Let’s write a function that takes in the number of employees that are in a firm and prints “large” if the firm has more than 100 employees and “small” otherwise.

```
large_or_small <- function(employees) {
  if (employees > 100) {
    print("large")
  } else {
    print("small")
  }
}
```

I think, at this point, this code should make sense to you. The only new thing is the if/else. The following is not code that will actually run but is just to help understand the logic of if/else.

```
if (condition) {
  # do something
} else {
  # do something else
}
```

All that happens with if/else is that we check whether `condition` evaluates to `TRUE` or `FALSE`. If it is `TRUE`, the code will do whatever is inside the first set of brackets; if it is `FALSE`, the code will do whatever is in the set of brackets following `else`.
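Conditions can also be chained with `else if` when there are more than two cases. Below is a minimal sketch with three categories; the function name `firm_size` and the cutoff of 20 employees for “medium” are hypothetical, chosen only for illustration. Unlike `large_or_small`, it returns the label instead of printing it.

```r
# Hypothetical three-way classification; the cutoffs 20 and 100 are illustrative
firm_size <- function(employees) {
  if (employees > 100) {
    "large"
  } else if (employees > 20) {
    "medium"
  } else {
    "small"
  }
}

firm_size(531)
#> [1] "large"
firm_size(25)
#> [1] "medium"
firm_size(6)
#> [1] "small"
```

The conditions are checked from top to bottom, and only the block for the first `TRUE` condition runs.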

### 2.7.3 for loops

Related Reading: IDS 3.4

Often, we need to run the same code over and over again. A `for` loop is one of the main programming tools for this case (`for` loops show up in pretty much all programming languages).

We’ll have more realistic examples later on in the semester, but we’ll do something trivial for now.

```
out <- c()
for (i in 1:10) {
  out[i] <- i*3
}
out
#> [1]  3  6  9 12 15 18 21 24 27 30
```

The above code starts with \(i=1\), calculates \(i*3\) (which is 3), and then stores that result in the first element of the vector `out`; then \(i\) increases to 2, the code calculates \(i*3\) (which is now 6) and stores this result in the second element of `out`, and so on through \(i=10\).
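Growing `out` one element at a time works, but a common alternative is to create the output vector at its final length before the loop starts. The sketch below does this with `numeric(10)` and loops over `seq_along(out)`, which gives the indices 1 through 10; it produces the same result as the loop above.

```r
# Preallocate a numeric vector of length 10 (all zeros) instead of starting from c()
out <- numeric(10)
for (i in seq_along(out)) {
  out[i] <- i*3
}
out
#> [1]  3  6  9 12 15 18 21 24 27 30
```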

### 2.7.4 Vectorization

Related Reading: IDS 3.5

Vectorizing functions is a relatively advanced topic in R programming, but it is an important one, so I am including it here.

Because we will often be working with data, we will often be performing the same operation on all of the observations in the data. For example, suppose that you wanted to take the logarithm of the number of employees for all the firms in `firm_data`. One way to do this is to use a `for` loop, but this code would be a bit of a mess. Instead, we can rely on the fact that the function `log` is **vectorized**: if we apply it to a vector, it calculates the logarithm of each element in the vector. Besides being more concise, vectorized functions are often faster than `for` loops.
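Here is a sketch comparing the two approaches. To keep it self-contained, it types in the employee counts that appear in `firm_data` below rather than loading the data.

```r
# Employee counts (the same values as firm_data$employees below)
employees <- c(531, 6, 15, 211, 25)

# for-loop version: call log() once per element
log_emp_loop <- numeric(length(employees))
for (i in seq_along(employees)) {
  log_emp_loop[i] <- log(employees[i])
}

# vectorized version: one call handles the whole vector
log_emp <- log(employees)

all.equal(log_emp_loop, log_emp)
#> [1] TRUE
```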

Not all functions are vectorized, though. Let’s go back to our function from earlier called `large_or_small`. This took in the number of employees at a firm and then printed “large” if the firm had more than 100 employees and “small” otherwise. Let’s see what happens if we call this function on a vector of employees (ideally, we’d like the function to be applied to each element in the vector).

```
employees <- firm_data$employees
employees
#> [1] 531   6  15 211  25
large_or_small(employees)
#> Error in if (employees > 100) {: the condition has length > 1
```

This is not what we wanted to have happen. Instead of determining whether each firm was large or small, we get an error. The problem is that `if` expects a single `TRUE` or `FALSE`, but `employees > 100` is a vector containing one `TRUE` or `FALSE` for each firm. In other words, the function `large_or_small` is not vectorized.

In order to vectorize a function, we can use one of a number of “apply” functions in R. I’ll list them here:

- `sapply`: this stands for “simplify” apply; it “applies” the function to all the elements in the vector or list that you pass in and then tries to “simplify” the result
- `lapply`: stands for “list” apply; applies a function to all elements in a vector or list and then returns a list
- `vapply`: stands for “vector” apply; applies a function to all elements in a vector or list and then returns a vector whose type you specify in advance with the argument `FUN.VALUE`
- `apply`: applies a function to either the rows or columns of a matrix-like object (i.e., a matrix or a data frame) depending on the value of the argument `MARGIN`
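A small sketch of the four functions side by side, on made-up inputs (the vector `x` and matrix `m` are only for illustration):

```r
x <- c(1, 4, 9)

sapply(x, sqrt)                          # simplified to a numeric vector
#> [1] 1 2 3
lapply(x, sqrt)                          # always a list, one element per input
vapply(x, sqrt, FUN.VALUE = numeric(1))  # numeric vector; errors if a result is not one number

m <- matrix(1:6, nrow = 2)               # columns are (1,2), (3,4), (5,6)
apply(m, MARGIN = 1, FUN = sum)          # MARGIN = 1: one result per row
#> [1]  9 12
apply(m, MARGIN = 2, FUN = sum)          # MARGIN = 2: one result per column
#> [1]  3  7 11
```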

Let’s use `sapply` to vectorize `large_or_small`.

```
large_or_small_vectorized <- function(employees_vec) {
  sapply(employees_vec, FUN = large_or_small)
}
```

All that this will do is call the function `large_or_small` for each element in the vector that we pass in (here, `employees`). Let’s see it in action:

```
large_or_small_vectorized(employees)
#> [1] "large"
#> [1] "small"
#> [1] "small"
#> [1] "large"
#> [1] "small"
#> [1] "large" "small" "small" "large" "small"
```

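This is what we were hoping for. The first five lines are printed by the calls to `print` inside `large_or_small`; the last line is the character vector that `sapply` returns.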
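If you only want the vector, one option is a version of the function that returns the label instead of printing it. The sketch below does this (the name `large_or_small_value` is hypothetical) and uses `vapply`, so that R checks that every result is a single character string.

```r
# Hypothetical variant of large_or_small: returns the label instead of printing it
large_or_small_value <- function(employees) {
  if (employees > 100) "large" else "small"
}

employees <- c(531, 6, 15, 211, 25)
# FUN.VALUE = character(1) says every result must be one string
vapply(employees, large_or_small_value, FUN.VALUE = character(1))
#> [1] "large" "small" "small" "large" "small"
```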

Side Comment: A relatively popular alternative to the `apply` functions is the family of `map` functions provided in the `purrr` package.

Side Comment: It’s often helpful to have a vectorized version of if/else. In `R`, this is available in the function `ifelse`. Here is an alternative way to vectorize the function `large_or_small`:

```
large_or_small_vectorized2 <- function(employees_vec) {
  ifelse(employees_vec > 100, "large", "small")
}
large_or_small_vectorized2(firm_data$employees)
#> [1] "large" "small" "small" "large" "small"
```

Here you can see that `ifelse` makes every comparison in its first argument, and then returns the second argument for every `TRUE` coming from the first argument and the third argument for every `FALSE` coming from the first argument.

`ifelse` also works with vectors in the second and third arguments. For example:

```
ifelse(c(1,3,5) < 4, yes=c(1,2,3), no=c(4,5,6))
#> [1] 1 2 6
```

which, position by position, picks up 1 and 2 from the second (`yes`) argument (because `1 < 4` and `3 < 4` are `TRUE`) and 6 from the third (`no`) argument (because `5 < 4` is `FALSE`).
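Because `ifelse` returns a vector, calls to it can be nested to handle more than two categories at once. The sketch below is a vectorized three-way classification; the cutoffs (more than 100 is “large”, more than 20 is “medium”) are hypothetical, chosen only for illustration, and the employee counts are the values of `firm_data$employees` shown above.

```r
employees <- c(531, 6, 15, 211, 25)
# Hypothetical cutoffs: over 100 is "large", over 20 is "medium", otherwise "small"
size <- ifelse(employees > 100, "large",
               ifelse(employees > 20, "medium", "small"))
size
#> [1] "large"  "small"  "small"  "large"  "medium"
```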

Side Comment: I also typically replace almost all `for` loops with an `apply` function. In most cases, I don’t think there is much of a performance gain, but the code seems easier to read (or at least more concise). Earlier we wrote a `for` loop that multiplies each number from 1 to 10 by 3; doing the same thing with `sapply` is considerably shorter.
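A sketch of the `sapply` version, where the anonymous function `function(i) i*3` plays the role of the loop body:

```r
# Apply an anonymous function to each of 1, 2, ..., 10 and simplify to a vector
sapply(1:10, function(i) i*3)
#> [1]  3  6  9 12 15 18 21 24 27 30
```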

One last thing worth pointing out, though, is that multiplication is already vectorized, so you don’t actually need to use `sapply` or the `for` loop; a better way is just