To conclude this section, I want to briefly point you toward some more advanced material. We will probably brush up against some of it this semester, but R has very advanced capabilities for data science, data manipulation, and data visualization. If you have the time and interest, you might push further in all of these directions. We may not master these topics by the end of the semester, but they should at least be accessible to you.
Related Reading: IDS Ch. 4 (I strongly recommend that you read this)
R has very good tools for cleaning and manipulating data
Many of them are in the “tidyverse”
For most of this semester, I’ll just give you a data set that is ready to be worked with. But as you move on to your own research projects, or work for a company one day, you will realize that a major step in analyzing data is organizing (“cleaning”) the data into a form that you can analyze.
The core tidyverse packages include:
ggplot2: plotting tools (see below)
dplyr: a package to manipulate data
tidyr: more ways to manipulate data
readr: tools to read in data
purrr: alternative versions of the apply family of functions
tibble: an alternative version of data.frame
stringr: tools for working with strings
forcats: tools for working with factors
I won’t emphasize these too much since they are somewhat advanced topics, but if you are interested, these are good (and marketable) skills to have.
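As a small taste of what dplyr code looks like, here is a minimal sketch (not taken from the course materials) that assumes the dplyr package is installed. It uses the mtcars data set that ships with R to create a new variable, keep a few columns, and compute summary statistics by number of cylinders. The names wt_lbs, avg_mpg, avg_wt_lbs, and mpg_by_cyl are made up for illustration.

```r
# Assumes dplyr has been installed, e.g. with install.packages("dplyr")
library(dplyr)

# mtcars is a built-in data set; wt is recorded in 1000s of lbs
mpg_by_cyl <- mtcars %>%
  mutate(wt_lbs = wt * 1000) %>%        # create a new variable
  select(mpg, cyl, wt_lbs) %>%          # keep only the columns we need
  group_by(cyl) %>%                     # one group per number of cylinders
  summarise(n = n(),                    # number of cars in each group
            avg_mpg = mean(mpg),        # average fuel economy
            avg_wt_lbs = mean(wt_lbs)) %>%
  arrange(desc(avg_mpg))                # highest fuel economy first

print(mpg_by_cyl)
```

Each step applies one "verb" to the output of the previous step through the pipe %>%, which is a big part of why tidyverse code tends to read like a list of instructions.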
Related Reading: IDS Ch. 6-11 (I strongly recommend that you read this)
R has very good data visualization tools
Another very strong point of R
R comes with the plot command, but the ggplot2 package provides cutting-edge plotting tools. These tools will be somewhat harder to learn, but we’ll use ggplot2 this semester as I think it is worth it.
538’s graphs are produced with ggplot
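To see the difference between the two approaches, here is a hedged sketch that draws the same scatter plot of fuel economy against weight from the built-in mtcars data, first with base R's plot and then with ggplot2 (assumed to be installed). The axis labels, the object name p, and the choice to color points by number of cylinders are illustrative only.

```r
# Base R: a single function call, quick but harder to customize
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# ggplot2: build the plot in layers
# (assumes ggplot2 has been installed, e.g. with install.packages("ggplot2"))
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +                                              # one point per car
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # a fitted line for each cylinder group
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

print(p)
```

The ggplot2 version is longer, but each + adds one layer, so changes like adding fitted lines or splitting the plot into panels (e.g., with facet_wrap) usually mean adding a line rather than rewriting the whole call.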
Related Reading: IDS Ch. 40
Rmarkdown is a very useful way to mix code and content
These notes are written in Rmarkdown, and I usually write homework solutions in Rmarkdown
If you are interested, you can view the source for this book at http://github.com/bcallaway11/econ_4750_notes. The source code for this chapter is in the file
If you are interested, two other useful tools are GitHub, a version control platform built on Git (i.e., it keeps track of the versions of your project and is useful for merging projects and for sharing or co-authoring code), and Dropbox (also useful for sharing code). I use both of these extensively: in general, I use GitHub relatively more for bigger and more public projects, and Dropbox more for smaller projects and early versions of projects.
This is largely beyond the scope of the course, but I recommend that you look up LaTeX, especially if you are a student in ECON 6750. LaTeX is a markup language used mainly for technical and academic writing, and its biggest payoff is in writing mathematical equations. The equations in the course notes are written in LaTeX.
An easy way to get started is to use the website Overleaf. LaTeX is also closely related to the Markdown/R Markdown tools discussed above (LaTeX tends to be somewhat more complicated, which comes with some associated advantages and disadvantages).
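To give a sense of what LaTeX markup looks like, here is a small illustrative example (not taken from the course notes) of the source for the sample mean of $Y_1, \ldots, Y_n$:

```latex
\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i
```

In a LaTeX document, this line would go inside a display-math environment such as \[ ... \]; in an R Markdown file, the same markup can go between $$ delimiters, which is one concrete way the two tools overlap.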