2.8 Advanced Topics
To conclude this section, I want to briefly point you towards some advanced material. We will probably brush up against some of this material this semester. That being said, R has some very advanced capabilities related to data science, data manipulation, and data visualization. If you have time/interest you might push further in all of these directions. By the end of the semester, we may not have mastered these topics, but they should at least be accessible to you.
2.8.1 Tidyverse
Related Reading: IDS Chapter 4 — strongly recommend that you read this
R has very good data cleaning / manipulating tools
Many of them are in the “tidyverse”
Mostly this semester, I’ll just give you a data set that is ready to be worked with. But as you move to your own research projects or do work for a company one day, you will realize that a major step in analyzing data is organizing (“cleaning”) the data so that you can analyze it.
Main packages
ggplot2: see below
dplyr: package to manipulate data
tidyr: more ways to manipulate data
readr: read in data
purrr: alternative versions of apply functions and for loops
tibble: alternative versions of data.frame
stringr: tools for working with strings
forcats: tools for working with factors
I won’t emphasize these too much as they are somewhat advanced topics, but if you are interested, these are good (and marketable) skills to have
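To give a flavor of what data manipulation with the tidyverse looks like, here is a minimal sketch using dplyr. It assumes that the dplyr package is installed and uses mtcars, a data set that ships with R; the particular steps (the horsepower cutoff, the new column name wt_lbs, and the object name car_summary) are just choices I made for illustration.

```r
library(dplyr)

# mtcars is a small data set on cars that comes with R
car_summary <- mtcars %>%
  filter(hp > 100) %>%              # keep cars with more than 100 horsepower
  mutate(wt_lbs = wt * 1000) %>%    # wt is recorded in 1000s of pounds
  group_by(cyl) %>%                 # do the next step separately by number of cylinders
  summarize(
    avg_mpg = mean(mpg),            # average miles per gallon
    avg_wt_lbs = mean(wt_lbs),      # average weight in pounds
    n_cars = n()                    # number of cars in each group
  )

print(car_summary)
```

Each line takes the data from the previous step, does one thing to it, and passes it along, which is a large part of why many people find this style easy to read.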
2.8.2 Data Visualization
Related Reading: IDS Ch. 6-11; strongly recommend that you read this
R has very good data visualization tools; this is another very strong point of R
Base R comes with the plot command, but the ggplot2 package provides cutting-edge plotting tools. These tools will be somewhat harder to learn, but we’ll use ggplot2 this semester as I think it is worth it.
538’s graphs produced with ggplot
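To make the comparison between base R's plot command and ggplot2 concrete, here is a small sketch that draws roughly the same scatter plot both ways. It assumes that ggplot2 is installed and again uses the built-in mtcars data; the axis labels, the coloring by number of cylinders, and the theme are my own choices for illustration.

```r
library(ggplot2)

# Base R: scatter plot of fuel efficiency against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# ggplot2: the same plot, with points colored by number of cylinders
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders") +
  theme_bw()

print(p)
```

The ggplot2 version builds the plot up in layers joined by +, so adding something like a fitted line or separate panels usually only takes one more line.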
2.8.3 Reproducible Research
Related Reading: IDS Ch. 40
Rmarkdown is a very useful way to mix code and content
These notes are written in Rmarkdown, and I usually write homework solutions in Rmarkdown
If you are interested, you can view the source for this book at http://github.com/bcallaway11/econ_4750_notes. The source code for this chapter is in the file 01-statistical-programming.Rmd.
If you are interested, GitHub is a very useful version control tool (i.e., it keeps track of the versions of your project and is useful for merging projects and for sharing or co-authoring code), and Dropbox is also useful for sharing code. I use both of these extensively; in general, I use GitHub relatively more for bigger projects and more public projects and Dropbox more for smaller projects and early versions of projects.
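To show what mixing code and content in Rmarkdown looks like, here is a rough sketch that writes a tiny Rmarkdown file from R and then tries to turn it into an HTML page. The file name example.Rmd, its title, and its contents are made up for illustration, and the last step only runs if the rmarkdown package and pandoc (which comes with RStudio) are available.

```r
# Three backticks mark the start and end of a code chunk in Rmarkdown
fence <- strrep("`", 3)

# Contents of a minimal Rmarkdown file: a header, some text, and one R code chunk
rmd_lines <- c(
  "---",
  "title: \"Example homework\"",
  "output: html_document",
  "---",
  "",
  "Text and code can sit side by side. The average of `mpg` is computed below.",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
)

# Save the file in a temporary folder
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML only if the required tools are installed
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(out_file)
}
```

In practice you would usually write the .Rmd file directly in RStudio and click the Knit button rather than creating it from R code like this.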
2.8.4 Technical Writing Tools
This is largely beyond the scope of the course, but, especially for students in ECON 6750, I recommend that you look up LaTeX. This is a markup language mainly for technical, academic writing. The big payoff is in writing mathematical equations. The equations in the course notes are written in LaTeX.
An easy way to get started here is to use the website Overleaf. LaTeX is also closely related to markdown/R-markdown discussed above (LaTeX tends to be somewhat more complicated, which comes with some associated advantages and disadvantages).
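To give a sense of what writing an equation in LaTeX looks like, here is a minimal sketch of a complete document that you could paste into a new Overleaf project. The sample mean is just an example I picked for illustration; any equation would do.

```latex
\documentclass{article}
\usepackage{amsmath} % extra tools for typesetting math

\begin{document}

The sample mean of $Y_1, \ldots, Y_n$ is
\begin{equation}
  \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i .
\end{equation}

\end{document}
```

Math that appears inside a sentence goes between dollar signs, while the equation environment puts an equation on its own line and numbers it.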