2.8 Advanced Topics

To conclude this section, I want to briefly point you towards some advanced material. We will probably brush up against some of it this semester, but R has advanced capabilities for data science, data manipulation, and data visualization that go well beyond what we will cover. If you have the time and interest, you might push further in any of these directions. We may not have mastered these topics by the end of the semester, but they should at least be accessible to you.

2.8.1 Tidyverse

Related Reading: IDS Chapter 4 (I strongly recommend that you read this)

  • R has very good tools for cleaning and manipulating data

    • Many of them are in the “tidyverse”

    • Mostly this semester, I’ll just give you a data set that is ready to be worked with. But as you move to your own research projects or do work for a company one day, you will realize that a major step in analyzing data is organizing (“cleaning”) the data so that you can analyze it

  • Main packages

    • ggplot2: plotting (see Section 2.8.2 below)

    • dplyr: manipulate data (filter rows, create variables, summarize by group)

    • tidyr: reshape data (for example, convert between wide and long formats)

    • readr: read in data

    • purrr: alternative versions of apply functions and for loops

    • tibble: an alternative version of data.frame

    • stringr: tools for working with strings

    • forcats: tools for working with factors

  • I won’t emphasize these too much as they are somewhat advanced topics, but if you are interested, these are good (and marketable) skills to have
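To give a flavor of what data manipulation with dplyr looks like, here is a minimal sketch. It assumes that the dplyr package is installed and it uses mtcars, a small data set that ships with R. The choice of variables and the names cars_summary and wt_lbs are just for illustration.

```r
# Sketch only: assumes dplyr is installed, e.g. via install.packages("dplyr")
library(dplyr)

# mtcars is a built-in data set with one row per car model
cars_summary <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%      # keep cars with 4 or 6 cylinders
  mutate(wt_lbs = wt * 1000) %>%    # wt is recorded in 1000s of pounds
  group_by(cyl) %>%                 # one group per number of cylinders
  summarize(n_cars = n(),           # number of cars in each group
            avg_mpg = mean(mpg),    # average miles per gallon in each group
            avg_wt_lbs = mean(wt_lbs))

cars_summary
```

Each step takes a data frame and returns a data frame, and the pipe (%>%) passes the result of one step on to the next. This is why a sequence of cleaning steps can be read from top to bottom.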

2.8.2 Data Visualization

Related Reading: IDS Ch. 6-11 (R has very good data visualization tools, and I strongly recommend that you read this)

  • Another very strong point of R

  • Base R comes with the plot function, but the ggplot2 package provides cutting-edge plotting tools. These tools are somewhat harder to learn, but we’ll use ggplot2 this semester because I think it is worth it.

  • For example, 538 (FiveThirtyEight) produces graphs with ggplot2
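As a rough illustration of the difference between the two approaches, the sketch below draws the same scatter plot with ggplot2 and with base R. It assumes that ggplot2 is installed and uses the built-in mtcars data. The axis labels and title are my own choices for this example.

```r
# Sketch only: assumes ggplot2 is installed, e.g. via install.packages("ggplot2")
library(ggplot2)

# A ggplot is built up in layers that are added together with +
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # map variables to the axes
  geom_point() +                              # draw one point per car
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       title = "Fuel efficiency and weight")

p  # printing the object draws the plot

# A comparable plot in base R
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

The ggplot2 version takes more typing for a simple plot, but because the plot is stored as an object, it is easy to add more layers (for example, a fitted line) later.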

2.8.3 Reproducible Research

Related Reading: IDS Ch. 40

  • R Markdown is a very useful way to mix code and content in a single document

  • These notes are written in R Markdown, and I usually write homework solutions in R Markdown

  • If you are interested, you can view the source for this book at http://github.com/bcallaway11/econ_4750_notes. The source code for this chapter is in the file 01-statistical-programming.Rmd.

  • If you are interested, Github is a very useful tool for version control (i.e., it keeps track of the versions of your project and is useful for merging projects and for sharing or co-authoring code). It is built on the version control software Git. Dropbox is also useful for sharing code. I use both of these extensively. In general, I use Github relatively more for bigger projects and more public projects, and Dropbox more for smaller projects and early versions of projects.
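To make the idea of mixing code and content concrete, the sketch below writes a very small R Markdown file to a temporary folder and then renders it. The contents of the file and the file name example.Rmd are made up for this illustration. Rendering assumes that the rmarkdown package and pandoc are available (RStudio comes with pandoc), so that step is skipped if they are not.

```r
# Sketch only: the document below is a hypothetical example
fence <- strrep("`", 3)  # the three backticks that open and close a code chunk

rmd_lines <- c(
  "---",
  "title: \"A small example\"",
  "output: html_document",
  "---",
  "",
  "Ordinary text goes here. The code chunk below is run when the document is rendered.",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
)

# Write the document to a temporary folder
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to html only if rmarkdown and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

In practice, you would usually create the .Rmd file in RStudio and click the Knit button rather than writing the file from code. The point here is only to show the three parts of the file: a header, ordinary text, and a code chunk.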

2.8.4 Technical Writing Tools

This is largely beyond the scope of the course, but, especially for students in ECON 6750, I recommend that you look up LaTeX. This is a markup language used mainly for technical and academic writing. The big payoff is in writing mathematical equations. The equations in the course notes are written in LaTeX.

An easy way to get started here is to use the website Overleaf. LaTeX is also closely related to Markdown/R Markdown discussed above (LaTeX tends to be somewhat more complicated, which comes with some associated advantages and disadvantages).
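To give a sense of what writing an equation looks like, here is a minimal LaTeX document that you could paste into Overleaf. The equation (the formula for a sample mean) is just an illustrative choice and is not taken from a particular place in the notes.

```latex
% Sketch only: a hypothetical example document
\documentclass{article}
\begin{document}

The sample mean of $Y_1, \ldots, Y_n$ is
\begin{equation}
  \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i .
\end{equation}

\end{document}
```

Math inside dollar signs appears in the middle of a sentence, while the equation environment puts the expression on its own numbered line.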