4 Random Variables
This section contains our crash course review of topics in probability. The discussion mostly follows Chapter 2 in the Stock and Watson textbook, and I have cross-listed the relevant sections in the textbook here.
At a very high level, probability is the set of mathematical tools that allow us to think about random events.
Just to be clear, random means uncertain, not 50:50.
A simple example of a random event is the outcome from rolling a die.
Eventually, we will treat data as being random draws from some population. Examples of things that we will treat as random draws include a person’s hair color, height, and income. We will think of all of these as random draws because ex ante we don’t know what they will be.
4.1 Data for this chapter
For this chapter, we’ll use data from the U.S. Census Bureau from 2019. It is not quite a full census, but we’ll treat it as the population throughout this chapter.
4.2 Random Variables
SW 2.1
A random variable is a numerical summary of some random event.
Some examples:
Outcome of roll of a die
A person’s height in inches
A firm’s profits in a particular year
Creating a random variable sometimes involves “coding” non-numeric outcomes, e.g., setting hair=1 if a person’s hair color is black, hair=2 if a person’s hair is blonde, etc.
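As a minimal sketch of what this kind of coding could look like in R: the data and the coding scheme below (including the extra brown = 3 category) are made up for illustration and are not part of the Census data.

```r
# Hypothetical data: hair colors for five people (made up for illustration)
hair_color <- c("black", "blonde", "black", "brown", "blonde")

# Assumed coding scheme: black = 1, blonde = 2, brown = 3
codes <- c(black = 1, blonde = 2, brown = 3)

# Look up each person's numeric code by the name of their hair color
hair <- unname(codes[hair_color])
hair
```

The particular numbers are arbitrary labels; nothing about hair color makes black "less than" blonde.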
We’ll generally classify random variables into one of two categories
Discrete — A random variable that takes on discrete values such as 0, 1, 2
Continuous — Takes on a continuum of values
These are broad categories because a lot of random variables in economics sit in between these two.
4.3 pdfs, pmfs, and cdfs
SW 2.1
The distribution of a random variable describes how likely it is to take on certain values.
A random variable’s distribution is fully summarized by its:
probability mass function (pmf) if the random variable is discrete
probability density function (pdf) if the random variable is continuous
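The pmf is somewhat easier to explain, so let’s start there. For some discrete random variable $X$, the pmf of $X$ is defined as
$$f_X(x) = P(X = x)$$
That is, $f_X(x)$ is the probability that $X$ takes the particular value $x$.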
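Example: Suppose that $X$ denotes the outcome of a roll of a die. Then $f_X(1)$ is the probability of rolling a one, and $f_X(1) = P(X = 1) = \frac{1}{6}$.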
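Example: Let’s do a slightly more realistic example and look at the pmf of education in the U.S. Suppose that $X$ denotes the years of education that a person has. Then $f_X(x)$ is the probability that a person has exactly $x$ years of education.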
There are some things that are perhaps worth pointing out here. The most common amount of education in the U.S. appears to be exactly 12 years — corresponding to graduating from high school; about 32% of the population has that level of education. The next most common number of years of education is 16 — corresponding to graduating from college; about 24% of individuals have this level of education. Other relatively common values of education are 13 years (14% of individuals) and 18 (13% of individuals). About 1% of individuals report 0 years of education. It’s not clear to me whether or not that is actually true or reflects some individuals mis-reporting their education.
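To make the idea of a pmf concrete, here is a small sketch of how one could be computed from data in R. The vector `educ` is hypothetical, made-up data for ten people (not the Census data), so the numbers will not match the ones discussed above.

```r
# Hypothetical years of education for ten people (made up, not Census data)
educ <- c(12, 12, 16, 13, 12, 18, 16, 12, 13, 16)

# Count how often each value occurs, then divide by the number of people
educ_pmf <- table(educ) / length(educ)
educ_pmf
```

Each entry is the fraction of people with exactly that many years of education, which is what the pmf reports when we treat the data as the population.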
Before going back to the pdf, let me describe another way to fully summarize the distribution of a random variable.
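- Cumulative distribution function (cdf) - The cdf of some random variable $X$ is defined as
$$F_X(x) = P(X \le x)$$
In words: $F_X(x)$ is the probability that $X$ takes a value less than or equal to $x$.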
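Example: Suppose $X$ is the outcome of a roll of a die. Then $F_X(3) = P(X \le 3)$ is the probability of rolling a three or lower, so $F_X(3) = \frac{3}{6} = \frac{1}{2}$.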
Example: Let’s go back to our example of years of education in the U.S. In this case, $F_X(x)$ is the fraction of the population that has $x$ or fewer years of education.
You can see that the cdf is increasing in the years of education. And there are big “jumps” in the cdf at values of years of education that are common such as 12 and 16.
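The jumps are easy to see in a small sketch. The code below uses R’s `ecdf` function on a hypothetical, made-up vector `educ` (not the Census data); the size of each jump in the cdf equals the share of people at that value.

```r
# Hypothetical years of education for ten people (made up, not Census data)
educ <- c(12, 12, 16, 13, 12, 18, 16, 12, 13, 16)

# ecdf() returns a function: educ_cdf(x) is the share of values <= x
educ_cdf <- ecdf(educ)

educ_cdf(11)  # nobody has 11 or fewer years in this made-up data
educ_cdf(12)  # jumps up by the share of people with exactly 12 years
educ_cdf(16)
educ_cdf(18)  # everyone has 18 or fewer years
```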
We’ll go over some properties of pmfs and cdfs momentarily (perhaps you can already deduce some of them from the above figures), but before we do that, we need to go over some (perhaps new) tools.
4.4 Summation operator
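It will be convenient for us to have a notation that allows us to add up many numbers/variables at the same time. To do this, we’ll introduce the summation operator $\sum$.
As a simple example, suppose that we have three variables (it doesn’t matter if they are random or not): $x_1$, $x_2$, and $x_3$. Then
$$\sum_{i=1}^{3} x_i = x_1 + x_2 + x_3$$
More generally, with $n$ variables, $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$.
For any constant $c$,
$$\sum_{i=1}^{n} c = n c$$
[This is just the definition of multiplication]
For any constant $c$,
$$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$$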
In words: constants can be moved out of the summation.
We will use the property often throughout the semester.
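As an example,
$$\begin{aligned} \sum_{i=1}^{n} 7 x_i &= 7x_1 + 7x_2 + \cdots + 7x_n \\ &= 7(x_1 + x_2 + \cdots + x_n) \\ &= 7\sum_{i=1}^{n} x_i \end{aligned}$$
where the first line is just the definition of the summation, the second equality factors out the 7, and the last equality writes the part about adding up the $x_i$’s using summation notation.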
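These summation properties are easy to check numerically. Here is a quick sketch in R; the values in `x` and the names `x`, `n`, and `c0` are arbitrary choices for illustration.

```r
# Three arbitrary numbers playing the role of x_1, x_2, x_3
x <- c(2, 5, 11)
n <- length(x)
c0 <- 7  # a constant (named c0 to avoid masking R's c() function)

sum(x)            # x_1 + x_2 + x_3
sum(rep(c0, n))   # adding up the constant n times ...
n * c0            # ... is the same as n times the constant
sum(c0 * x)       # the constant inside the sum ...
c0 * sum(x)       # ... can be moved outside the sum
```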
4.5 Properties of pmfs and cdfs
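Let’s define the support of a random variable $X$ as the set of all possible values that $X$ can take. We’ll denote the support of $X$ by $\mathcal{X}$.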
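Example: Suppose $X$ is the outcome of a roll of a die. Then the support of $X$ is $\{1, 2, 3, 4, 5, 6\}$.
Example: Suppose $X$ is the number of years of education that a person has. Then the support of $X$ is the set of values of years of education that occur in the population, such as 0, 12, 13, 16, and 18.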
Properties of pmfs
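For any $x$, $0 \le f_X(x) \le 1$.
In words: the probability of $X$ taking some particular value can’t be less than 0 or greater than 1 (neither of those would make any sense).
$\sum_{x \in \mathcal{X}} f_X(x) = 1$, where $\mathcal{X}$ denotes the support of $X$.
In words: if you add up $P(X = x)$ across all possible values that $X$ could take, they sum to 1.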
Properties of cdfs for discrete random variables
For any $x$, $0 \le F_X(x) \le 1$.
In words: the probability that $X$ is less than or equal to some particular value has to be between 0 and 1.
If $x_1 < x_2$, then $F_X(x_1) \le F_X(x_2)$.
In words: the cdf is increasing in $x$ (e.g., it will always be the case that $P(X \le 3) \le P(X \le 4)$).
$F_X(x) \to 0$ as $x \to -\infty$ and $F_X(x) \to 1$ as $x \to \infty$.
In words: if you choose small enough values of $x$, the probability that $X$ will be less than that is 0; similar (but opposite) logic applies for big values of $x$.
Connection between pmfs and cdfs
$$F_X(x) = \sum_{x' \in \mathcal{X},\, x' \le x} f_X(x')$$
In words: you can “recover” the cdf from the pmf by adding up the pmf across all possible values that the random variable could take that are less than or equal to $x$. This will be clearer with an example:
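Here is a minimal sketch of this pmf-to-cdf relationship in R, assuming a fair six-sided die. The names `support`, `die_pmf`, and `die_cdf` are just illustrative choices.

```r
# Support and pmf of a fair six-sided die
support <- 1:6
die_pmf <- rep(1 / 6, 6)

# Properties of the pmf: every value is between 0 and 1, and they sum to 1
all(die_pmf >= 0 & die_pmf <= 1)
sum(die_pmf)

# cdf at x: add up the pmf over all support values less than or equal to x
die_cdf <- function(x) sum(die_pmf[support <= x])

die_cdf(3)  # f(1) + f(2) + f(3)
die_cdf(0)  # below the smallest possible value
die_cdf(6)  # at the largest possible value
```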
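Example: Suppose that $X$ is the outcome of a roll of a die. Then
$$F_X(3) = \sum_{x' \le 3} f_X(x') = f_X(1) + f_X(2) + f_X(3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$$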
4.6 Continuous Random Variables
SW 2.1
For continuous random variables, you can define the cdf in exactly the same way as we did for discrete random variables. That is, if $X$ is a continuous random variable, then $F_X(x) = P(X \le x)$.
Example: Suppose $X$ denotes an individual’s yearly wage income.
From the figure, we can see that about 24% of working individuals in the U.S. earn $20,000 or less per year, 61% of working individuals earn $50,000 or less, and 88% earn $100,000 or less.
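It’s trickier to define an analogue to the pmf for a continuous random variable (in fact, this is the main reason for our separate treatment of discrete and continuous random variables). For example, suppose $X$ denotes a person’s yearly income. If income can be measured finely enough, the probability that $X$ is exactly equal to any particular value is 0. Instead, for a continuous random variable, we define its probability density function (pdf) as the derivative of its cdf:
$$f_X(x) = \frac{d F_X(x)}{dx}$$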
Regions where the pdf is larger correspond to more likely values of $X$.
We can also write the cdf as an integral over the pdf. That is,
$$F_X(x) = \int_{-\infty}^{x} f_X(z)\, dz$$
More properties of cdfs
$P(X > x) = 1 - F_X(x)$
In words: if you want to calculate the probability that $X$ is greater than some particular value $x$, you can do that by calculating $1 - F_X(x)$.
$P(a \le X \le b) = F_X(b) - F_X(a)$
In words: you can also calculate the probability that $X$ falls in some range using the cdf.
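As a quick sketch of these properties, we can use a distribution whose cdf and pdf are built into R. The choice of a standard normal random variable here is purely an assumption for illustration (it is not the income data); `pnorm` is its cdf and `dnorm` is its pdf.

```r
# Probability of being above 1: one minus the cdf
p_above <- 1 - pnorm(1)

# Probability of falling between -1 and 1: difference of the cdf at the endpoints
p_between <- pnorm(1) - pnorm(-1)

# The same probability as the area under the pdf between -1 and 1
area_between <- integrate(dnorm, lower = -1, upper = 1)$value

# The cdf itself is the area under the pdf from -Inf up to x
cdf_as_integral <- integrate(dnorm, lower = -Inf, upper = 1)$value

round(c(p_above, p_between, area_between, cdf_as_integral), 3)
```

The second and third numbers agree (up to numerical error), which is the “regions under the curve” interpretation of probabilities for continuous random variables.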
Example: Suppose $X$ denotes yearly income again.
From the figure, we can see that the most common values of yearly income are around $25,000 to $30,000 per year. Notice that this corresponds to the steepest part of the cdf from the previous figure. The right tail of the distribution is also long. This means that, while incomes of $150,000+ are not common, there are some individuals who have incomes that high.
Moreover, we can use the properties of pdfs/cdfs above to calculate some specific probabilities. In particular, we can calculate probabilities by calculating integrals (i.e., areas under the curve) or by relating the pdf to the cdf. First, the red region above corresponds to the probability of a person’s income being between $50,000 and $100,000. This is given by F(100000) - F(50000), where F is the cdf of income. We can compute this in R using the ecdf function. In particular,
incwage_cdf <- ecdf(us_data$incwage)
round(incwage_cdf(100000) - incwage_cdf(50000), 3)
[1] 0.27
The green region in the figure is the probability of a person’s income being above $150,000. Using the above properties of cdfs, we can calculate it as
round(1-incwage_cdf(150000), 3)
[1] 0.052
4.7 Multiple Random Variables
SW 2.3
Most often in economics, we want to consider two (or more) random variables jointly rather than just a single random variable. For example, mean income is interesting, but mean income as a function of education is more interesting.
When there is more than one random variable, you can define joint pmfs, joint pdfs, and joint cdfs.
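Let’s quickly go over these for the case where $X$ and $Y$ are two discrete random variables.
Joint pmf: $f_{X,Y}(x,y) = P(X = x, Y = y)$
Joint cdf: $F_{X,Y}(x,y) = P(X \le x, Y \le y)$
Conditional pmf: $f_{Y|X}(y|x) = P(Y = y \mid X = x)$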
Properties
We use the notation that $\mathcal{X}$ denotes the support of $X$ and $\mathcal{Y}$ denotes the support of $Y$.
$0 \le f_{X,Y}(x,y) \le 1$ for all $x$ and $y$, where $f_{X,Y}(x,y) = P(X = x, Y = y)$ is the joint pmf.
In words: the probability of $X$ and $Y$ taking any particular values can’t be less than 0 or greater than 1 (because these are probabilities).
$\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} f_{X,Y}(x,y) = 1$
In words: if you add up $P(X = x, Y = y)$ across all possible values of $x$ and $y$, they sum up to 1 (again, this is just a property of probabilities).
If you know the joint pmf, then you can recover the marginal pmf, that is,
$$f_Y(y) = \sum_{x \in \mathcal{X}} f_{X,Y}(x,y)$$
This amounts to just adding up the joint pmf across all values of $x$ while holding $y$ fixed. A main takeaway from this property is the following: if you know the joint pmf of two random variables, then you also know the pmf of each random variable individually. Thus, if you know the joint pmf, you know more than if you only knew the marginal pmfs.
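Here is a small sketch of these properties in R. The joint pmf below is hypothetical (the four probabilities are made up for illustration), with rows indexing values of X and columns indexing values of Y. The last step uses the standard fact that a conditional probability is the joint probability divided by the marginal probability of the conditioning event.

```r
# Hypothetical joint pmf: rows are values of X, columns are values of Y
joint <- matrix(c(0.1, 0.2,
                  0.3, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("0", "1"), Y = c("0", "1")))

# All entries are between 0 and 1, and they add up to 1
all(joint >= 0 & joint <= 1)
sum(joint)

# Marginal pmfs: add up the joint pmf over the other variable
f_X <- rowSums(joint)  # sums over values of Y
f_Y <- colSums(joint)  # sums over values of X

# Conditional pmf of Y given X = 1: joint pmf divided by marginal pmf of X
f_Y_given_X1 <- joint["1", ] / f_X["1"]
f_Y_given_X1
```

Notice that the code goes from the joint pmf to the marginals, but there is no way to go in the other direction without more information.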
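Example: Suppose that you roll a die, and based on this roll, you create the following random variables: $X = 1$ if the roll is 4 or higher and $X = 0$ otherwise; $Y = 1$ if the roll is even and $Y = 0$ otherwise.
Let’s consider what values $X$ and $Y$ take for each possible roll: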
roll | X | Y |
---|---|---|
1 | 0 | 0 |
2 | 0 | 1 |
3 | 0 | 0 |
4 | 1 | 1 |
5 | 1 | 0 |
6 | 1 | 1 |
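The table above can be turned into a joint pmf by counting. Here is a sketch in R, assuming the die is fair so that each row of the table has probability 1/6; the vectors `X` and `Y` are copied directly from the table.

```r
# Values of X and Y for each roll, copied from the table above
roll <- 1:6
X <- c(0, 0, 0, 1, 1, 1)
Y <- c(0, 1, 0, 1, 0, 1)

# Each roll is equally likely, so the joint pmf is the share of rolls
# that land in each (X, Y) combination
joint <- table(X, Y) / length(roll)
joint

# Marginal pmfs recovered from the joint pmf
rowSums(joint)  # pmf of X
colSums(joint)  # pmf of Y
```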
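Thus, $f_{X,Y}(0,0) = \frac{2}{6}$, $f_{X,Y}(0,1) = \frac{1}{6}$, $f_{X,Y}(1,0) = \frac{1}{6}$, and $f_{X,Y}(1,1) = \frac{2}{6}$.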