7 Asymptotic Properties
7.1 Large Sample Properties of Estimators
SW 2.6
Statistics/Econometrics often relies on “large sample” (meaning: the number of observations, $n$, is large) approximations.
Intuition: We generally expect that estimators that use a large number of observations will perform better than estimators that use only a few observations.
The second goal of this section will be to introduce an approach to conduct hypothesis testing. In particular, we may have some theory and want a way to test whether or not the data that we have “is consistent with” the theory. These arguments typically involve either making strong assumptions or having a large sample; we’ll mainly study the large sample case as I think this is more useful.
7.2 Consistency
An estimator $\hat{\theta}$ of a population parameter $\theta$ is said to be consistent if $\hat{\theta} \xrightarrow{p} \theta$ as the sample size $n \to \infty$. In words: as the sample size gets large, the estimator gets arbitrarily close (in a probabilistic sense) to the parameter that it is trying to estimate.
The main tool for studying consistency is the law of large numbers. The law of large numbers says that sample averages converge to population averages as the sample size gets large. In math, this is
$$\frac{1}{n} \sum_{i=1}^{n} Y_i \xrightarrow{p} \mathbb{E}[Y] \quad \text{as } n \to \infty$$
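To see the LLN in action, here is a small simulation sketch. The particular setup (an exponential population with mean 2, these sample sizes, and the seed) is just an illustrative assumption, not anything from the notes:

```python
import numpy as np

rng = np.random.default_rng(42)

# Population: exponential distribution with mean 2 (an arbitrary, non-normal choice)
pop_mean = 2.0

# The LLN says the sample average should get closer to E[Y] = 2 as n grows
for n in [10, 100, 1_000, 10_000, 100_000]:
    y = rng.exponential(scale=pop_mean, size=n)
    print(f"n = {n:>6}: sample average = {y.mean():.4f}  (population average = {pop_mean})")
```

Running this, the sample averages drift steadily closer to 2 as $n$ increases, which is exactly what consistency of the sample average looks like in practice.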
Example: Let’s consider the same three estimators as before and whether or not they are consistent. First, the LLN implies that
$$\bar{Y} \xrightarrow{p} \mathbb{E}[Y]$$
which implies that it is consistent. It is interesting to note that consistency and unbiasedness are different properties: for example, an estimator that just uses the first observation, $Y_1$, is unbiased for $\mathbb{E}[Y]$ but is not consistent, since adding more observations never changes it.
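The contrast is easy to see in a simulation. The sketch below compares the sample average with the first-observation estimator $Y_1$; the exponential population and the 0.1 tolerance are illustrative choices, and these are not necessarily the same estimators from the earlier example:

```python
import numpy as np

rng = np.random.default_rng(0)
pop_mean = 2.0       # E[Y] for an exponential with scale 2
n_sims = 1_000       # number of simulated datasets per sample size

for n in [10, 100, 10_000]:
    y = rng.exponential(scale=pop_mean, size=(n_sims, n))
    ybar = y.mean(axis=1)  # sample average: consistent for E[Y]
    y1 = y[:, 0]           # first observation only: unbiased but NOT consistent
    close_ybar = np.mean(np.abs(ybar - pop_mean) < 0.1)
    close_y1 = np.mean(np.abs(y1 - pop_mean) < 0.1)
    print(f"n = {n:>6}: P(|Ybar - E[Y]| < 0.1) ≈ {close_ybar:.3f}, "
          f"P(|Y1 - E[Y]| < 0.1) ≈ {close_y1:.3f}")
```

As $n$ grows, the probability that $\bar{Y}$ lands near $\mathbb{E}[Y]$ heads to 1, while the corresponding probability for $Y_1$ stays flat no matter how much data we collect.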
7.3 Asymptotic Normality
The next large sample property that we’ll talk about is asymptotic normality. This is a hard one to wrap your mind around, but I’ll try to explain as clearly as possible. We’ll start by talking about what it is, and then we’ll move to why it’s useful.
Most of the estimators that we will talk about this semester have the following property:
$$\sqrt{n}\left(\hat{\theta} - \theta\right) \xrightarrow{d} N(0, V)$$
as $n \to \infty$, where $V$ is called the asymptotic variance of $\hat{\theta}$.
An equivalent, alternative expression that is sometimes useful is
$$\frac{\sqrt{n}\left(\hat{\theta} - \theta\right)}{\sqrt{V}} \xrightarrow{d} N(0, 1)$$
To establish asymptotic normality of a particular estimator, the main tool is the central limit theorem. The central limit theorem (sometimes abbreviated CLT) says that
$$\sqrt{n}\left(\bar{Y} - \mathbb{E}[Y]\right) \xrightarrow{d} N\big(0, \mathrm{Var}(Y)\big)$$
In words, the CLT says that if you take the difference between the sample average $\bar{Y}$ and the population average $\mathbb{E}[Y]$ (which, by the LLN, is converging to 0) and scale it up by $\sqrt{n}$ (which is going off to infinity), then the product converges in distribution to a normal distribution with mean 0 and variance $\mathrm{Var}(Y)$.
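Here is a sketch of that statement in simulation form. The skewed exponential population (where $\mathrm{Var}(Y) = 4$), the sample size, and the number of simulation draws are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
pop_mean, n, n_sims = 2.0, 500, 20_000

# Y is exponential with mean 2: skewed and decidedly non-normal, Var(Y) = 4
y = rng.exponential(scale=pop_mean, size=(n_sims, n))
z = np.sqrt(n) * (y.mean(axis=1) - pop_mean)

# If the CLT is at work, z should look like a draw from N(0, Var(Y)) = N(0, 4)
print("mean of sqrt(n)*(Ybar - E[Y]):", round(z.mean(), 3))         # close to 0
print("variance:                     ", round(z.var(), 3))          # close to 4
print("P(z <= 2):                    ", round(np.mean(z <= 2), 3))  # close to Phi(1) = 0.841
```

Even though each individual $Y_i$ is badly skewed, the simulated values of $\sqrt{n}(\bar{Y} - \mathbb{E}[Y])$ match the mean, variance, and tail probabilities of a $N(0, 4)$ distribution quite closely.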
There are a few things to point out:
Just to start with, this is not nearly as “natural” a result as the LLN. The LLN basically makes perfect sense. For me, I know how to prove the CLT (though we are not going to do it in class), but I don’t think that I would have ever been able to come up with this on my own.
Notice that the CLT does not rely on any distributional assumptions. We do not need to assume that $Y$ follows a normal distribution, and it will apply when $Y$ follows any distribution (up to some relatively minor technical conditions that we will not worry about).

It is also quite remarkable. We usually have the sense that, as the sample size gets large, things will either converge to something (e.g., the LLN saying that sample averages converge to population averages) or diverge (i.e., go off to positive or negative infinity). The CLT provides an intermediate case: $\sqrt{n}\left(\bar{Y} - \mathbb{E}[Y]\right)$ is neither converging to a particular value nor diverging to infinity. Instead, it is converging in distribution, meaning: it is settling down to something that looks like a draw from some distribution rather than converging to a particular number.
In some sense, you can think of this “convergence in distribution” as a “tie” between the part $\left(\bar{Y} - \mathbb{E}[Y]\right)$, which, by itself, is converging to 0, and $\sqrt{n}$, which, by itself, is diverging to infinity. In particular, notice that
$$\mathrm{Var}\Big(\sqrt{n}\left(\bar{Y} - \mathbb{E}[Y]\right)\Big) = n \, \mathrm{Var}(\bar{Y}) = n \, \frac{\mathrm{Var}(Y)}{n} = \mathrm{Var}(Y)$$
where this argument just holds by the properties of variance that we have used many times before. This means that the variance of $\sqrt{n}\left(\bar{Y} - \mathbb{E}[Y]\right)$ does not go to 0 (which would suggest that the whole term converges to 0) nor does it go to $\infty$ (which would suggest that the term diverges). Moreover, if you multiplied instead by something somewhat smaller, say $n^{1/4}$, then the $\left(\bar{Y} - \mathbb{E}[Y]\right)$ term would “win” and the whole expression would converge to 0 (to see this, try calculating $\mathrm{Var}\big(n^{1/4}(\bar{Y} - \mathbb{E}[Y])\big)$). On the other hand, if you multiplied by something somewhat larger, say $n^{3/4}$, then the $n^{3/4}$ part would “win” and the whole thing would diverge (to see this, try calculating $\mathrm{Var}\big(n^{3/4}(\bar{Y} - \mathbb{E}[Y])\big)$). Multiplying by $\sqrt{n}$ turns out to be “just right” so that there is essentially a “tie”, and this term neither converges to a particular number nor diverges.
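You can check this “tie” numerically. The sketch below estimates the variance of $n^{a}\left(\bar{Y} - \mathbb{E}[Y]\right)$ for $a = 1/4, 1/2, 3/4$ by simulation; the exponential population (with $\mathrm{Var}(Y) = 4$) and the particular sample sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
pop_mean, n_sims = 2.0, 2_000   # exponential population, Var(Y) = 4

for n in [100, 10_000]:
    y = rng.exponential(scale=pop_mean, size=(n_sims, n))
    dev = y.mean(axis=1) - pop_mean          # Ybar - E[Y] in each simulated sample
    for a in (0.25, 0.5, 0.75):
        # Var(n^a * (Ybar - E[Y])) = n^(2a-1) * Var(Y): this shrinks to 0, stays at
        # Var(Y), or blows up depending on whether a is below, at, or above 1/2
        print(f"n = {n:>6}, a = {a}: Var(n^a * (Ybar - E[Y])) ≈ {np.var(n ** a * dev):.3f}")
```

As $n$ goes from 100 to 10,000, the $a = 1/4$ variance shrinks toward 0, the $a = 3/4$ variance explodes, and the $a = 1/2$ variance stays put near $\mathrm{Var}(Y) = 4$, which is the “tie” described above.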
A very common question for students is: “how large does $n$ need to be for the central limit theorem to apply?” Unfortunately, there is not a great answer to this (though some textbooks have sometimes given explicit numbers here). Here is a basic explanation for why it is hard to give a definite number. Suppose $Y$ follows a normal distribution; then it will not take many observations for the normal approximation to hold. On the other hand, if $Y$ were to come from a discrete distribution or just a generally complicated distribution, then it might take many more observations for the normal approximation to hold.
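To get a feel for why no single number works, the sketch below compares how “normal” the sample mean looks at $n = 30$ for a normal population versus a heavily skewed Bernoulli one. The populations, the skewness check, and $n = 30$ itself (a commonly quoted rule of thumb) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n_sims, n = 50_000, 30   # n = 30 is a commonly quoted (but unreliable) rule of thumb

def skewness(z):
    z = (z - z.mean()) / z.std()
    return (z ** 3).mean()   # approximately 0 if z is normally distributed

# Normal population: the sample mean is exactly normal at ANY sample size
ybar_normal = rng.normal(size=(n_sims, n)).mean(axis=1)
# Heavily skewed population (Bernoulli, p = 0.05): n = 30 is nowhere near enough
ybar_bern = (rng.random(size=(n_sims, n)) < 0.05).mean(axis=1)

print("skewness of Ybar, normal Y:   ", round(skewness(ybar_normal), 3))  # ~ 0
print("skewness of Ybar, Bernoulli Y:", round(skewness(ybar_bern), 3))    # noticeably positive
```

With a normal population the sample mean shows essentially zero skewness at $n = 30$, while the skewed Bernoulli population leaves clearly visible skewness at the same sample size, which is why a one-size-fits-all cutoff is misleading.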
All that to say, I know that the CLT is hard to understand, but the flip side of that is that it really is a fascinating result. We’ll see how it’s useful next.