class: center, middle, inverse, title-slide .title[ # Introduction to ECON 8080 ] .author[ ### Brantly Callaway ] .date[ ### University of Georgia ] --- # Main Topics for this Semester `$$\newcommand{\E}{\mathbb{E}}$$` <style type="text/css"> .alert { font-weight:bold; color: red; } .alert-blue { font-weight: bold; color: blue; } .remark-slide-content { font-size: 23px; padding: 1em 4em 1em 4em; } .highlight-red { background-color:red; padding:0.1em 0.2em; } </style> The two main topics that we'll cover this semester are: * Linear Regression * Panel Data -- Throughout the semester, we'll also mostly be interested in learning tools/concepts that are useful for conducting empirical research in economics (and, more broadly, social sciences and business disciplines) -- * In practice, a whole lot of research questions are like: <span class="alert">how did some policy/intervention/etc. cause some outcomes of interest to change relative to what they would have been in the absence of the policy/intervention?</span> -- * We will have these sorts of questions in mind (or at least in the back of our minds) throughout the semester --- # Introduction The course is also relatively <span class="alert">theoretical:</span> -- * One version of this class might primarily teach you which buttons to click in some statistical programming language and under which situations -- * In this class, we'll go into substantial detail on topics and try to <span class="alert">figure out how things work</span> - For example, for all the estimators that we talk about this semester, we will write the code to implement them ourselves rather than relying on "canned" implementations from `R`. 
- We'll also spend a lot of time on theory related to <span class="alert">conducting inference</span> — this can be mathematically intense --- # Statistical Programming This semester, we'll do a lot of statistical programming, using the `R` language -- - Please see the syllabus for additional resources on `R` programming -- - Our TA, Hugo, is a very good R programmer, and he is a very good resource this semester -- - I expect the difficulty level on this front to be relatively high --- # Other Resources On the syllabus, there are a number of additional resources. These are mostly at the Ph.D. or M.A. level. If you need more introductory material on any topics here are some suggestions: -- - I have very detailed course notes for my undergraduate class (both on `R` and on Econometrics) <a href="https://bcallaway11.github.io/Courses/ECON_4750_Fall_2022">https://bcallaway11.github.io/Courses/ECON_4750_Fall_2022</a> -- - A relatively short introduction to `R` from Stephanie Spielman: <a href="https://sjspielman.github.io/datascience_for_biologists/tutorials/introduction_to_R.html">https://sjspielman.github.io/datascience_for_biologists/tutorials/introduction_to_R.html</a> -- - I also like both <a href="https://www.amazon.com/Introduction-Econometrics-4th-Pearson-Economics/dp/0134461991/ref=sr_1_1?keywords=stock+and+watson+introduction+to+econometrics&qid=1641821555&sprefix=stock+and+wat%2Caps%2C157&sr=8-1">Stock and Watson</a> and <a href="https://www.amazon.com/Introductory-Econometrics-Modern-Approach-MindTap/dp/1337558869/ref=sr_1_1?keywords=wooldridge+econometrics&qid=1641821581&sprefix=wooldridge+%2Caps%2C86&sr=8-1">Wooldridge</a> as undergraduate level books. - They both cover a lot of the same material that we will cover in this course, though at an easier level. - [If you get either of these, you can get an older edition and save some money.] --- # Getting Started ECON 8070 ended with an introduction to linear regression. 
Linear regression will be our first main topic for this semester. -- - There may be some overlap at the beginning of this semester, but that is by design. -- - That said, we have been totally reworking our Ph.D. sequence in Econometrics last year and this year. I'm open to feedback during the semester about the difficulty level, pace, and (to some extent) topics. --- # Getting Started: Some Conceptual Issues Linear projection vs. Linear CEF vs. Causal Inference vs. Structural Models -- **Linear Projection** Given that we have data about `\(Y\)` and `\(X\)`, we can always "run a regression" of `\(Y\)` on `\(X\)`. $$ \beta = \underset{b}{\textrm{argmin}} \ \ \E[(Y-X'b)^2] $$ -- which has the solution $$ \beta = \E[XX']^{-1}\E[XY] $$ -- and can be estimated by $$ `\begin{aligned} \hat{\beta} = \left(\frac{1}{n} \sum_{i=1}^n X_i X_i'\right)^{-1} \frac{1}{n} \sum_{i=1}^n X_i Y_i \end{aligned}` $$ --- # Linear Conditional Expectation Function Very often, we will be interested in estimating/learning about <span class="alert">conditional expectations: </span> $$ \E[Y|X] $$ You can potentially try to estimate this conditional expectation without making strong functional form assumptions (i.e., nonparametric econometrics), though this comes with a number of practical difficulties -- However, if you (somehow) know $$ \E[Y|X] = X'\beta $$ this will greatly simplify estimation. - and typically `\(\beta\)` would be estimated in exactly the same way as in the linear projection model from the previous slide --- # Linear Projection/CEF **Importantly** Without further assumptions/conditions both the linear projection model and the Linear CEF model are just <span class="alert">descriptive</span> -- For example, suppose that we know that `\(\E[Y|X] = X'\beta\)`. 
- If a new observation shows up with characteristics `\(x\)`, then `\(x'\beta\)` is likely to be the "best" prediction that we can make about what `\(Y\)` will be equal to for this observation -- - Similarly, we will frequently be interested in **regression derivatives** (aka partial effects) such as: $$ \frac{\partial \, \E[Y|X=x]}{\partial \, x_1} \underset{\textrm{Linear CEF}}{=} \beta_1 $$ But this could be quite different from how much <span style="font-style: italic;">outcomes would change on average if, say, a policymaker were able to manipulate `\(x_1\)` for particular units</span> --- # Causality Many researchers are not inherently interested in the partial effects on the previous slide (though in some cases or under some conditions they could coincide...) -- - Instead, they are interested in causal questions like: Among those that were affected by a policy/intervention, what is the difference between their actual outcomes and *the outcomes they would have experienced if they had not participated in the policy/intervention?* -- Linear regressions are probably the most common estimation procedure used to think about the causal effect of some policy or intervention on outcomes of interest -- - This can typically be rationalized mainly under the conditions that - The researcher is able to control for "enough" other variables, - The model is correctly specified - The effect of the policy is the same across all units --- # Causality (cont'd) **Notation:** For most of our theoretical arguments this semester, we will consider the case where `\(X\)` is a `\(k \times 1\)` vector and `\(\beta\)` is a `\(k \times 1\)` vector. That said, much research in economics is interested in the effect of a particular element, say, `\(X_1\)`. -- In this case, it is sometimes helpful to write $$ \E[Y|X] = X_1 \beta_1 + X_2'\beta_2 $$ -- Under the conditions on the previous slide, `\(\beta_1\)` could be interpreted as the causal effect of `\(X_1\)` on `\(Y\)`. 
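---

# Linear Projection: A Sketch in `R`

The sample analog of `\(\beta = \E[XX']^{-1}\E[XY]\)` from the linear projection slide can be computed by hand in a few lines. The sketch below is only illustrative: the simulated data, the coefficient values, and the variable names (`X1`, `X2`, `beta_hat`, etc.) are assumptions made for this example and are not course data.

```r
# Simulated data (purely illustrative; not data from the course)
set.seed(8080)
n  <- 500
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - 0.5 * X2 + rnorm(n)

# n x k matrix of regressors; the first column is the intercept
X <- cbind(1, X1, X2)

# Sample analogs of E[XX'] and E[XY]
XX <- crossprod(X) / n      # (1/n) * sum_i X_i X_i'
XY <- crossprod(X, Y) / n   # (1/n) * sum_i X_i Y_i

# beta_hat = (sample E[XX'])^{-1} (sample E[XY])
# solve(A, b) solves A %*% beta = b without forming the inverse explicitly
beta_hat <- solve(XX, XY)

# Compare with the "canned" implementation
beta_lm <- coef(lm(Y ~ X1 + X2))
max_diff <- max(abs(drop(beta_hat) - beta_lm))
```

Here `max_diff` should be zero up to floating-point error. The second element of `beta_hat` plays the role of `\(\beta_1\)` in the notation above; whether it has a causal interpretation depends on the conditions discussed on the Causality slide, not on the computation itself.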
--- # Structural Models Sometimes the regression in the previous slide may be interpreted as a "structural model" -- 1. This typically means that it is derived from some economic theory -- 2. The parameters of the model are invariant to different policies and/or economic settings. * This implies that, say, `\(\beta_1\)` can be interpreted as the causal effect of `\(X_1\)` * More than that though, this idea is closely related to whether or not it is reasonable to <span class="alert">extrapolate</span> from one set of circumstances to another. * Example: Minimum Wage <!--For example, suppose you are interested in the effect of the minimum wage on employment and have data about lots of small (e.g., $1 `\(\uparrow\)`). Calling observed changes in employment following the minimum wage change as the causal effect (obviously) requires much weaker assumptions relative to extrapolating these results to the case with a substantially larger minimum wage increase.--> -- These sorts of interpretations are often the most useful, but also require the strongest assumptions. --- # Structural Models (cont'd) **Side-comment:** Sometimes structural models are contrasted with <span class="alert">reduced form</span> approaches to causality (which is different from how I talked about causality before). - I think that reduced form is typically taken as meaning "not using a model." - This distinction probably makes sense in cases where a researcher has access to an experiment (i.e., can randomly assign units to be affected by the intervention) -- That said, economics has a rich history of writing down economic models, for example: - Production functions in IO/Macro - Wage dynamics in labor economics - (and other fields) pandemic models in epidemiology --- # Structural Models (cont'd) I (strongly) think that models are useful for answering causal questions. 
So, in my view, it is often more helpful to classify: -- - **Causal approaches** as being "local" - That is, trying to answer: "What <span class="alert">was</span> the causal effect of some observed intervention?" -- - **Structural approaches** as allowing for extrapolation - That is, being able to additionally answer: "What <span class="alert">would happen</span> under some other policy..." -- In this course, we'll mostly have in mind the goal of causal inference, though (as you will notice from the above discussion) there is a great deal of overlap in terms of the types of tools used for different types of analysis --- # Panel Data The second main topic for the semester is panel data -- - Panel data is data where we can follow the same unit (this could be an individual, firm, etc.) over time -- - We will study traditional panel data models -- - And additionally modern approaches to using panel data to think about causality - This is my main research area and one that I think is very interesting and useful. - This is an extremely popular approach in empirical work in economics -- - <span class="alert">My super-high-level intuition:</span> If you want to understand the causal effect of some policy/intervention, it is very useful to have information about a person's outcomes before they were affected by the intervention as well as to have information about how outcomes evolved for other (hopefully "similar") people who did not participate --- # Running Examples this Semester - Effect of attending college (or number of years of education) on a person's earnings - Effect of mother smoking on infant's birthweight - Effect of parents' income on child's income (intergenerational mobility) - Effect of mutual fund expense ratio on investment returns -- Throughout the semester, we'll use some data that comes from the textbook as well as some data from external sources. [I'll post these at the relevant time.] 
--- # Brief Outline of the First Few Weeks - We will start with a discussion of some classical motivations for running regressions - Next, we will consider why you might want to use a linear regression to think about causal effects (along with some possible weaknesses) -- - Then, we will spend several weeks learning about how regressions work, how to conduct inference, etc. We will cover this in great detail. - This will be useful more generally than for just thinking about causal effects - We will also cover the "theory" of linear regressions extensively, both for its own sake and because a lot of those arguments will be applicable to other (and more complicated) estimators. --- # Last comment before we get going... <span class="alert">My super-high-level advice about research:</span> -- It may be easy to get lost in the math/programming this semester (it's true that you will face major challenges there), but the following idea is one that always sticks with me: -- - If you want to think about the causal effect of a policy/intervention, at the end of the day, you are going to be comparing one set of outcomes to another set of outcomes - The key question is: Do <span class="alert">you believe</span> that this comparison can be interpreted as a causal effect?