class: center, middle, inverse, title-slide

.title[
# The Simple Linear Regression Model
]
.subtitle[
## EC295
]
.author[
### Justin Smith
]
.institute[
### Wilfrid Laurier University
]
.date[
### Fall 2022
]

---

# What is Econometrics?

.center[
<figure>
  <img src="metrics.jpg" width="37%">
</figure>
]

---

## What is Econometrics

- Defining characteristics of econometrics
  - Observational data
  - Use of regression analysis
  - Motivating statistical models with economic models
  - Focus on causality
- This class introduces you to linear regression
  - Building block for many future economics classes
  - You will use this technique in EC481

---

## Introduction to Linear Regression

- Economic analysis often involves relating two or more variables
  - Does age of school entry affect test scores?
  - Does childhood health insurance affect adult health?
  - Does foreign competition affect domestic innovation?
- These relationships are typically used for
  - <font color="red">**Causal Inference**</font>: the independent effect of one variable on another
  - <font color="red">**Prediction**</font>: estimating the value of one variable given values of another
- Which one you use depends on the goals of your analysis
  - Causal inference is important in policy analysis
  - Prediction is useful for guessing unknown values of a variable
- We will develop a model to use for these goals

---

## Context

- A big issue in education is the size of school classes
- Parents are often in favour of smaller classes
  - More attention paid to individual students
  - Classes are easier to control
  - Can do more interactive work
- But smaller classes are more expensive
  - More teaching resources per student
- Important to measure the benefit of smaller classes
  - Compare against the cost to see if worthwhile
- The book repeatedly discusses models in the context of class size and student performance

---

## What Are We Trying to Model?

- We want to relate test scores to class size
- Hard to do this for specific individuals
  - Many reasons why test scores differ between people
  - Even people in the same class sizes have very different scores
- Instead focus on the <font color="red">systematic</font> relationship
- We do this by focusing on average test scores
  - How do average test scores change with class size?
- Several reasons to use the average
  - Highlights systematic patterns between variables
  - It is the mathematically optimal way to predict one variable given another
  - Intuitively appealing

---

## What Are We Trying to Model?

.pull-left[
- Mathematically we focus on the .red[**Conditional Expectation**]
- In the context of test scores, the conditional expectation is

$$ E[TestScore | STR] $$

- This is the average test score for each class size
- `\(STR\)` is the Student Teacher Ratio, a measure of class size
]

.pull-right[
.content-box-red[
### Reminder about Expected Values

The **Expected Value** `\(E[Y]\)` of a random variable `\(Y\)` is its weighted average

The **Conditional Expectation** `\(E[Y|X]\)` is the weighted average of a variable `\(Y\)` at specific values of another variable `\(X\)`
]
]
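
As a made-up illustration: if the only three classes with `\(STR = 20\)` score 640, 660, and 680, then

`$$E[TestScore | STR = 20] = \frac{640 + 660 + 680}{3} = 660$$`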
---

## What Are We Trying to Model?

- Big problem: we do not know how average test scores relate to class size
  - Could be linear
  - Could be non-linear
  - Could be some other weird function
- Unfortunately, we will .red[.bolder[never know]] exactly how they relate 😠
- Instead we approximate this relationship
  - In EC295 we use linear models for the approximation
  - Often a good guess at the true relationship
  - But the unknown true model is probably more complicated

---

## The Linear Regression Model

- A linear model relating test scores to class size is

`$$TestScore = \beta_{0} + \beta_{STR}STR + u$$`

- Several important components of this model
  - `\(TestScore\)` are individual test scores
  - `\(STR\)` are individual class sizes
  - `\(\beta_{STR}\)` is the .red[slope] parameter
    - Effect of a one-unit change in class size on test scores
  - `\(\beta_{0}\)` is the .red[intercept] parameter
    - Test scores when class size is zero
  - `\(u\)` is everything except class size that determines test scores

---

## The Linear Regression Model

.pull-left[
This model breaks test scores into two pieces

1. .red[Population Regression Function]
`$$\beta_{0} + \beta_{STR}STR$$`
The predictable part of test scores

2. .red[Error Term]
`$$u = TestScore - \beta_{0} - \beta_{STR}STR$$`
The unobserved and unpredictable part of test scores
]

.pull-right[
<img src="SLR_files/figure-html/prf-1.png" width="504" style="display: block; margin: auto;" />
]
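
---

## The Linear Regression Model

- A made-up numerical illustration of the decomposition, using the fictional parameter values from the simulated example later in the deck
- Suppose `\(\beta_{0} = 700\)` and `\(\beta_{STR} = -2\)`, and one class has `\(STR = 20\)` and `\(TestScore = 672\)`
  - The population regression function gives `\(700 - 2(20) = 660\)`
  - The error term picks up the rest: `\(u = 672 - 660 = 12\)`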
---

## The Linear Regression Model

- Another big problem: we do not know the values of `\(\beta_{0}\)` and `\(\beta_{STR}\)`
  - They are parameters that we do not observe
- We also do not observe `\(u\)`
  - The unobserved error term
- Suppose we need to know these parameters
  - How do we proceed from here?
- Answer: we .red[**estimate**] `\(\beta_{0}\)` and `\(\beta_{STR}\)` with a sample of data
- There are several estimation methods
  - We will focus on .red[**Ordinary Least Squares (OLS)**]

---

## Drawing a Sample from the Population

- To estimate our model, we need to collect data on test scores and class sizes
- Imagine collecting a sample of size `\(n\)`
  - e.g. test scores and class sizes from 50 classes in different schools
  - `\(n = 50\)` in this case
- The population regression model holds .red[for each member of the sample]

`$$TestScore_{i} = \beta_{0} + \beta_{STR}STR_{i} + u_{i}$$`

- The subscript `\(i\)` identifies a specific member of the sample
- Test scores are assumed to be linearly related to class size for each member of the sample

---

## Ordinary Least Squares

.content-box-blue[
**Ordinary Least Squares**

A method that estimates regression parameters by choosing the ones that minimize the sum of the squared distances between the estimated regression line and each data point
]

- To implement OLS, replace the unknowns of the population model with estimates

`$$TestScore_{i} = \hat{\beta}_{0} + \hat{\beta}_{STR}STR_{i} + \hat{u}_{i}$$`

- `\(\hat{\beta}_{0}\)` estimates `\(\beta_{0}\)`
- `\(\hat{\beta}_{STR}\)` estimates `\(\beta_{STR}\)`
- `\(\hat{u}_{i}\)` is the residual (estimates the error)
- OLS .red[chooses] `\(\hat{\beta}_{0}\)` and `\(\hat{\beta}_{STR}\)` to minimize the sum of squared residuals

---

## Ordinary Least Squares

- The sum of squared residuals is

`$$\sum_{i=1}^{n} \hat{u}_{i}^{2} = \sum_{i=1}^{n} (TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i} )^{2}$$`

- To solve, take the derivative<sup>1</sup> of the expression above with respect to `\(\hat{\beta}_{0}\)` and `\(\hat{\beta}_{STR}\)` and set it to zero

`$$\sum_{i=1}^{n} (TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i}) = 0$$`

`$$\sum_{i=1}^{n} (TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i})STR_{i} = 0$$`

- These are the .red[OLS Normal Equations]

.footnotesize[
.footnote[
1. If you don't know calculus, don't worry about it. I will not ask you to take a derivative in this class.
]]

---

## Ordinary Least Squares

- Use these equations to solve for `\(\hat{\beta}_{0}\)` and `\(\hat{\beta}_{STR}\)`

.content-box-blue[
**Ordinary Least Squares Estimators (for our example)**

`$$\hat{\beta}_{0} = \overline{TestScore} - \hat{\beta}_{STR}\overline{STR}$$`

`$$\hat{\beta}_{STR} = \frac{\sum_{i=1}^{n}(STR_{i} - \overline{STR})(TestScore_{i} - \overline{TestScore})}{\sum_{i=1}^{n}(STR_{i} - \overline{STR})^2} = \frac{\widehat{cov}(STR_{i}, TestScore_{i})}{\widehat{var}(STR_{i})}$$`
]

- These are the estimates of the intercept and slope based on our sample
- 💥.red[**Important**]💥: these will differ from one sample to another
  - We will return to sampling variation later
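
---

## Ordinary Least Squares

- A minimal Stata sketch of the slope formula in action (it assumes the simulated variables `str` and `testscr` created later in this deck are in memory)

```stata
* slope estimate as sample covariance over sample variance
* (assumes str and testscr exist, as in the simulated example below)
quietly correlate str testscr, covariance
display r(cov_12)/r(Var_1)
```

- This ratio should match the slope reported by `regress testscr str`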
---

## Ordinary Least Squares

.pull-left[
The estimated model has its own terminology

1. .red[Sample Regression Function]
`$$\hat{\beta}_{0} + \hat{\beta}_{STR}STR$$`
The line constructed with the OLS estimators

2. .red[Predicted Value]
`$$\widehat{TestScore}_{i} = \hat{\beta}_{0} + \hat{\beta}_{STR}STR_{i}$$`
The value of `\(TestScore_{i}\)` implied by the sample regression function

3. .red[Residual]
`$$\hat{u}_{i} = TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i}$$`
The difference between the actual value of `\(TestScore_{i}\)` and its prediction
]

.pull-right[
<img src="SLR_files/figure-html/srf-1.png" width="504" style="display: block; margin: auto;" />
]

---

## General Model

- So far we have used a specific example
- A population regression function for any outcome and any independent variable is

`$$Y = \beta_{0} + \beta_{1}X + u$$`

.pull-left[.content-box-blue[
**Ordinary Least Squares Estimators**

`$$\hat{\beta}_{0} = \overline{Y} - \hat{\beta}_{1}\overline{X}$$`

`$$\hat{\beta}_{1} = \frac{\sum_{i=1}^{n}(X_{i} - \overline{X})(Y_{i} - \overline{Y})}{\sum_{i=1}^{n}(X_{i} - \overline{X})^2} = \frac{\widehat{cov}(X_{i}, Y_{i})}{\widehat{var}(X_{i})}$$`
]]

.pull-right[.content-box-blue[
**Sample Regression Function**

`$$\hat{\beta}_{0} + \hat{\beta}_{1}X$$`

**Predicted Value**

`$$\widehat{Y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1}X_{i}$$`

**Residual**

`$$\hat{u}_{i} = Y_{i} - \hat{\beta}_{0} - \hat{\beta}_{1}X_{i}$$`
]]

---

## Example: The Effect of Class Size on Test Scores

.content-box-green[
- **Question:** .red[Are class size and student achievement related?]
- We will create .red[simulated data] to explore the relationship
  - We set the process generating the data
  - Lets us control the true values of the parameters
  - We set these values to create realistic data
  - The simulated data will mimic actual data we see on test scores
- We will use this dataset to explore linear regression
  - We will see the mechanics of estimation
  - Also how sampling variation affects estimates
]

---

## Example: The Effect of Class Size on Test Scores

.content-box-green[
- Suppose the population regression function is

`$$TestScore_{i}= \beta_{0} + \beta_{1}STR_{i} + u_{i}$$`

- `\(\beta_{1}\)` is the effect of one more student per teacher
- `\(\beta_{0}\)` is the test score when class size is zero
  - Does not have a useful interpretation in this example
- `\(u\)` captures determinants of test scores other than the student-teacher ratio
  - Natural ability
  - Student background
  - School/teacher quality
  - etc
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
.pull-left[
- Set the population regression equation as

`$$TestScore= 700 - 2*STR + u$$`

- Says that `\(\beta_{0} = 700\)`, `\(\beta_{1} = -2\)`
- These are .red[fictional] population values
  - In reality we would never know these
  - We are pretending we know them for instructional reasons
]

.pull-right[
<img src="SLR_files/figure-html/ex1-1.png" width="504" style="display: block; margin: auto;" />
]
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
.pull-left[
- Next step is to estimate `\(\beta_{0}\)` and `\(\beta_{1}\)`
  - As though we did not know their values
- First take a sample of data from the population
- We will draw .red[420 observations] with a .red[simple random sample]
- Stata code on right
]

.pull-right[
**Stata Code**

```stata
clear
set obs 420
set seed 12345
gen str = rnormal(20,2)
gen u = rnormal(0,20)
gen testscr = 700 - 2*str + u
```
]
]
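
- A note on the code: `set seed 12345` pins Stata's random-number generator so the same draws (and the same estimates) come out on every run, and `rnormal(20,2)` draws from a Normal distribution with mean 20 and standard deviation 2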
---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- Before estimating the parameters, summarize the data

**Stata Code and Output**

```stata
sum testscr str
```

```
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     testscr |        420    659.1345    20.67156    593.118   713.0748
         str |        420    20.13071    2.103167    14.2861   27.30753
```

- Note the scale of the test scores
  - Simulated to mimic scores from a standardized test
  - Standardized tests often scaled to have mean 650, standard deviation 20
- Roughly 20 students per teacher in these fictional districts
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- Estimate the intercept and slope by OLS

**Stata Code and Output**

```stata
regress testscr str
```

```
      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     15.45
       Model |  6383.10498         1  6383.10498   Prob > F        =    0.0001
    Residual |  172661.265       418  413.065226   R-squared       =    0.0357
-------------+----------------------------------   Adj R-squared   =    0.0333
       Total |  179044.369       419  427.313531   Root MSE        =    20.324

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -1.855817    .472094    -3.93   0.000    -2.783791   -.9278429
       _cons |   696.4934    9.55519    72.89   0.000     677.7112    715.2756
------------------------------------------------------------------------------
```
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- The OLS estimates are

`$$\hat{\beta}_{1} = -1.86$$`

`$$\hat{\beta}_{0} = 696.49$$`

- The sample regression function is

`$$\widehat{TestScore}= 696.49 - 1.86STR$$`

- Use it to generate predictions of test scores
  - Simply plug in a value for `\(STR\)`, and compute `\(\widehat{TestScore}\)`
]
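
---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- For instance (an illustrative calculation), at `\(STR = 20\)` the prediction is

`$$\widehat{TestScore} = 696.49 - 1.86(20) = 659.29$$`

- This is close to the sample mean test score of 659.13, which makes sense because 20 is close to the average class size
]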
---

## Example: Effect of Class Size on Test Scores

.content-box-green[
**Stata Code**

<style>
pre { white-space: pre-wrap; }
.figure2 { margin-top: -3em; margin-bottom: -1.5em; margin-left: 0px; margin-right: 0px; }
</style>

```stata
predict fitted, xb
twoway (scatter testscr str)(line fitted str), title(Test Scores and Student Teacher Ratio) subtitle(Fitted Values and Actual Data)
```

.center[.figure2[
<figure>
  <img src="scatter.svg" width="47%">
</figure>
]
]
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
**Stata Code**

```stata
predict resid, residual
twoway (scatter resid str), title(Test Scores and Student Teacher Ratio) subtitle(Residuals)
```

.center[.figure2[
<figure>
  <img src="resid.svg" width="47%">
</figure>
]
]
]

---

# Measures of Fit
## Introduction

- OLS is one way to estimate a linear regression model
- It is important to know how well the method works
- One way is to examine the .red[fit] of our regression line
  - How close to the line are the data points?
  - Does `\(X\)` explain a large fraction of the variation in `\(Y\)`?
- These are the .red[algebraic properties] of our estimator
  - Mathematical relationships that hold true **in each sample**
- Different from the .red[statistical properties]
  - The behaviour of estimators **across repeated samples**
  - Necessarily hypothetical because we only have one sample

---

# Measures of Fit
## R-Squared

- The .red[Coefficient of Determination] `\(R^2\)` measures the fraction of the variation in `\(Y\)` that is explained by the independent variable

`$$R^2 = \frac{ESS}{TSS}$$`

- TSS is the .red[Total Sum of Squares]

`$$TSS = \sum_{i=1}^{N} (Y_{i} - \bar{Y})^2$$`

- A measure of the spread in the `\(Y_{i}\)`

---

# Measures of Fit

- ESS is the .red[Explained Sum of Squares]

`$$ESS = \sum_{i=1}^{N} (\hat{Y}_{i} - \bar{Y})^2$$`

- And the .red[Residual Sum of Squares (SSR)] is

`$$SSR= \sum_{i=1}^{N} (\hat{u}_{i})^2$$`

- `\(R^2\)` ranges between 0 and 1
  - `\(R^2 = 0\)` means that `\(X\)` explains none of the variation in `\(Y\)`
    - Scatterplot between `\(Y\)` and `\(X\)` is a cloud with no obvious linear relationship
  - `\(R^2 = 1\)` means that `\(X\)` explains all of the variation in `\(Y\)`
    - Data in scatterplot between `\(Y\)` and `\(X\)` fall along a straight line

---

# Measures of Fit

- `\(R^2\)` is also equal to the square of the correlation coefficient between `\(Y_{i}\)` and `\(\hat{Y}_{i}\)`
  - `\(R^2 = 1\)` is perfect correlation between predicted and actual values
- An important relationship between the sums of squares is

`$$TSS= ESS + SSR$$`

- Part of any movement of `\(Y_{i}\)` away from its average is explainable by factors in the regression
- The other part is related to unobserved factors
- As a result, you can re-express

`$$R^2 = \frac{ESS}{TSS} = 1- \frac{SSR}{TSS}$$`

---

# Measures of Fit

- Important to be cautious when using `\(R^2\)`
- In real applications, `\(R^2\)` is often very low
  - Does not mean the regression is bad
  - Just means we have not captured all factors that explain `\(Y\)`
- A low `\(R^2\)` does not imply a poor estimate of `\(\beta_{1}\)`
  - `\(\beta_{1}\)` measures the effect on `\(Y\)` of changing `\(X\)`, all else equal
  - `\(R^2\)` measures the fraction of the total variation in `\(Y\)` that is explained by `\(X\)`
  - The concepts are independent of each other
- In the class size example `\(R^2 = 0.036\)`
  - Many other factors besides the student-teacher ratio explain test scores

---

# Measures of Fit
## Standard Error of Regression (SER)

- Can also measure fit with the spread of the data around the regression line
- The residual `\(\hat{u}_{i}\)` is the deviation of `\(Y_{i}\)` from its prediction

`$$\hat{u}_{i} = Y_{i} - \hat{Y}_{i}$$`

- The .red[standard error of regression (SER)] is the standard deviation of `\(\hat{u}_{i}\)`
  - The average distance of `\(Y_{i}\)` from its prediction `\(\hat{Y}_{i}\)`

`$$SER = s_{\hat{u}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_{i}^2} = \sqrt{\frac{SSR}{n-2} }$$`
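
---

# Measures of Fit

- A minimal Stata sketch (assuming `regress testscr str` from the earlier example has just been run): both fit measures can be recovered from the stored results

```stata
* R-squared two ways: ESS/TSS and 1 - SSR/TSS
* (after -regress-, e(mss) is the explained SS and e(rss) the residual SS)
display e(mss)/(e(mss) + e(rss))
display 1 - e(rss)/(e(mss) + e(rss))

* SER, which Stata reports as Root MSE; e(df_r) = n - 2 here
display sqrt(e(rss)/e(df_r))
```

- These should reproduce the R-squared (0.0357) and Root MSE (20.324) shown in the output on the next slide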
---

# Example

.content-box-green[
- Recall the regression output from earlier

```stata
regress testscr str
```

```
      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     15.45
       Model |  6383.10498         1  6383.10498   Prob > F        =    0.0001
    Residual |  172661.265       418  413.065226   R-squared       =    0.0357
-------------+----------------------------------   Adj R-squared   =    0.0333
       Total |  179044.369       419  427.313531   Root MSE        =    20.324

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -1.855817    .472094    -3.93   0.000    -2.783791   -.9278429
       _cons |   696.4934    9.55519    72.89   0.000     677.7112    715.2756
------------------------------------------------------------------------------
```
]

---

# Example

.content-box-green[
- The sums of squares are
  - `\(ESS = 6383.10\)`
  - `\(SSR = 172661.27\)`
  - `\(TSS = 179044.37\)`
- `\(R^2 = 0.036\)` is in the top right corner of the output
- You can verify that
  - `\(TSS = ESS + SSR\)`
  - `\(R^2 = \frac{ESS}{TSS}\)`
- The SER is called the .red[Root MSE (Mean Square Error)] in the output
  - From the output `\(SER = 20.32\)`
]

---

# Least Squares Assumptions for Causal Inference

- So far we have defined `\(\beta_{1}\)` only as the **slope**
- The slope could be two things

1. The (standardized) .red[correlation] between `\(X\)` and `\(Y\)`
  - How `\(Y\)` and `\(X\)` move together
2. The .red[causal effect] of `\(X\)` on `\(Y\)`
  - What happens to `\(Y\)` when we change `\(X\)` and **nothing else that affects Y changes**

- In many applications we want the causal effect
  - What happens to my income if I get a university degree?
  - How does getting a COVID shot affect the likelihood of infection?
- In this section we establish what needs to be true for OLS to estimate a causal effect

---

# Least Squares Assumptions for Causal Inference

.content-box-red[
.pull-left[
**Correlation Example**

- Regression of income on schooling with .red[observational data]

`$$Inc = \beta_{0} + \beta_{1}Schl + u$$`

- `\(\beta_{1}\)` shows how income changes with schooling
- Probably represents only a correlation
  - People with more schooling were already smarter
  - Would have earned more even without schooling
- Slope reflects partly the effect of schooling, partly the effect of intelligence
]

.pull-right[
**Causation Example**

- Regression of test scores on class size when students are .red[randomly assigned to classes]

`$$TestScore = \beta_{0} + \beta_{1}ClassSize + u$$`

- `\(\beta_{1}\)` shows how bigger classes affect scores
- Probably a causal effect because
  - Randomization of class size means it is unrelated to other factors
  - Students in big classes are no different from those in small ones
- Slope reflects only the independent effect of class size on scores
]
]
---

# Least Squares Assumptions for Causal Inference

- For OLS to estimate the .red[causal effect], the following things need to be true

.content-box-blue[
**Assumptions for Causal Inference**

The model relating `\(Y\)` to `\(X\)` is

`$$Y = \beta_{0} + \beta_{1}X + u$$`

where `\(\beta_{1}\)` is explicitly defined as the causal effect, **and**:

1. The error `\(u\)` is not systematically related to `\(X\)` on average: `$$E[u|X]=0$$`
2. `\((X_{i}, Y_{i})\)` are independent and identically distributed (iid)
3. Large outliers are unlikely
]

---

# Least Squares Assumptions for Causal Inference
## Assumption 1: Zero Conditional Mean of the Error

- The average error term `\(u_{i}\)`, conditional on `\(X_{i}\)`, is zero

`$$E[u_{i}|X_{i}] = 0$$`

- Means that unobserved factors are unrelated to the independent variable
  - No linear or non-linear relationship between the two
  - Implies zero correlation and covariance between `\(u_{i}\)` and `\(X_{i}\)`
- Intuitively, at each `\(X_{i}\)` positive and negative errors tend to average out to zero
- The assumption implies the population regression function accurately describes the conditional mean of `\(Y_{i}\)`
  - Average `\(Y_{i}\)` is linearly related to `\(X_{i}\)`

---

# Least Squares Assumptions for Causal Inference

- Why do we need to assume `\(E[u_{i}|X_{i}] = 0\)`?
- It allows us to claim `\(\hat{\beta}_{1}\)` is .red[unbiased]
  - The average of `\(\hat{\beta}_{1}\)` over repeated samples equals `\(\beta_{1}\)`
  - When `\(\beta_{1}\)` is the causal effect and `\(\hat{\beta}_{1}\)` is an unbiased estimate of it, we can infer causality
- `\(E[u_{i}|X_{i}] = 0\)` means no unobserved factors change systematically with `\(X_{i}\)`
  - When this is true, `\(\hat{\beta}_{1}\)` estimates the causal effect of `\(X_{i}\)` on `\(Y_{i}\)`
- This is an **assumption**
  - We will never know for sure if it is true
  - Best we can do is assess whether we think it is reasonable
  - Most of the time, it is probably not (we will discuss later in the course)

---

# Least Squares Assumptions for Causal Inference

.pull-left[
**OLS Estimates Unbiased Causal Effect**

<img src="SLR_files/figure-html/unnamed-chunk-8-1.png" width="504" style="display: block; margin: auto;" />
]

.pull-right[
**OLS Estimates Biased Effect**

<img src="SLR_files/figure-html/unnamed-chunk-9-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Least Squares Assumptions for Causal Inference
## Assumption 2: `\(\small{(X_{i},Y_{i})}\)` are iid

- When sampling, we draw both `\(X_{i}\)` and `\(Y_{i}\)` for each person
- The assumption is that they are independent across people, and have the same distribution
- If we have a simple random sample, this will be true
  - Observations come from the same population
  - Chosen so that everyone has the same chance of being in the sample
  - Then one pair `\((X_{i},Y_{i})\)` gives no info about another `\((X_{i},Y_{i})\)`
  - Each `\((X_{i},Y_{i})\)` has the same distribution
- The assumption sometimes fails with different sampling schemes
  - Ex: time series and panel data

---

# Least Squares Assumptions for Causal Inference
## Assumption 3: Large Outliers Unlikely

.pull-left[
- .red[Outlier]: an observation on `\(X\)` or `\(Y\)` far outside the usual range of the data
- OLS estimators are sensitive to outliers
  - Regression line on the right is flat without the outlier
  - Regression line tilts up significantly with one outlier
]

.pull-right[
<img src="SLR_files/figure-html/unnamed-chunk-10-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Least Squares Assumptions for Causal Inference

- Outliers happen for several reasons
- Data entry errors
  - Recording height in cm instead of inches for 1 observation
  - Accidentally shifting a decimal place
  - Entering a totally wrong value
- Naturally occurring issues that are not errors
  - One large country in a sample of small countries
  - One big donor in a sample of charitable giving
- Important to check data for outliers
  - Examine summary statistics before doing the regression
  - E.g. mean, standard deviation, max, min, IQR, etc. (a short sketch follows)
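
- A minimal Stata sketch of such a check, using the simulated `testscr` and `str` from earlier (the cutoffs below are made up for illustration)

```stata
* detailed summary statistics: percentiles, four smallest and largest values
summarize testscr str, detail

* list any observations with implausibly extreme class sizes
* (the cutoffs 13 and 28 are illustrative, not a rule)
list testscr str if str < 13 | str > 28
```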
---

# Sampling Distribution of OLS Estimators
## Introduction

- The estimator `\(\hat{\beta}_{1}\)` is a quantity computed from a sample
- Its value therefore varies from sample to sample
  - It is a .red[random variable]
- The sampling distribution of `\(\hat{\beta}_{1}\)` describes the likelihood of the values it can take across random samples
- The sampling distribution helps us test claims about `\(\beta_{1}\)` through hypothesis tests
  - For hypothesis tests, we need to know the sampling distribution
- In this section we derive it using our assumptions

---

# Sampling Distribution of OLS Estimators
## The Mean of `\(\small{\hat{\beta}_{1}}\)`

- Like all random variables, `\(\hat{\beta}_{1}\)` has a mean and variance
- We compute these values as part of the description of the sampling distribution
- To compute the mean, start with the formula for `\(\hat{\beta}_{1}\)`

`$$\hat{\beta}_{1} = \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

- The first step is to rearrange the formula
- Since the model implies `\(Y_{i} - \bar{Y} = \beta_{1}(X_{i} - \bar{X}) + (u_{i} - \bar{u})\)`, rewrite the numerator as

`$$\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y}) = \sum_{i=1}^{n}(X_{i} - \bar{X})\left(\beta_{1}(X_{i} - \bar{X}) + u_{i} - \bar{u}\right)$$`

---

# Sampling Distribution of OLS Estimators

- Multiplying out the brackets

`$$= \sum_{i=1}^{n}\left(\beta_{1}(X_{i} - \bar{X})^2 + (X_{i} - \bar{X})(u_{i} - \bar{u})\right)$$`

`$$= \beta_{1}\sum_{i=1}^{n}(X_{i} - \bar{X})^2 +\sum_{i=1}^{n} (X_{i} - \bar{X})(u_{i} - \bar{u})$$`

- The last term can be simplified

`$$\sum_{i=1}^{n} (X_{i} - \bar{X})(u_{i} - \bar{u}) = \sum_{i=1}^{n} (X_{i} - \bar{X})u_{i} - \sum_{i=1}^{n} (X_{i} - \bar{X})\bar{u}$$`

`$$= \sum_{i=1}^{n} (X_{i} - \bar{X})u_{i}$$`

---

# Sampling Distribution of OLS Estimators

- Because

`$$\bar{u}\sum_{i=1}^{n}(X_{i} - \bar{X}) =\bar{u}\left(\sum_{i=1}^{n}X_{i} - \sum_{i=1}^{n}\bar{X}\right)$$`

`$$=\bar{u}\left(\sum_{i=1}^{n}X_{i} - n\bar{X}\right)=\bar{u}\left(\sum_{i=1}^{n}X_{i} - n\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)$$`

`$$=\bar{u}\left(\sum_{i=1}^{n}X_{i} - \sum_{i=1}^{n}X_{i}\right) = 0$$`

- Putting this all together

`$$\hat{\beta}_{1} = \frac{\beta_{1}\sum_{i=1}^{n}(X_{i} - \bar{X})^2 +\sum_{i=1}^{n} (X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

`$$\hat{\beta}_{1} = \beta_{1}+\frac{\sum_{i=1}^{n} (X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

---

# Sampling Distribution of OLS Estimators

- The estimator `\(\hat{\beta}_{1}\)` is the sum of two things
  - The parameter it is estimating
  - A weighted sum of the (unknown) errors
- The expected value of `\(\hat{\beta}_{1}\)` is then

`$$E[\hat{\beta}_{1}|X_{i}]= E \left [ \beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2} | X_{i} \right ]$$`

`$$= E[\beta_{1} |X_{i}] + E \left [ \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2} | X_{i} \right ]$$`

`$$=\beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})E[u_{i}|X_{i}]}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

---

# Sampling Distribution of OLS Estimators

- Our first assumption is that `\(E[u_{i}|X_{i}]=0\)`, so

`$$E[\hat{\beta}_{1}|X_{i}]=\beta_{1}$$`

- For a given value of `\(X_{i}\)`, the average of `\(\hat{\beta}_{1}\)` is `\(\beta_{1}\)`
- To find the **overall** average, use the law of iterated expectations

`$$E[\hat{\beta}_{1}] = E[E[\hat{\beta}_{1}|X_{i}]]$$`
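
- (Reminder: the law of iterated expectations says `\(E[E[W|X]] = E[W]\)` for any random variable `\(W\)`: averaging the conditional averages over `\(X\)` gives the overall average)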
---

# Sampling Distribution of OLS Estimators

- Substituting in `\(E[\hat{\beta}_{1}|X_{i}]=\beta_{1}\)`

`$$E[\hat{\beta}_{1}] = E[\beta_{1} ] = \beta_{1}$$`

- Intuition: since the average of `\(\hat{\beta}_{1}\)` at each `\(X_{i}\)` is `\(\beta_{1}\)`, the overall average is also `\(\beta_{1}\)`
- The resulting mean of the OLS estimator is

.content-box-blue[
**Mean of the OLS Estimator**

`$$E[\hat{\beta}_{1}] = \beta_{1}$$`
]

---

# Sampling Distribution of OLS Estimators

- `\(E[\hat{\beta}_{1}] = \beta_{1}\)` means that `\(\hat{\beta}_{1}\)` is .red[unbiased]
- Why is this important?
  - **If we could repeatedly sample**, the average of `\(\hat{\beta}_{1}\)` would be `\(\beta_{1}\)`
  - The only reason `\(\hat{\beta}_{1}\)` differs from `\(\beta_{1}\)` **in any one sample** is sampling error
    - A sample does not always match the population
- Unbiased estimators are preferable to biased estimators
  - A biased estimator differs from the parameter it estimates because of sampling error **and** because it is systematically wrong
  - Statisticians will generally prefer an unbiased estimator
- If `\(\beta_{1}\)` is the causal effect and `\(\hat{\beta}_{1}\)` is an unbiased estimate of it, we can attribute causality to the estimated relationship between `\(X_{i}\)` and `\(Y_{i}\)`

---

# Sampling Distribution of OLS Estimators
## Variance of `\(\small{\hat{\beta}_{1}}\)`

- The expected value tells us the middle of the distribution
- We also need to know how spread out the values of `\(\hat{\beta}_{1}\)` are from the mean across samples
- The key measure of this is the variance
- Start with the alternate formula for `\(\hat{\beta}_{1}\)` we derived above

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

---

# Sampling Distribution of OLS Estimators

- Rewrite the denominator using the sample variance of `\(X_{i}\)`

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{(n-1)s_{X}^2}$$`

- where `\(s_{X}^2 = \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}{n-1}\)`
- Multiply the numerator and denominator by `\(\frac{1}{n}\)`

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{(\frac{n-1}{n})s_{X}^2}$$`

- From this point forward, we assume that we have a large sample
  - With large samples, estimators are very close to the parameters they estimate
  - So `\(\bar{X} \approx \mu_{X}\)` and `\(s_{X}^2 \approx \sigma_{X}^2\)`
  - Also, `\(\frac{n-1}{n} \approx 1\)`

---

# Sampling Distribution of OLS Estimators

- Substitute these values into the formula

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}$$`

- Now use the variance operator

`$$VAR(\hat{\beta}_{1})=VAR\left(\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}\right)$$`

- Since `\(\beta_{1}\)` is a fixed parameter,

`$$VAR(\hat{\beta}_{1})=VAR\left( \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}\right)$$`

---

# Sampling Distribution of OLS Estimators

- We will now make heavy use of the properties of variance
- Because `\(\sigma_{X}^2\)` is a fixed constant

`$$VAR(\hat{\beta}_{1})=\frac{1}{(\sigma_{X}^2)^2}VAR\left( \frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}\right)$$`

- Because `\(\frac{1}{n}\)` is a fixed constant

`$$VAR(\hat{\beta}_{1})=\frac{1}{(\sigma_{X}^2)^2 n^2}VAR\left( \sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}\right)$$`

- Finally, because the observations are iid (Assumption 2), the terms `\((X_{i} - \mu_{X})u_{i}\)` are independent with a common variance, so the variance of the sum is `\(n\)` times the variance of one term

`$$VAR(\hat{\beta}_{1})=\frac{n}{(\sigma_{X}^2)^2 n^2}VAR\left( (X_{i} - \mu_{X})u_{i}\right)$$`

---

# Sampling Distribution of OLS Estimators

- Simplifying, we have the final variance formula

.content-box-blue[
**Variance of OLS Estimator**

`$$VAR(\hat{\beta}_{1})=\frac{VAR\left( (X_{i} - \mu_{X})u_{i}\right) }{n(\sigma_{X}^2)^2}$$`]

- Important things to note about the spread of `\(\hat{\beta}_{1}\)`
  - The larger is `\(n\)`, the smaller is the variance
    - More data reduces sampling variation
  - The larger is `\(\sigma_{X}^2\)`, the smaller is the variance
    - When `\(X_{i}\)` is more spread out, it is easier to estimate the linear relationship
  - A larger spread in `\(u_{i}\)` increases the variance
    - When `\(u_{i}\)` is more spread out, the dots fall further from the estimated line
    - The slope becomes less precise
---

# Sampling Distribution of OLS Estimators
## The Distribution of `\(\small{\hat{\beta}_{1}}\)`

- We know the mean and variance of the distribution of `\(\hat{\beta}_{1}\)`
- What about the shape?
- If we assume a big sample we can apply the .red[Central Limit Theorem (CLT)]
  - The average of many independent draws from the same population is approximately Normally distributed
- `\(\hat{\beta}_{1}\)` is built from such an average

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}v_{i} }{\sigma_{X}^2}$$`

- where `\(v_{i} = (X_{i} - \mu_{X})u_{i}\)`

---

# Sampling Distribution of OLS Estimators

- The Central Limit Theorem therefore says `\(\hat{\beta}_{1}\)` has an approximately Normal distribution in large samples
- We previously derived the mean and variance
- This gives us the distribution of the OLS estimator

.content-box-blue[
**Distribution of OLS Estimator**

`$$\hat{\beta}_{1} \sim \mathcal{N}\left( \beta_{1},\frac{VAR\left( (X_{i} - \mu_{X})u_{i}\right) }{n(\sigma_{X}^2)^2} \right)$$`
]

---

# Example

.content-box-green[
.pull-left[
- Simulate the sampling distribution of `\(\hat{\beta}_{1}\)`
- Code to the right:
  - Assumes the model is `$$TestScore= 700 - 2*STR + u$$`
  - Draws 420 observations on `\(Y\)` and `\(X\)`
  - Computes `\(\hat{\beta}_{1}\)` from the sample
  - Repeats this 9999 times
  - Plots the distribution of the 9999 `\(\hat{\beta}_{1}\)` values
]

.pull-right[
```stata
clear all
local sims = 9999
set obs `sims'
set more off
gen beta1 = .
forvalues x = 1/`sims' {
    preserve
    clear
    qui set obs 420
    gen str = rnormal(20,2)
    gen u = rnormal(0,20)
    gen testscr = 700 - 2*str + u
    qui regress testscr str
    restore
    qui replace beta1 = _b[str] in `x'
}
```
]
]

---

# Example

.content-box-green[
```stata
twoway hist beta1, title(Sampling Distribution of Beta1) scheme(s2mono)
```

.center[
<figure>
  <img src="dist.svg" width="47%">
</figure>
]
]

---

# Example

.pull-left[
<img src="SLR_files/figure-html/unnamed-chunk-13-1.gif" style="display: block; margin: auto;" />
]

.pull-right[
<img src="SLR_files/figure-html/unnamed-chunk-14-1.png" width="504" style="display: block; margin: auto;" />
]