class: center, middle, inverse, title-slide

.title[
# The Simple Linear Regression Model
]
.subtitle[
## EC295
]
.author[
### Justin Smith
]
.institute[
### Wilfrid Laurier University
]
.date[
### Fall 2022
]

---

# What is Econometrics?

.center[
<figure>
  <img src="metrics.jpg" width="37%">
</figure>
]

---

## What is Econometrics

- Defining characteristics of econometrics
  - Observational data
  - Use of regression analysis
  - Motivating statistical models with economic models
  - Focus on causality
- This class introduces you to linear regression
  - Building block for many future economics classes
  - You will use this technique in EC481

---

## Introduction to Linear Regression

- Economic analysis often involves relating two or more variables
  - Does age of school entry affect test scores?
  - Does childhood health insurance affect adult health?
  - Does foreign competition affect domestic innovation?
- These relationships are typically used for
  - <font color="red">**Causal Inference**</font>: the independent effect of one variable on another
  - <font color="red">**Prediction**</font>: estimating the value of one variable given values of another
- Which one you use depends on the goals of your analysis
  - Causal inference is important in policy analysis
  - Prediction is useful for guessing unknown values of a variable
- We will develop a model to use for these goals

---

## Context

- A big issue in education is the size of school classes
- Parents are often in favour of smaller classes
  - More attention paid to individual students
  - Classes are easier to control
  - Can do more interactive work
- But smaller classes are more expensive
  - More teaching resources per student
- Important to measure the benefit of smaller classes
  - Compare against the cost to see if worthwhile
- The book repeatedly discusses models in the context of class size and student performance

---

## What Are We Trying to Model?

- We want to relate test scores to class size
- Hard to do this for specific individuals
  - Many reasons why test scores differ between people
  - Even people in the same class sizes have very different scores
- Instead focus on the <font color="red">systematic</font> relationship
- We do this by focusing on average test scores
  - How do average test scores change with class size?
- Several reasons to use the average
  - Highlights systematic patterns between variables
  - It is the mathematically optimal way to predict one variable given another
  - Intuitively appealing

---

## What Are We Trying to Model?

.pull-left[
- Mathematically we focus on the .red[**Conditional Expectation**]
- In the context of test scores, the conditional expectation is

$$ E[TestScore | STR] $$

- This is the average test score for each class size
- `\(STR\)` is the Student Teacher Ratio, a measure of class size
]

.pull-right[
.content-box-red[
### Reminder about Expected Values

The **Expected Value** `\(E[Y]\)` of a random variable `\(Y\)` is its weighted average

The **Conditional Expectation** `\(E[Y|X]\)` is the weighted average of a variable `\(Y\)` at specific values of another variable `\(X\)`
]
]
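
As a made-up illustration: if the only three classes with `\(STR = 20\)` score 640, 660, and 680, then

`$$E[TestScore | STR = 20] = \frac{640 + 660 + 680}{3} = 660$$`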
---

## What Are We Trying to Model?

- Big problem: we do not know how average test scores relate to class size
  - Could be linear
  - Could be non-linear
  - Could be some other weird function
- Unfortunately, we will .red[.bolder[never know]] exactly how they relate 😠
- Instead we approximate this relationship
  - In EC295 we use linear models for the approximation
  - Often a good guess at the true relationship
  - But the unknown true model is probably more complicated

---

## The Linear Regression Model

- A linear model relating test scores to class size is

`$$TestScore = \beta_{0} + \beta_{STR}STR + u$$`

- Several important components of this model
  - `\(TestScore\)` are individual test scores
  - `\(STR\)` are individual class sizes
  - `\(\beta_{STR}\)` is the .red[slope] parameter
    - Effect of a one-unit change in class size on test scores
  - `\(\beta_{0}\)` is the .red[intercept] parameter
    - Test scores when class size is zero
  - `\(u\)` is everything except class size that determines test scores

---

## The Linear Regression Model

.pull-left[
This model breaks test scores into two pieces

1. .red[Population Regression Function]
`$$\beta_{0} + \beta_{STR}STR$$`
The predictable part of test scores

2. .red[Error Term]
`$$u = TestScore - \beta_{0} - \beta_{STR}STR$$`
The unobserved and unpredictable part of test scores
]

.pull-right[
<img src="SLR_files/figure-html/prf-1.png" width="504" style="display: block; margin: auto;" />
]
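
---

## The Linear Regression Model

- A made-up numerical illustration of the decomposition, using the fictional parameter values from the simulated example later in the deck
- Suppose `\(\beta_{0} = 700\)` and `\(\beta_{STR} = -2\)`, and one class has `\(STR = 20\)` and `\(TestScore = 672\)`
  - The population regression function gives `\(700 - 2(20) = 660\)`
  - The error term picks up the rest: `\(u = 672 - 660 = 12\)`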
---

## The Linear Regression Model

- Another big problem: we do not know the values of `\(\beta_{0}\)` and `\(\beta_{STR}\)`
  - They are parameters that we do not observe
- We also do not observe `\(u\)`
  - The unobserved error term
- Suppose we need to know these parameters
  - How do we proceed from here?
- Answer: we .red[**estimate**] `\(\beta_{0}\)` and `\(\beta_{STR}\)` with a sample of data
- There are several estimation methods
  - We will focus on .red[**Ordinary Least Squares (OLS)**]

---

## Drawing a Sample from the Population

- To estimate our model, we need to collect data on test scores and class sizes
- Imagine collecting a sample of size `\(n\)`
  - e.g. test scores and class sizes from 50 classes in different schools
  - `\(n = 50\)` in this case
- The population regression model holds .red[for each member of the sample]

`$$TestScore_{i} = \beta_{0} + \beta_{STR}STR_{i} + u_{i}$$`

- The subscript `\(i\)` identifies a specific member of the sample
- Test scores are assumed to be linearly related to class size for each member of the sample

---

## Ordinary Least Squares

.content-box-blue[
**Ordinary Least Squares**

A method that estimates regression parameters by choosing the ones that minimize the sum of the squared distances between the estimated regression line and each data point
]

- To implement OLS, replace the unknowns of the population model with estimates

`$$TestScore_{i} = \hat{\beta}_{0} + \hat{\beta}_{STR}STR_{i} + \hat{u}_{i}$$`

- `\(\hat{\beta}_{0}\)` estimates `\(\beta_{0}\)`
- `\(\hat{\beta}_{STR}\)` estimates `\(\beta_{STR}\)`
- `\(\hat{u}_{i}\)` is the residual (estimates the error)
- OLS .red[chooses] `\(\hat{\beta}_{0}\)` and `\(\hat{\beta}_{STR}\)` to minimize the sum of squared residuals

---

## Ordinary Least Squares

- The sum of squared residuals is

`$$\sum_{i=1}^{n} \hat{u}_{i}^{2} = \sum_{i=1}^{n} (TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i} )^{2}$$`

- To solve, take the derivative<sup>1</sup> of the expression above with respect to `\(\hat{\beta}_{0}\)` and `\(\hat{\beta}_{STR}\)` and set it to zero

`$$\sum_{i=1}^{n} (TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i}) = 0$$`

`$$\sum_{i=1}^{n} (TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i})STR_{i} = 0$$`

- These are the .red[OLS Normal Equations]

.footnotesize[
.footnote[
1. If you don't know calculus, don't worry about it. I will not ask you to take a derivative in this class.
]]

---

## Ordinary Least Squares

- Use these equations to solve for `\(\hat{\beta}_{0}\)` and `\(\hat{\beta}_{STR}\)`

.content-box-blue[
**Ordinary Least Squares Estimators (for our example)**

`$$\hat{\beta}_{0} = \overline{TestScore} - \hat{\beta}_{STR}\overline{STR}$$`

`$$\hat{\beta}_{STR} = \frac{\sum_{i=1}^{n}(STR_{i} - \overline{STR})(TestScore_{i} - \overline{TestScore})}{\sum_{i=1}^{n}(STR_{i} - \overline{STR})^2} = \frac{\widehat{cov}(STR_{i}, TestScore_{i})}{\widehat{var}(STR_{i})}$$`
]

- These are the estimates of the intercept and slope based on our sample
- 💥.red[**Important**]💥: these will differ from one sample to another
  - We will return to sampling variation later
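
---

## Ordinary Least Squares

- A minimal Stata sketch of the slope formula in action (it assumes the simulated variables `str` and `testscr` created later in this deck are in memory)

```stata
* slope estimate as sample covariance over sample variance
* (assumes str and testscr exist, as in the simulated example below)
quietly correlate str testscr, covariance
display r(cov_12)/r(Var_1)
```

- This ratio should match the slope reported by `regress testscr str`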
---

## Ordinary Least Squares

.pull-left[
The estimated model has its own terminology

1. .red[Sample Regression Function]
`$$\hat{\beta}_{0} + \hat{\beta}_{STR}STR$$`
The line constructed with the OLS estimators

2. .red[Predicted Value]
`$$\widehat{TestScore}_{i} = \hat{\beta}_{0} + \hat{\beta}_{STR}STR_{i}$$`
The value of `\(TestScore_{i}\)` implied by the sample regression function

3. .red[Residual]
`$$\hat{u}_{i} = TestScore_{i} - \hat{\beta}_{0} - \hat{\beta}_{STR}STR_{i}$$`
The difference between the actual value of `\(TestScore_{i}\)` and its prediction
]

.pull-right[
<img src="SLR_files/figure-html/srf-1.png" width="504" style="display: block; margin: auto;" />
]

---

## General Model

- So far we have used a specific example
- A population regression function for any outcome and any independent variable is

`$$Y = \beta_{0} + \beta_{1}X + u$$`

.pull-left[.content-box-blue[
**Ordinary Least Squares Estimators**

`$$\hat{\beta}_{0} = \overline{Y} - \hat{\beta}_{1}\overline{X}$$`

`$$\hat{\beta}_{1} = \frac{\sum_{i=1}^{n}(X_{i} - \overline{X})(Y_{i} - \overline{Y})}{\sum_{i=1}^{n}(X_{i} - \overline{X})^2} = \frac{\widehat{cov}(X_{i}, Y_{i})}{\widehat{var}(X_{i})}$$`
]]

.pull-right[.content-box-blue[
**Sample Regression Function**

`$$\hat{\beta}_{0} + \hat{\beta}_{1}X$$`

**Predicted Value**

`$$\widehat{Y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1}X_{i}$$`

**Residual**

`$$\hat{u}_{i} = Y_{i} - \hat{\beta}_{0} - \hat{\beta}_{1}X_{i}$$`
]]

---

## Example: The Effect of Class Size on Test Scores

.content-box-green[
- **Question:** .red[Are class size and student achievement related?]
- We will create .red[simulated data] to explore the relationship
  - We set the process generating the data
  - Lets us control the true values of the parameters
  - We set these values to create realistic data
  - The simulated data will mimic actual data we see on test scores
- We will use this dataset to explore linear regression
  - We will see the mechanics of estimation
  - Also how sampling variation affects estimates
]

---

## Example: The Effect of Class Size on Test Scores

.content-box-green[
- Suppose the population regression function is

`$$TestScore_{i}= \beta_{0} + \beta_{1}STR_{i} + u_{i}$$`

- `\(\beta_{1}\)` is the effect of one more student per teacher
- `\(\beta_{0}\)` is the test score when class size is zero
  - Does not have a useful interpretation in this example
- `\(u\)` captures determinants of test scores other than the student-teacher ratio
  - Natural ability
  - Student background
  - School/teacher quality
  - etc
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
.pull-left[
- Set the population regression equation as

`$$TestScore= 700 - 2*STR + u$$`

- Says that `\(\beta_{0} = 700\)`, `\(\beta_{1} = -2\)`
- These are .red[fictional] population values
  - In reality we would never know these
  - We are pretending we know them for instructional reasons
]

.pull-right[
<img src="SLR_files/figure-html/ex1-1.png" width="504" style="display: block; margin: auto;" />
]
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
.pull-left[
- Next step is to estimate `\(\beta_{0}\)` and `\(\beta_{1}\)`
  - As though we did not know their values
- First take a sample of data from the population
- We will draw .red[420 observations] with a .red[simple random sample]
- Stata code on right
]

.pull-right[
**Stata Code**

```stata
clear
set obs 420
set seed 12345
gen str = rnormal(20,2)
gen u = rnormal(0,20)
gen testscr = 700 - 2*str + u
```
]
]
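
- A note on the code: `set seed 12345` pins Stata's random-number generator so the same draws (and the same estimates) come out on every run, and `rnormal(20,2)` draws from a Normal distribution with mean 20 and standard deviation 2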
---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- Before estimating the parameters, summarize the data

**Stata Code and Output**

```stata
sum testscr str
```

```
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     testscr |        420    659.1345    20.67156    593.118   713.0748
         str |        420    20.13071    2.103167    14.2861   27.30753
```

- Note the scale of the test scores
  - Simulated to mimic scores from a standardized test
  - Standardized tests often scaled to have mean 650, standard deviation 20
- Roughly 20 students per teacher in these fictional districts
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- Estimate the intercept and slope by OLS

**Stata Code and Output**

```stata
regress testscr str
```

```
      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     15.45
       Model |  6383.10498         1  6383.10498   Prob > F        =    0.0001
    Residual |  172661.265       418  413.065226   R-squared       =    0.0357
-------------+----------------------------------   Adj R-squared   =    0.0333
       Total |  179044.369       419  427.313531   Root MSE        =    20.324

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -1.855817    .472094    -3.93   0.000    -2.783791   -.9278429
       _cons |   696.4934    9.55519    72.89   0.000     677.7112    715.2756
------------------------------------------------------------------------------
```
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- The OLS estimates are

`$$\hat{\beta}_{1} = -1.86$$`

`$$\hat{\beta}_{0} = 696.49$$`

- The sample regression function is

`$$\widehat{TestScore}= 696.49 - 1.86STR$$`

- Use it to generate predictions of test scores
  - Simply plug in a value for `\(STR\)`, and compute `\(\widehat{TestScore}\)`
]
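
---

## Example: Effect of Class Size on Test Scores

.content-box-green[
- For instance (an illustrative calculation), at `\(STR = 20\)` the prediction is

`$$\widehat{TestScore} = 696.49 - 1.86(20) = 659.29$$`

- This is close to the sample mean test score of 659.13, which makes sense because 20 is close to the average class size
]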
---

## Example: Effect of Class Size on Test Scores

.content-box-green[
**Stata Code**

<style>
pre { white-space: pre-wrap; }
.figure2 { margin-top: -3em; margin-bottom: -1.5em; margin-left: 0px; margin-right: 0px; }
</style>

```stata
predict fitted, xb
twoway (scatter testscr str)(line fitted str), title(Test Scores and Student Teacher Ratio) subtitle(Fitted Values and Actual Data)
```

.center[.figure2[
<figure>
  <img src="scatter.svg" width="47%">
</figure>
]
]
]

---

## Example: Effect of Class Size on Test Scores

.content-box-green[
**Stata Code**

```stata
predict resid, residual
twoway (scatter resid str), title(Test Scores and Student Teacher Ratio) subtitle(Residuals)
```

.center[.figure2[
<figure>
  <img src="resid.svg" width="47%">
</figure>
]
]
]

---

# Measures of Fit
## Introduction

- OLS is one way to estimate a linear regression model
- It is important to know how well the method works
- One way is to examine the .red[fit] of our regression line
  - How close to the line are the data points?
  - Does `\(X\)` explain a large fraction of the variation in `\(Y\)`?
- These are the .red[algebraic properties] of our estimator
  - Mathematical relationships that hold true **in each sample**
- Different from the .red[statistical properties]
  - The behaviour of estimators **across repeated samples**
  - Necessarily hypothetical because we only have one sample

---

# Measures of Fit
## R-Squared

- The .red[Coefficient of Determination] `\(R^2\)` measures the fraction of the variation in `\(Y\)` that is explained by the independent variable

`$$R^2 = \frac{ESS}{TSS}$$`

- TSS is the .red[Total Sum of Squares]

`$$TSS = \sum_{i=1}^{N} (Y_{i} - \bar{Y})^2$$`

- A measure of the spread in the `\(Y_{i}\)`

---

# Measures of Fit

- ESS is the .red[Explained Sum of Squares]

`$$ESS = \sum_{i=1}^{N} (\hat{Y}_{i} - \bar{Y})^2$$`

- And the .red[Residual Sum of Squares (SSR)] is

`$$SSR= \sum_{i=1}^{N} (\hat{u}_{i})^2$$`

- `\(R^2\)` ranges between 0 and 1
  - `\(R^2 = 0\)` means that `\(X\)` explains none of the variation in `\(Y\)`
    - Scatterplot between `\(Y\)` and `\(X\)` is a cloud with no obvious linear relationship
  - `\(R^2 = 1\)` means that `\(X\)` explains all of the variation in `\(Y\)`
    - Data in scatterplot between `\(Y\)` and `\(X\)` fall along a straight line

---

# Measures of Fit

- `\(R^2\)` is also equal to the square of the correlation coefficient between `\(Y_{i}\)` and `\(\hat{Y}_{i}\)`
  - `\(R^2 = 1\)` is perfect correlation between predicted and actual values
- An important relationship between the sums of squares is

`$$TSS= ESS + SSR$$`

- Part of any movement of `\(Y_{i}\)` away from its average is explainable by factors in the regression
- The other part is related to unobserved factors
- As a result, you can re-express

`$$R^2 = \frac{ESS}{TSS} = 1- \frac{SSR}{TSS}$$`

---

# Measures of Fit

- Important to be cautious when using `\(R^2\)`
- In real applications, `\(R^2\)` is often very low
  - Does not mean the regression is bad
  - Just means we have not captured all factors that explain `\(Y\)`
- A low `\(R^2\)` does not imply a poor estimate of `\(\beta_{1}\)`
  - `\(\beta_{1}\)` measures the effect on `\(Y\)` of changing `\(X\)`, all else equal
  - `\(R^2\)` measures the fraction of the total variation in `\(Y\)` that is explained by `\(X\)`
  - The concepts are independent of each other
- In the class size example `\(R^2 = 0.036\)`
  - Many other factors besides the student-teacher ratio explain test scores

---

# Measures of Fit
## Standard Error of Regression (SER)

- Can also measure fit with the spread of the data around the regression line
- The residual `\(\hat{u}_{i}\)` is the deviation of `\(Y_{i}\)` from its prediction

`$$\hat{u}_{i} = Y_{i} - \hat{Y}_{i}$$`

- The .red[standard error of regression (SER)] is the standard deviation of `\(\hat{u}_{i}\)`
  - The average distance of `\(Y_{i}\)` from its prediction `\(\hat{Y}_{i}\)`

`$$SER = s_{\hat{u}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_{i}^2} = \sqrt{\frac{SSR}{n-2} }$$`
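
---

# Measures of Fit

- A minimal Stata sketch (assuming `regress testscr str` from the earlier example has just been run): both fit measures can be recovered from the stored results

```stata
* R-squared two ways: ESS/TSS and 1 - SSR/TSS
* (after -regress-, e(mss) is the explained SS and e(rss) the residual SS)
display e(mss)/(e(mss) + e(rss))
display 1 - e(rss)/(e(mss) + e(rss))

* SER, which Stata reports as Root MSE; e(df_r) = n - 2 here
display sqrt(e(rss)/e(df_r))
```

- These should reproduce the R-squared (0.0357) and Root MSE (20.324) shown in the output on the next slide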
---

# Example

.content-box-green[
- Recall the regression output from earlier

```stata
regress testscr str
```

```
      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     15.45
       Model |  6383.10498         1  6383.10498   Prob > F        =    0.0001
    Residual |  172661.265       418  413.065226   R-squared       =    0.0357
-------------+----------------------------------   Adj R-squared   =    0.0333
       Total |  179044.369       419  427.313531   Root MSE        =    20.324

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -1.855817    .472094    -3.93   0.000    -2.783791   -.9278429
       _cons |   696.4934    9.55519    72.89   0.000     677.7112    715.2756
------------------------------------------------------------------------------
```
]

---

# Example

.content-box-green[
- The sums of squares are
  - `\(ESS = 6383.10\)`
  - `\(SSR = 172661.27\)`
  - `\(TSS = 179044.37\)`
- `\(R^2 = 0.036\)` is in the top right corner of the output
- You can verify that
  - `\(TSS = ESS + SSR\)`
  - `\(R^2 = \frac{ESS}{TSS}\)`
- The SER is called the .red[Root MSE (Mean Square Error)] in the output
  - From the output `\(SER = 20.32\)`
]

---

# Least Squares Assumptions for Causal Inference

- So far we have defined `\(\beta_{1}\)` only as the **slope**
- The slope could be two things

1. The (standardized) .red[correlation] between `\(X\)` and `\(Y\)`
  - How `\(Y\)` and `\(X\)` move together
2. The .red[causal effect] of `\(X\)` on `\(Y\)`
  - What happens to `\(Y\)` when we change `\(X\)` and **nothing else that affects Y changes**

- In many applications we want the causal effect
  - What happens to my income if I get a university degree?
  - How does getting a COVID shot affect the likelihood of infection?
- In this section we establish what needs to be true for OLS to estimate a causal effect

---

# Least Squares Assumptions for Causal Inference

.content-box-red[
.pull-left[
**Correlation Example**

- Regression of income on schooling with .red[observational data]

`$$Inc = \beta_{0} + \beta_{1}Schl + u$$`

- `\(\beta_{1}\)` shows how income changes with schooling
- Probably represents only a correlation
  - People with more schooling were already smarter
  - Would have earned more even without schooling
- Slope reflects partly the effect of schooling, partly the effect of intelligence
]

.pull-right[
**Causation Example**

- Regression of test scores on class size when students are .red[randomly assigned to classes]

`$$TestScore = \beta_{0} + \beta_{1}ClassSize + u$$`

- `\(\beta_{1}\)` shows how bigger classes affect scores
- Probably a causal effect because
  - Randomization of class size means it is unrelated to other factors
  - Students in big classes are no different from those in small ones
- Slope reflects only the independent effect of class size on scores
]
]
---

# Least Squares Assumptions for Causal Inference

- For OLS to estimate the .red[causal effect], the following things need to be true

.content-box-blue[
**Assumptions for Causal Inference**

The model relating `\(Y\)` to `\(X\)` is

`$$Y = \beta_{0} + \beta_{1}X + u$$`

where `\(\beta_{1}\)` is explicitly defined as the causal effect, **and**:

1. The error `\(u\)` is not systematically related to `\(X\)` on average: `$$E[u|X]=0$$`
2. `\((X_{i}, Y_{i})\)` are independent and identically distributed (iid)
3. Large outliers are unlikely
]

---

# Least Squares Assumptions for Causal Inference
## Assumption 1: Zero Conditional Mean of the Error

- The average error term `\(u_{i}\)`, conditional on `\(X_{i}\)`, is zero

`$$E[u_{i}|X_{i}] = 0$$`

- Means that unobserved factors are unrelated to the independent variable
  - No linear or non-linear relationship between the two
  - Implies zero correlation and covariance between `\(u_{i}\)` and `\(X_{i}\)`
- Intuitively, at each `\(X_{i}\)` positive and negative errors tend to average out to zero
- The assumption implies the population regression function accurately describes the conditional mean of `\(Y_{i}\)`
  - Average `\(Y_{i}\)` is linearly related to `\(X_{i}\)`

---

# Least Squares Assumptions for Causal Inference

- Why do we need to assume `\(E[u_{i}|X_{i}] = 0\)`?
- It allows us to claim `\(\hat{\beta}_{1}\)` is .red[unbiased]
  - The average of `\(\hat{\beta}_{1}\)` over repeated samples equals `\(\beta_{1}\)`
  - When `\(\beta_{1}\)` is the causal effect and `\(\hat{\beta}_{1}\)` is an unbiased estimate of it, we can infer causality
- `\(E[u_{i}|X_{i}] = 0\)` means no unobserved factors change systematically with `\(X_{i}\)`
  - When this is true, `\(\hat{\beta}_{1}\)` estimates the causal effect of `\(X_{i}\)` on `\(Y_{i}\)`
- This is an **assumption**
  - We will never know for sure if it is true
  - Best we can do is assess whether we think it is reasonable
  - Most of the time, it is probably not (we will discuss later in the course)

---

# Least Squares Assumptions for Causal Inference

.pull-left[
**OLS Estimates Unbiased Causal Effect**

<img src="SLR_files/figure-html/unnamed-chunk-8-1.png" width="504" style="display: block; margin: auto;" />
]

.pull-right[
**OLS Estimates Biased Effect**

<img src="SLR_files/figure-html/unnamed-chunk-9-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Least Squares Assumptions for Causal Inference
## Assumption 2: `\(\small{(X_{i},Y_{i})}\)` are iid

- When sampling, we draw both `\(X_{i}\)` and `\(Y_{i}\)` for each person
- The assumption is that they are independent across people, and have the same distribution
- If we have a simple random sample, this will be true
  - Observations come from the same population
  - Chosen so that everyone has the same chance of being in the sample
  - Then one pair `\((X_{i},Y_{i})\)` gives no info about another `\((X_{i},Y_{i})\)`
  - Each `\((X_{i},Y_{i})\)` has the same distribution
- The assumption sometimes fails with different sampling schemes
  - Ex: time series and panel data

---

# Least Squares Assumptions for Causal Inference
## Assumption 3: Large Outliers Unlikely

.pull-left[
- .red[Outlier]: an observation on `\(X\)` or `\(Y\)` far outside the usual range of the data
- OLS estimators are sensitive to outliers
  - Regression line on the right is flat without the outlier
  - Regression line tilts up significantly with one outlier
]

.pull-right[
<img src="SLR_files/figure-html/unnamed-chunk-10-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Least Squares Assumptions for Causal Inference

- Outliers happen for several reasons
- Data entry errors
  - Recording height in cm instead of inches for 1 observation
  - Accidentally shifting a decimal place
  - Entering a totally wrong value
- Naturally occurring issues that are not errors
  - One large country in a sample of small countries
  - One big donor in a sample of charitable giving
- Important to check data for outliers
  - Examine summary statistics before doing the regression
  - E.g. mean, standard deviation, max, min, IQR, etc. (a short sketch follows)
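
- A minimal Stata sketch of such a check, using the simulated `testscr` and `str` from earlier (the cutoffs below are made up for illustration)

```stata
* detailed summary statistics: percentiles, four smallest and largest values
summarize testscr str, detail

* list any observations with implausibly extreme class sizes
* (the cutoffs 13 and 28 are illustrative, not a rule)
list testscr str if str < 13 | str > 28
```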
---

# Sampling Distribution of OLS Estimators
## Introduction

- The estimator `\(\hat{\beta}_{1}\)` is a quantity computed from a sample
- Its value therefore varies from sample to sample
  - It is a .red[random variable]
- The sampling distribution of `\(\hat{\beta}_{1}\)` describes the likelihood of the values it can take across random samples
- The sampling distribution helps us test claims about `\(\beta_{1}\)` through hypothesis tests
  - For hypothesis tests, we need to know the sampling distribution
- In this section we derive it using our assumptions

---

# Sampling Distribution of OLS Estimators
## The Mean of `\(\small{\hat{\beta}_{1}}\)`

- Like all random variables, `\(\hat{\beta}_{1}\)` has a mean and variance
- We compute these values as part of the description of the sampling distribution
- To compute the mean, start with the formula for `\(\hat{\beta}_{1}\)`

`$$\hat{\beta}_{1} = \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

- The first step is to rearrange the formula
- Since the model implies `\(Y_{i} - \bar{Y} = \beta_{1}(X_{i} - \bar{X}) + (u_{i} - \bar{u})\)`, rewrite the numerator as

`$$\sum_{i=1}^{n}(X_{i} - \bar{X})(Y_{i} - \bar{Y}) = \sum_{i=1}^{n}(X_{i} - \bar{X})\left(\beta_{1}(X_{i} - \bar{X}) + u_{i} - \bar{u}\right)$$`

---

# Sampling Distribution of OLS Estimators

- Multiplying out the brackets

`$$= \sum_{i=1}^{n}\left(\beta_{1}(X_{i} - \bar{X})^2 + (X_{i} - \bar{X})(u_{i} - \bar{u})\right)$$`

`$$= \beta_{1}\sum_{i=1}^{n}(X_{i} - \bar{X})^2 +\sum_{i=1}^{n} (X_{i} - \bar{X})(u_{i} - \bar{u})$$`

- The last term can be simplified

`$$\sum_{i=1}^{n} (X_{i} - \bar{X})(u_{i} - \bar{u}) = \sum_{i=1}^{n} (X_{i} - \bar{X})u_{i} - \sum_{i=1}^{n} (X_{i} - \bar{X})\bar{u}$$`

`$$= \sum_{i=1}^{n} (X_{i} - \bar{X})u_{i}$$`

---

# Sampling Distribution of OLS Estimators

- Because

`$$\bar{u}\sum_{i=1}^{n}(X_{i} - \bar{X}) =\bar{u}\left(\sum_{i=1}^{n}X_{i} - \sum_{i=1}^{n}\bar{X}\right)$$`

`$$=\bar{u}\left(\sum_{i=1}^{n}X_{i} - n\bar{X}\right)=\bar{u}\left(\sum_{i=1}^{n}X_{i} - n\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)$$`

`$$=\bar{u}\left(\sum_{i=1}^{n}X_{i} - \sum_{i=1}^{n}X_{i}\right) = 0$$`

- Putting this all together

`$$\hat{\beta}_{1} = \frac{\beta_{1}\sum_{i=1}^{n}(X_{i} - \bar{X})^2 +\sum_{i=1}^{n} (X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

`$$\hat{\beta}_{1} = \beta_{1}+\frac{\sum_{i=1}^{n} (X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

---

# Sampling Distribution of OLS Estimators

- The estimator `\(\hat{\beta}_{1}\)` is the sum of two things
  - The parameter it is estimating
  - A weighted sum of the (unknown) errors
- The expected value of `\(\hat{\beta}_{1}\)` is then

`$$E[\hat{\beta}_{1}|X_{i}]= E \left [ \beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2} | X_{i} \right ]$$`

`$$= E[\beta_{1} |X_{i}] + E \left [ \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2} | X_{i} \right ]$$`

`$$=\beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})E[u_{i}|X_{i}]}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

---

# Sampling Distribution of OLS Estimators

- Our first assumption is that `\(E[u_{i}|X_{i}]=0\)`, so

`$$E[\hat{\beta}_{1}|X_{i}]=\beta_{1}$$`

- For a given value of `\(X_{i}\)`, the average of `\(\hat{\beta}_{1}\)` is `\(\beta_{1}\)`
- To find the **overall** average, use the law of iterated expectations

`$$E[\hat{\beta}_{1}] = E[E[\hat{\beta}_{1}|X_{i}]]$$`
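
- (Reminder: the law of iterated expectations says `\(E[E[W|X]] = E[W]\)` for any random variable `\(W\)`: averaging the conditional averages over `\(X\)` gives the overall average)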
---

# Sampling Distribution of OLS Estimators

- Substituting in `\(E[\hat{\beta}_{1}|X_{i}]=\beta_{1}\)`

`$$E[\hat{\beta}_{1}] = E[\beta_{1} ] = \beta_{1}$$`

- Intuition: since the average of `\(\hat{\beta}_{1}\)` at each `\(X_{i}\)` is `\(\beta_{1}\)`, the overall average is also `\(\beta_{1}\)`
- The resulting mean of the OLS estimator is

.content-box-blue[
**Mean of the OLS Estimator**

`$$E[\hat{\beta}_{1}] = \beta_{1}$$`
]

---

# Sampling Distribution of OLS Estimators

- `\(E[\hat{\beta}_{1}] = \beta_{1}\)` means that `\(\hat{\beta}_{1}\)` is .red[unbiased]
- Why is this important?
  - **If we could repeatedly sample**, the average of `\(\hat{\beta}_{1}\)` would be `\(\beta_{1}\)`
  - The only reason `\(\hat{\beta}_{1}\)` differs from `\(\beta_{1}\)` **in any one sample** is sampling error
    - A sample does not always match the population
- Unbiased estimators are preferable to biased estimators
  - A biased estimator differs from the parameter it estimates because of sampling error **and** because it is systematically wrong
  - Statisticians will generally prefer an unbiased estimator
- If `\(\beta_{1}\)` is the causal effect and `\(\hat{\beta}_{1}\)` is an unbiased estimate of it, we can attribute causality to the estimated relationship between `\(X_{i}\)` and `\(Y_{i}\)`

---

# Sampling Distribution of OLS Estimators
## Variance of `\(\small{\hat{\beta}_{1}}\)`

- The expected value tells us the middle of the distribution
- We also need to know how spread out the values of `\(\hat{\beta}_{1}\)` are from the mean across samples
- The key measure of this is the variance
- Start with the alternate formula for `\(\hat{\beta}_{1}\)` we derived above

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}$$`

---

# Sampling Distribution of OLS Estimators

- Rewrite the denominator using the sample variance of `\(X_{i}\)`

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{(n-1)s_{X}^2}$$`

- where `\(s_{X}^2 = \frac{\sum_{i=1}^{n}(X_{i} - \bar{X})^2}{n-1}\)`
- Multiply the numerator and denominator by `\(\frac{1}{n}\)`

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \bar{X})u_{i}}{(\frac{n-1}{n})s_{X}^2}$$`

- From this point forward, we assume that we have a large sample
  - With large samples, estimators are very close to the parameters they estimate
  - So `\(\bar{X} \approx \mu_{X}\)` and `\(s_{X}^2 \approx \sigma_{X}^2\)`
  - Also, `\(\frac{n-1}{n} \approx 1\)`

---

# Sampling Distribution of OLS Estimators

- Substitute these values into the formula

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}$$`

- Now use the variance operator

`$$VAR(\hat{\beta}_{1})=VAR\left(\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}\right)$$`

- Since `\(\beta_{1}\)` is a fixed parameter,

`$$VAR(\hat{\beta}_{1})=VAR\left( \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}\right)$$`

---

# Sampling Distribution of OLS Estimators

- We will now make heavy use of the properties of variance
- Because `\(\sigma_{X}^2\)` is a fixed constant

`$$VAR(\hat{\beta}_{1})=\frac{1}{(\sigma_{X}^2)^2}VAR\left( \frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}\right)$$`

- Because `\(\frac{1}{n}\)` is a fixed constant

`$$VAR(\hat{\beta}_{1})=\frac{1}{(\sigma_{X}^2)^2 n^2}VAR\left( \sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}\right)$$`

- Finally, because the observations are iid (Assumption 2), the terms `\((X_{i} - \mu_{X})u_{i}\)` are independent with a common variance, so the variance of the sum is `\(n\)` times the variance of one term

`$$VAR(\hat{\beta}_{1})=\frac{n}{(\sigma_{X}^2)^2 n^2}VAR\left( (X_{i} - \mu_{X})u_{i}\right)$$`

---

# Sampling Distribution of OLS Estimators

- Simplifying, we have the final variance formula

.content-box-blue[
**Variance of OLS Estimator**

`$$VAR(\hat{\beta}_{1})=\frac{VAR\left( (X_{i} - \mu_{X})u_{i}\right) }{n(\sigma_{X}^2)^2}$$`]

- Important things to note about the spread of `\(\hat{\beta}_{1}\)`
  - The larger is `\(n\)`, the smaller is the variance
    - More data reduces sampling variation
  - The larger is `\(\sigma_{X}^2\)`, the smaller is the variance
    - When `\(X_{i}\)` is more spread out, it is easier to estimate the linear relationship
  - A larger spread in `\(u_{i}\)` increases the variance
    - When `\(u_{i}\)` is more spread out, the dots fall further from the estimated line
    - The slope becomes less precise
---

# Sampling Distribution of OLS Estimators
## The Distribution of `\(\small{\hat{\beta}_{1}}\)`

- We know the mean and variance of the distribution of `\(\hat{\beta}_{1}\)`
- What about the shape?
- If we assume a big sample we can apply the .red[Central Limit Theorem (CLT)]
  - The average of many independent draws from the same population is approximately Normally distributed
- `\(\hat{\beta}_{1}\)` is built from such an average

`$$\hat{\beta}_{1}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_{i} - \mu_{X})u_{i}}{\sigma_{X}^2}=\beta_{1} + \frac{\frac{1}{n}\sum_{i=1}^{n}v_{i} }{\sigma_{X}^2}$$`

- where `\(v_{i} = (X_{i} - \mu_{X})u_{i}\)`

---

# Sampling Distribution of OLS Estimators

- The Central Limit Theorem therefore says `\(\hat{\beta}_{1}\)` has an approximately Normal distribution in large samples
- We previously derived the mean and variance
- This gives us the distribution of the OLS estimator

.content-box-blue[
**Distribution of OLS Estimator**

`$$\hat{\beta}_{1} \sim \mathcal{N}\left( \beta_{1},\frac{VAR\left( (X_{i} - \mu_{X})u_{i}\right) }{n(\sigma_{X}^2)^2} \right)$$`
]

---

# Example

.content-box-green[
.pull-left[
- Simulate the sampling distribution of `\(\hat{\beta}_{1}\)`
- Code to the right:
  - Assumes the model is `$$TestScore= 700 - 2*STR + u$$`
  - Draws 420 observations on `\(Y\)` and `\(X\)`
  - Computes `\(\hat{\beta}_{1}\)` from the sample
  - Repeats this 9999 times
  - Plots the distribution of the 9999 `\(\hat{\beta}_{1}\)` values
]

.pull-right[
```stata
clear all
local sims = 9999
set obs `sims'
set more off
gen beta1 = .
forvalues x = 1/`sims' {
    preserve
    clear
    qui set obs 420
    gen str = rnormal(20,2)
    gen u = rnormal(0,20)
    gen testscr = 700 - 2*str + u
    qui regress testscr str
    restore
    qui replace beta1 = _b[str] in `x'
}
```
]
]

---

# Example

.content-box-green[
```stata
twoway hist beta1, title(Sampling Distribution of Beta1) scheme(s2mono)
```

.center[
<figure>
  <img src="dist.svg" width="47%">
</figure>
]
]

---

# Example

.pull-left[
<img src="SLR_files/figure-html/unnamed-chunk-13-1.gif" style="display: block; margin: auto;" />
]

.pull-right[
<img src="SLR_files/figure-html/unnamed-chunk-14-1.png" width="504" style="display: block; margin: auto;" />
]