Instrumental Variables

EC655

Justin Smith

Wilfrid Laurier University

Fall 2022

Introduction

In many empirical applications we want to estimate a causal effect
- The slope in a structural model
- The independent effect of a particular variable on the outcome
OLS is inconsistent for this parameter when the error is correlated with $\mathbf{x}$
- This happens often with omitted variables
- There are several other situations
  - Simultaneous equations
  - Measurement error in $\mathbf{x}$
If we want to consistently estimate a causal parameter, we need a different method
First, some important terminology
- Variables in a regression model that are correlated with the error are endogenous
- Those that are uncorrelated with the error are exogenous

Introduction

We will learn several methods to uncover a causal effect with endogenous variables
- Instrumental Variables
- Difference in Differences
- Regression Discontinuity
Each takes a different approach to the problem
One commonality is that most are based on natural experiments
- Individuals are exposed to experimental conditions by outside factors
- Some examples
  - Vietnam draft lottery and effect of military service on wages
  - 2008 Beijing air pollution reduction and birthweight
  - Maimonides’ rule and effect of class size on achievement

Instrumental Variables

Model

Start with the population regression function

\[y = \mathbf{x}\boldsymbol{\beta} + u\]

\[= \mathbf{x_{1}}\boldsymbol{\beta_{1}} + x_{k}\beta_{k} + u\]
Suppose that all $\mathbf{x_{1}}$ are uncorrelated with $u$
- They are exogenous
But we are worried that $x_{k}$ is correlated with $u$
- It is endogenous
We saw that OLS leads to inconsistent estimates of $\boldsymbol{\beta}$

Model

Now imagine we have variable $z_{1}$ that is
- Uncorrelated with $u$
- Correlated with $x_{k}$
Define the vector $\mathbf{z}$ to be

\[\mathbf{z} = [1,x_{1}, x_{2}, ...,x_{k-1},z_{1}]\]
- It is the $\mathbf{x}$ vector with $z_{1}$ in place of $x_{k}$
- This is the vector of all exogenous factors
  - $\mathbf{x_{1}}$ are included instruments
  - $z_{1}$ is the excluded instrument because it comes from outside the model

Model

Since we assume all factors in $\mathbf{z}$ are exogenous, we can write

\[\mathbf{E}(\mathbf{z}'u) = 0\]
If we take our model

\[y = \mathbf{x}\boldsymbol{\beta} + u\]
Pre-multiply by $\mathbf{z}'$ and take expectations

\[\mathbf{E}(\mathbf{z}'y) = \mathbf{E}(\mathbf{z'x}) \boldsymbol{\beta} + \mathbf{E}(\mathbf{z}'u)\]
Using the fact that $\mathbf{E}(\mathbf{z}'u) = 0$

\[\boldsymbol{\beta} = [\mathbf{E}(\mathbf{z'x})] ^{-1}\mathbf{E}(\mathbf{z}'y)\]

Model

To be able to identify $\boldsymbol{\beta}$ as we did above, we need to assume
- $\text{rank }\mathbf{E}(\mathbf{z'z}) = K+1$
  - The variables in $\mathbf{z}$ are linearly independent
  - If this is not true we have perfect multicollinearity, and cannot compute $\boldsymbol{\beta}$
- $\text{rank }\mathbf{E}(\mathbf{z'x}) = K+1$
  - The instrument is correlated with the endogenous variable
  - This allows us to invert $\mathbf{E}(\mathbf{z'x})$
Combined with the orthogonality condition this means an instrument must be
- Uncorrelated with the error term
- Correlated with the endogenous variable
If either assumption fails, we cannot use IV
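
As an illustration of why both conditions are needed, consider the simplest case with one endogenous regressor and no other covariates (the scalar setup here is an assumption made for this sketch). With $y = \beta_{0} + \beta_{1}x_{1} + u$ and $\mathbf{z} = [1, z_{1}]$, taking the covariance of both sides with $z_{1}$ gives

\[cov(z_{1},y) = \beta_{1}cov(z_{1},x_{1}) + cov(z_{1},u)\]
If $cov(z_{1},u) = 0$ and $cov(z_{1},x_{1}) \neq 0$, we can solve for the slope

\[\beta_{1} = \frac{cov(z_{1},y)}{cov(z_{1},x_{1})}\]
- Exogeneity removes the term $cov(z_{1},u)$
- Relevance ensures we are not dividing by zero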

Estimation by Method of Moments

Applying the method of moments to the population slope

\[\boldsymbol{\hat{\beta}}=\left ( \frac{1}{n}\sum_{i=1}^{n}\mathbf{z_{i}'x_{i}} \right )^{-1} \left ( \frac{1}{n} \sum_{i=1}^{n}\mathbf{z}_{i}'y_{i}\right ) = \left ( \sum_{i=1}^{n}\mathbf{z_{i}'x_{i}} \right )^{-1} \left ( \sum_{i=1}^{n}\mathbf{z}_{i}'y_{i}\right )\]

\[=\left ( \mathbf{Z'X} \right )^{-1} \left ( \mathbf{Z'y}\right )\]
The IV estimator is very similar to the OLS estimator, except
- The $\mathbf{X'}$ matrix that premultiplies the $\mathbf{X}$ matrix and $\mathbf{y}$ vector is replaced with $\mathbf{Z'}$
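
The formula $(\mathbf{Z'X})^{-1}\mathbf{Z'y}$ can be checked on simulated data. The sketch below is only an illustration: the data-generating process, the coefficient values, and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are assumptions)
z1 = rng.normal(size=n)            # excluded instrument
x1 = rng.normal(size=n)            # exogenous regressor (included instrument)
u = rng.normal(size=n)             # structural error
# xk depends on u, so it is endogenous
xk = 0.5 * x1 + 1.0 * z1 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xk + u  # true beta = (1, 2, 3)

X = np.column_stack([np.ones(n), x1, xk])
Z = np.column_stack([np.ones(n), x1, z1])  # X with z1 in place of xk

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^{-1} Z'y

print("OLS:", beta_ols.round(3))
print("IV: ", beta_iv.round(3))
```

In this simulated design the OLS coefficient on the endogenous variable should sit noticeably above 3, while the IV coefficient should be close to 3.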

Two-Stage Least Squares

Model

In some cases we have more than one instrument for an endogenous variable
Two-Stage Least Squares (TSLS) generalizes the IV model for this situation
- Note: In practice, both models are often called Instrumental Variables
The TSLS procedure is
- First Stage: regress endogenous variable on all exogenous variables
- Second Stage: regress dependent variable on exogenous $\mathbf{x}$ variables and predicted value of endogenous variable from first stage
This process keeps only exogenous part of endogenous variable in the second stage
- Purges the endogenous variable of the endogenous component

Model

To see this mathematically, imagine as before that

\[y = \mathbf{x_{1}}\boldsymbol{\beta_{1}} + x_{k}\beta_{k} + u\]
- $x_{k}$ is endogenous, and $\mathbf{x_{1}}$ are exogenous
- This is a structural equation for $y$, because it measures causal effects
The vector $\mathbf{z}$ is now

\[\mathbf{z} = [1,x_{1}, x_{2}, ...,x_{k-1},z_{1}, z_{2},...,z_{m}]\]
- Contains $K$ exogenous $\mathbf{x}$ variables and $M$ instruments
- Total dimension of this vector is $L=K+M$
There are at least as many instruments as endogenous variables
- In this example, there is 1 endogenous variable, so $M \ge 1$

Model

When $\mathbf{z}$ is exogenous, $\mathbf{E}(\mathbf{z}'u) = 0$
If $\mathbf{z}$ is uncorrelated with $u$, then so is any linear combination of $\mathbf{z}$
The reduced form linear relationship between $\mathbf{x}$ and $\mathbf{z}$ is

\[\mathbf{x} = \mathbf{z}\boldsymbol{\Pi} + r\]
- This is a regression of each element of $\mathbf{x}$ on $\mathbf{z}$
- $\boldsymbol{\Pi}$ is a $L\times (K+1)$ matrix
  - Each column is the coefficients from regressing an element of $\mathbf{x}$ on $\mathbf{z}$
  - Like doing a separate regression of each element in $\mathbf{x}$ on $\mathbf{z}$ and collecting all coefficient vectors into a big matrix
- Reduced form means all the regressors are exogenous, but it does not necessarily measure causal effects

Model

The population regression function in this relationship is

\[\mathbf{x}^{*} = \mathbf{z}\boldsymbol{\Pi}\]
- This is the first stage regression
When you regress a variable on itself, the predicted value is itself
- Even if there are other variables in the model
So the vector $\mathbf{x}^{*}$ is

\[\mathbf{x}^{*} = [1,x_{1}, x_{2}, ...,x_{k-1},x_{k}^{*}]\]
where $x_{k}^{*} = \pi_{0} + \pi_{1}x_{1} + \pi_{2}x_{2} + ...+\pi_{k-1}x_{k-1} + \pi_{k}z_{1} + ... +\pi_{k+m-1}z_{m}$
For the first stage, we only need to regress the endogenous variables on $\mathbf{z}$

Model

As noted, a linear combination of $\mathbf{z}$ is uncorrelated with $u$, so

\[\mathbf{E}(\mathbf{x}^{*'}u) = 0\]
Premultiplying our structural equation by $\mathbf{x}^{*'}$ and taking expectations

\[\mathbf{E}(\mathbf{x}^{*'}y) = \mathbf{E}(\mathbf{x^{*'}x}) \boldsymbol{\beta} + \mathbf{E}(\mathbf{x}^{*'}u)\]
Using $\mathbf{E}(\mathbf{x}^{*'}u) = 0$

\[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{x^{*'}x})\right)^{-1}\mathbf{E}(\mathbf{x}^{*'}y)\]
This is the population TSLS slope
- We will cover estimation shortly

Model

You can also write this estimator as

\[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{x^{*'}x^{*}})\right)^{-1}\mathbf{E}(\mathbf{x}^{*'}y)\]
- This is the slope from a regression of $y$ on $\mathbf{x^{*}}$
- It is the slope from the second stage
To see how, recall that \[\mathbf{x}= \mathbf{z}\boldsymbol{\Pi} + r= \mathbf{x}^{*} + r\]
Pre-multiply by $\mathbf{x}^{*'}$ and take expectations \[\mathbf{E}(\mathbf{x}^{*'} \mathbf{x})= \mathbf{E}(\mathbf{x}^{*'} \mathbf{x}^{*}) + \mathbf{E}(\mathbf{x}^{*'} r)\]
Since by definition $\mathbf{E}(\mathbf{x}^{*'} r) = 0$ \[\mathbf{E}(\mathbf{x}^{*'} \mathbf{x})= \mathbf{E}(\mathbf{x}^{*'} \mathbf{x}^{*})\]

Model

Finally, note that the IV estimator is a special case of TSLS
Assume we have one endogenous variable and one excluded instrument
- This means that $\mathbf{x}$ and $\mathbf{z}$ are both $1 \times (K+1)$ vectors
- The matrix $\mathbf{\Pi}$ is $(K+1) \times (K+1)$, and invertible under the rank condition
Substitute the definition of $\mathbf{x}^{*'}$ into the formula for $\boldsymbol{\beta}$ \[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{(z\Pi)'x})\right)^{-1}\mathbf{E}((\mathbf{z\Pi)'}y)\]
Distributing the transpose and pulling $\mathbf{\Pi}$ out of the expectation \[\boldsymbol{\beta} = \left(\mathbf{\Pi'}\mathbf{E}\mathbf{(z'x})\right)^{-1}\mathbf{\Pi'}\mathbf{E}(\mathbf{z'}y) = \left(\mathbf{E}\mathbf{(z'x})\right)^{-1}(\mathbf{\Pi'})^{-1}\mathbf{\Pi'}\mathbf{E}(\mathbf{z'}y)\] \[\boldsymbol{\beta} =\left(\mathbf{E}(\mathbf{z}'\mathbf{x})\right)^{-1} \mathbf{E}(\mathbf{z}'y)\]

Estimation by Method of Moments

As before, substitute sample versions of population moments to get the estimator
Recall that this is a two-step process
The first stage is the regression of $\mathbf{x}$ on $\mathbf{z}$
- The population regression function is
  
  \[\mathbf{x}^{*} = \mathbf{z}\boldsymbol{\Pi}\] \[\boldsymbol{\Pi} = \left(\mathbf{E}(\mathbf{z'z})\right)^{-1}\mathbf{E}(\mathbf{z'x})\]
- The sample version is \[\mathbf{\hat{X}} = \mathbf{Z}\boldsymbol{\hat{\Pi}}\] \[\boldsymbol{\hat{\Pi}} = \left( \mathbf{Z'Z} \right)^{-1}\mathbf{Z'X}\]

Estimation by Method of Moments

When there is one endogenous variable, the matrix $\mathbf{\hat{X}}$ is

\[\mathbf{\hat{X}} = \begin{bmatrix} 1 & x_{11} & x_{12} &\cdots&x_{1,k-1} & \hat{x}_{1k}\\ 1 & x_{21} & x_{22} &\cdots&x_{2,k-1} & \hat{x}_{2k}\\ \vdots & \vdots & \vdots & \ddots &\vdots &\vdots \\ 1 & x_{n1} & x_{n2} &\cdots&x_{n,k-1} & \hat{x}_{nk} \end{bmatrix} = \begin{bmatrix} \mathbf{X_{1}} & \mathbf{\hat{x}_{k}} \end{bmatrix}\]
- For the first stage we only regress $\mathbf{x_{k}}$ on $\mathbf{Z}$ to obtain $\mathbf{\hat{x}_{k}}$
- We do not do first stage regressions in practice for the included instruments
  - Because their predicted values and actual values are the same
The second stage is the regression of $\mathbf{y}$ on $\mathbf{\hat{X}}$
- The population slope we derived is
  
  \[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{x^{*'}x^{*}})\right)^{-1}\mathbf{E}(\mathbf{x}^{*'}y)\]

Estimation by Method of Moments

second stage continued...
- The sample version is
  
  \[\boldsymbol{\hat{\beta}} = \left(\mathbf{\hat{X}'\hat{X}}\right)^{-1}\mathbf{\hat{X}'y}\]
The TSLS estimator is simply an OLS estimator in two stages
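
The two stages can be written out directly with matrix operations. This is a minimal sketch on simulated data with two excluded instruments; the data-generating process and all names are assumptions for illustration. It also checks two facts from the derivation: the first stage reproduces the included instruments exactly, and $\mathbf{\hat{X}'\hat{X}}$ can be swapped for $\mathbf{\hat{X}'X}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: one endogenous variable, two excluded instruments
x1 = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
xk = 0.5 * x1 + 0.7 * z1 + 0.4 * z2 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xk + u  # true beta = (1, 2, 3)

X = np.column_stack([np.ones(n), x1, xk])
Z = np.column_stack([np.ones(n), x1, z1, z2])

# First stage: Pi_hat = (Z'Z)^{-1} Z'X and X_hat = Z Pi_hat
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Pi_hat

# Second stage: OLS of y on X_hat
beta_tsls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Equivalent form using X_hat'X in place of X_hat'X_hat
beta_alt = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print("TSLS:", beta_tsls.round(3))
print("Included instruments reproduced:", np.allclose(X_hat[:, :2], X[:, :2]))
```

Only the last column of `X_hat` differs from `X`, which is why in practice the first stage is run only for the endogenous variable.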

Intuition

The starting point for TSLS is an endogenous variable
- A variable that is correlated with $u$
- Usually because of omitted variables bias
OLS will not consistently identify the slope vector $\boldsymbol{\beta}$
We have access to at least one variable from outside the model that is
- Uncorrelated with the error term $u$
- Correlated with the endogenous variable $x_{k}$

Intuition

In the first stage, we regress $x_{k}$ on all exogenous variables
- Separates $x_{k}$ into two pieces
  - The exogenous part: the piece that is correlated with the exogenous variables
  - The endogenous part: the residual from this regression, which is uncorrelated with the exogenous piece
- The first stage purges the endogenous component from $x_{k}$
- Keeps only the exogenous component, $\hat{x}_{k}$
We use only the exogenous piece $\hat{x}_{k}$ in the second stage regression

Statistical Properties of TSLS

Introduction

Like we did with the OLS estimator, we cover the statistical properties of TSLS
- We will be slightly less detailed
We first need to outline the set of assumptions required for consistency

Assumptions

$\mathbf{E}(\mathbf{z}'u) = 0$
- The vector of exogenous variables is uncorrelated with $u$
- This is sometimes called the exclusion restriction
  - The instruments come from outside the model and are uncorrelated with the error

Assumptions

$\text{rank } \mathbf{E}(\mathbf{z'z}) = L$ and $\text{rank } \mathbf{E}(\mathbf{z'x}) = K+1$
- First part says that none of the variables in $\mathbf{z}$ are perfectly collinear
- Second part is the rank condition
  - The instruments must be sufficiently correlated with the endogenous variable
  - We will come back to this when we talk about weak instruments
- For this assumption to hold we also need to meet the order condition
  - There are at least as many instruments as endogenous variables
  - Mathematically, we need $L\ge K+1$
$\{(\mathbf{x}_{i},\mathbf{z}_{i}, y_{i}): i=1,2,...,n\}$ are a random sample

Consistency

If all of the assumptions are met, the TSLS estimator is consistent for $\boldsymbol{\beta}$
We will not do the proof, but it is very similar to OLS
If any of the assumptions fail, the TSLS estimator is inconsistent

Unbiasedness

In small samples, the TSLS estimator is generally biased
We will not cover the proof
You should only use TSLS with large samples

Large Sample Distribution of $\boldsymbol{\hat{\beta}}$

We again appeal to the Central Limit Theorem
With large $n$, the TSLS estimator has a Normal distribution with mean $\boldsymbol{\beta}$ and variance

\[\text{var}(\boldsymbol{\hat{\beta}}) = n^{-1}[\mathbf{E}(\mathbf{x^{*'}x^{*}})^{-1}]\mathbf{E}(u^2\mathbf{x^{*'}x^{*}})[\mathbf{E}(\mathbf{x^{*'}x^{*}})^{-1}]\]
If we assume homoskedasticity $\mathbf{E}(u^2|\mathbf{z}) = \sigma^2$ this reduces to

\[\text{var}(\boldsymbol{\hat{\beta}}) = \sigma^2 n^{-1}\mathbf{E}(\mathbf{x^{*'}x^{*}})^{-1}\]

Variance Estimator for $\boldsymbol{\hat{\beta}}$

To estimate the variance, substitute sample versions of the population moments
The Heteroskedasticity-Robust variance estimator is

\[\hat{\text{var}}(\boldsymbol{\hat{\beta}}) = \left (\mathbf{\hat{X}'\hat{X}}\right )^{-1}\left ( \sum_{i=1}^{n}\hat{u}_{i}^2\mathbf{\hat{x}_{i}'\hat{x}_{i}}\right )\left ( \mathbf{\hat{X}'\hat{X}}\right )^{-1}\]
If we assume homoskedasticity, it is

\[\hat{\text{var}}(\boldsymbol{\hat{\beta}}) = s_{\hat{u}}^2 \left (\mathbf{\hat{X}'\hat{X}}\right )^{-1}\]
In both cases, we use the TSLS residuals, which are

\[\mathbf{\hat{u}} = \mathbf{y} - \mathbf{X}\boldsymbol{\hat{\beta}}\]

Variance Estimator for $\boldsymbol{\hat{\beta}}$

These are not the residuals from the second stage regression $\mathbf{y} - \mathbf{\hat{X}}\boldsymbol{\hat{\beta}}$
- Use $\mathbf{X}$ – not $\mathbf{\hat{X}}$ – in this calculation
- This is a common mistake
- Using the wrong residuals will lead to incorrect estimates of the standard errors
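
The difference between the two sets of residuals is easy to see numerically. The sketch below uses simulated data; the design, the names, and the degrees-of-freedom correction $n-(K+1)$ in $s_{\hat{u}}^2$ are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical data: one endogenous regressor, one instrument
z1 = rng.normal(size=n)
u = rng.normal(size=n)
xk = 0.5 * z1 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 3.0 * xk + u

X = np.column_stack([np.ones(n), xk])
Z = np.column_stack([np.ones(n), z1])
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)

u_hat = y - X @ beta       # correct TSLS residuals (use X)
e_2nd = y - X_hat @ beta   # second-stage residuals (use X_hat): wrong

dof = n - X.shape[1]
se_right = np.sqrt(np.diag((u_hat @ u_hat / dof) * XhXh_inv))
se_wrong = np.sqrt(np.diag((e_2nd @ e_2nd / dof) * XhXh_inv))

# Heteroskedasticity-robust version, also built from u_hat
meat = (X_hat * (u_hat ** 2)[:, None]).T @ X_hat
se_robust = np.sqrt(np.diag(XhXh_inv @ meat @ XhXh_inv))

print("correct SE:", se_right.round(4))
print("wrong SE:  ", se_wrong.round(4))
print("robust SE: ", se_robust.round(4))
```

Running the second stage by hand with an OLS routine reports the `se_wrong` values, which is one reason to use a dedicated TSLS command.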

Structural and Reduced Form in TSLS

Structural vs Reduced form Equations

In this section we present details that are useful for using TSLS in practice
This will help you understand when you read papers using TSLS
Suppose we have a model with one endogenous variable and one instrument
The instrumental variables model in scalar notation is

\[y = \beta_{0} + x_{1}\beta_{1} +...+ x_{k-1}\beta_{k-1} + x_{k}\beta_{k} + u\] \[x_{k} = \pi_{0} + x_{1}\pi_{1}+...+x_{k-1}\pi_{k-1} +z_{1}\delta_{1} + r\]
The first equation is the structural equation
- The equation containing the causal effects we are interested in

Structural vs Reduced form Equations

The second equation is the first stage
- It is a reduced form equation
  - All of the regressors are exogenous
  - The parameters do not necessarily represent causal effects
Researchers often also estimate the reduced form for $y$
- Sub the second equation into the first to get \[y = (\beta_{0} + \beta_{k}\pi_{0}) + x_{1}(\beta_{1} + \beta_{k}\pi_{1}) +...+ x_{k-1} (\beta_{k-1}+ \beta_{k}\pi_{k-1}) + z_{1}\beta_{k}\delta_{1} + u + \beta_{k}r\] \[= \gamma_{0}+ x_{1}\gamma_{1}+...+ x_{k-1} \gamma_{k-1}+ z_{1}\theta_{1}+ \epsilon\]
- This is the regression of $y$ on all the exogenous variables

Structural vs Reduced form Equations

In the reduced form for $y$, the slope $\theta_{1} = \beta_{k}\delta_{1}$
- The effect of $z_{1}$ on $x_{k}$ times the effect of $x_{k}$ on $y$
You can get the slope $\beta_{k}$ by dividing the reduced form by the first stage

\[\beta_{k} = \frac{\theta_{1}}{\delta_{1}}\]
$\beta_{k}$ is the reduced form effect scaled by the first stage
- Ex: returns to schooling
- Imagine $y$ is income, $x_{k}$ is years of schooling, $z_{1}$ is kms to nearest university
- If $\delta_{1} = -.5$, being 1km closer to a university leads to .5 more years of schooling
- If $\theta_{1} = -10000$, being 1km closer to a university leads to \$10000 more income
- Then $\beta_{k} = \frac{-10000}{-.5} = 20000$ is the effect of an additional year of schooling
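
The ratio of the reduced form to the first stage can be verified on simulated data. In the sketch below the data-generating process and the helper `ols` are assumptions for illustration; with one endogenous variable and one excluded instrument the ratio should match the IV coefficient up to rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical data with true beta_k = 4
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
u = rng.normal(size=n)
xk = 1.0 + 0.3 * x1 + 0.5 * z1 + 0.8 * u + rng.normal(size=n)
y = 2.0 + 1.0 * x1 + 4.0 * xk + u

Z = np.column_stack([np.ones(n), x1, z1])  # all exogenous variables
X = np.column_stack([np.ones(n), x1, xk])

def ols(A, b):
    """OLS coefficients from regressing b on the columns of A."""
    return np.linalg.solve(A.T @ A, A.T @ b)

delta1 = ols(Z, xk)[-1]  # first stage coefficient on z1
theta1 = ols(Z, y)[-1]   # reduced form coefficient on z1
beta_k_ratio = theta1 / delta1
beta_k_iv = np.linalg.solve(Z.T @ X, Z.T @ y)[-1]

print("first stage:", round(delta1, 3), "reduced form:", round(theta1, 3))
print("ratio:", round(beta_k_ratio, 3), "IV:", round(beta_k_iv, 3))
```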

Checking Instrument Validity

Introduction

There are two main assumptions for instrumental variables
- Exclusion restriction: instruments come from outside the model and are uncorrelated with $u$
- Rank condition: excluded instruments are sufficiently related to the endogenous variable
  - This is sometimes also called the instrument relevance condition
Failure of either of these creates problems
To understand these issues, it is useful to study the following model

\[y = \beta_{0} + \beta_{1}x_{1} + u\] \[x_{1} = \pi_{0} + \pi_{1}z_{1} + r\]

Introduction

In this model, you can show that

\[\text{plim }\hat{\beta}_{1} = \beta_{1} + \frac{cov(z_{1},u)}{cov(z_{1},x_{1})}\] \[= \beta_{1} + \frac{\sigma_{u}}{\sigma_{x_{1}}}\frac{corr(z_{1},u)}{corr(z_{1},x_{1})}\]
We will use this result to inform ourselves about the failure of our assumptions
The key idea is that instruments must be relevant and exogenous

Exclusion Restriction

Consider the plim of the TSLS slope estimator

\[\text{plim }\hat{\beta}_{1} = \beta_{1} + \frac{cov(z_{1},u)}{cov(z_{1},x_{1})}\]
If the exclusion restriction holds, then $cov(z_{1},u) = 0$ and $\text{plim }\hat{\beta}_{1} = \beta_{1}$
If not then $\text{plim }\hat{\beta}_{1} \neq \beta_{1}$ and the TSLS estimator is inconsistent
Can we check to see if this assumption is true?
If we have one excluded instrument for each endogenous variable, we cannot
- This model is exactly identified
- There are not enough degrees of freedom to test the exclusion restriction
- We have to rely only on our assumptions in this case

Exclusion Restriction

In a model with multiple excluded instruments for the endogenous variable, we can
- The model is overidentified
- We use the extra instruments to test whether the others are endogenous
A useful test when errors are homoskedastic is the Sargan Test
- Estimate the model using TSLS
- Compute the TSLS residuals, $\mathbf{\hat{u}}$
- Regress $\mathbf{\hat{u}}$ on all instruments $\mathbf{z}$
  - Save $R^2_u$ from this regression
- The test statistic is $nR^2_u$
  - Under the null it has a $\chi^2_{Q_{1}}$ distribution, where $Q_{1}$ is the number of overidentifying restrictions
- The null hypothesis is exogeneity
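
The steps above can be sketched in a few lines. Everything here is an assumption for illustration: the simulated design, the function name `sargan`, and the p-value shortcut, which is valid only for $Q_{1}=1$ (two excluded instruments, one endogenous variable).

```python
import math
import numpy as np

def sargan(y, X, Z):
    """Return (n * R^2, p-value) for the Sargan test with ONE overidentifying restriction."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    u_hat = y - X @ beta                               # TSLS residuals
    fitted = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u_hat)  # regress u_hat on Z
    r2 = 1.0 - np.sum((u_hat - fitted) ** 2) / np.sum((u_hat - u_hat.mean()) ** 2)
    stat = max(len(y) * r2, 0.0)
    # Survival function of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

rng = np.random.default_rng(4)
n = 20_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
xk = 0.6 * z1 + 0.6 * z2 + 0.8 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), xk])
Z = np.column_stack([np.ones(n), z1, z2])

y_valid = 1.0 + 3.0 * xk + u               # both instruments excluded from y
y_invalid = 1.0 + 3.0 * xk + u + 0.5 * z2  # z2 affects y directly

stat_ok, p_ok = sargan(y_valid, X, Z)
stat_bad, p_bad = sargan(y_invalid, X, Z)
print("valid instruments:  ", round(stat_ok, 2), round(p_ok, 3))
print("invalid instrument: ", round(stat_bad, 2), round(p_bad, 3))
```

For more than one overidentifying restriction the p-value would need a general $\chi^2$ distribution function from a statistics library.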

Exclusion Restriction

There is a more complicated test with heteroskedastic errors
- You can easily execute this in Stata
- We can discuss the process separately
If you fail this test (reject the null), then you need to find other instruments

Instrument Relevance

For TSLS to work, the instrument must be relevant
- It must be sufficiently correlated with the endogenous variable
An instrument that is not sufficiently relevant is a weak instrument
Weak instruments can cause several problems

Inconsistency when the instrument and error are not exactly uncorrelated
- Consider the plim of the TSLS slope estimator
  
  \[\text{plim }\hat{\beta}_{1} = \beta_{1} + \frac{\sigma_{u}}{\sigma_{x_{1}}}\frac{corr(z_{1},u)}{corr(z_{1},x_{1})}\]

Instrument Relevance

Continued...
- When $corr(z_{1},x_{1}) \rightarrow 0$ then $\frac{corr(z_{1},u)}{corr(z_{1},x_{1})} \rightarrow \infty$
  - True even when $corr(z_{1},u)$ is very close to zero
- In practice, we can never be sure $corr(z_{1},u)$ is exactly zero, so this is an issue
- Also note that the plim of the OLS estimator is
  
  \[\text{plim }\hat{\beta}_{1}^{ols} = \beta_{1} + \frac{\sigma_{u}}{\sigma_{x_{1}}}corr(x_{1},u)\]
- Taking the ratio of the two inconsistencies \[\frac{\text{plim }\hat{\beta}_{1}-\beta_{1}}{\text{plim }\hat{\beta}_{1}^{ols}-\beta_{1}} = \frac{corr(z_{1},u)/corr(x_{1},u)}{corr(z_{1},x_{1})}\]
- As $corr(z_{1},x_{1}) \rightarrow 0$, the inconsistency in TSLS becomes larger than the inconsistency in OLS
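
A quick numerical illustration of the ratio formula, using made-up correlations (all values below are assumptions, not estimates from any data):

```python
def relative_inconsistency(corr_zu, corr_xu, corr_zx):
    """Ratio of the TSLS inconsistency to the OLS inconsistency."""
    return (corr_zu / corr_xu) / corr_zx

# Instrument is nearly exogenous (corr 0.02); regressor is clearly endogenous (corr 0.30)
strong = relative_inconsistency(0.02, 0.30, 0.50)  # relevant instrument
weak = relative_inconsistency(0.02, 0.30, 0.05)    # weak instrument

print(round(strong, 3))  # about 0.133: TSLS inconsistency is a fraction of the OLS one
print(round(weak, 3))    # about 1.333: TSLS inconsistency exceeds the OLS one
```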

Instrument Relevance

Finite sample bias in TSLS becomes large
- The expected value of the TSLS estimator can be written as
  
  \[\mathbf{E}(\hat{\beta}_{1} - \beta_{1}) \approx \frac{\sigma_{ur}}{\sigma_{r}^2}\frac{1}{F+1}\]
- Where $F$ is the F-statistic from the test of joint significance of the excluded instruments in the first stage
- Weak instruments means that $F$ is close to zero
- As $F$ gets small, the bias tends to $\frac{\sigma_{ur}}{\sigma_{r}^2}$
  - If $F$ is exactly zero, that means the instrument is irrelevant $(\pi_{1}=0)$
  - In this case $\frac{\sigma_{ur}}{\sigma_{r}^2}$ is also the OLS bias
  - So TSLS and OLS produce the same biased result
- When instruments are weak, TSLS is biased towards OLS
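
A small Monte Carlo experiment can illustrate the bias toward OLS. The sketch below is an assumption-laden example: the design (10 excluded instruments, $n=200$, 500 replications) and the function name `mean_bias` are chosen only for illustration, and several instruments are used so that the mean of the TSLS estimator exists.

```python
import numpy as np

def mean_bias(pi, n=200, q=10, reps=500, beta=1.0, seed=0):
    """Average (TSLS bias, OLS bias) when each of q instruments has first-stage slope pi."""
    rng = np.random.default_rng(seed)
    b_tsls, b_ols = [], []
    for _ in range(reps):
        Zex = rng.normal(size=(n, q))      # excluded instruments
        u = rng.normal(size=n)
        r = 0.8 * u + rng.normal(size=n)   # first-stage error, correlated with u
        x = Zex @ np.full(q, pi) + r
        y = beta * x + u
        X = np.column_stack([np.ones(n), x])
        Z = np.column_stack([np.ones(n), Zex])
        X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
        b_tsls.append(np.linalg.solve(X_hat.T @ X, X_hat.T @ y)[1])
        b_ols.append(np.linalg.solve(X.T @ X, X.T @ y)[1])
    return np.mean(b_tsls) - beta, np.mean(b_ols) - beta

weak_tsls, weak_ols = mean_bias(pi=0.05)     # weak instruments
strong_tsls, strong_ols = mean_bias(pi=0.5)  # strong instruments

print("weak:   TSLS bias", round(weak_tsls, 3), "OLS bias", round(weak_ols, 3))
print("strong: TSLS bias", round(strong_tsls, 3), "OLS bias", round(strong_ols, 3))
```

In this design the weak-instrument TSLS bias should be a sizeable share of the OLS bias, while the strong-instrument TSLS bias should be close to zero.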

Instrument Relevance

Standard inference is no longer valid
- We rely on the central limit theorem to show the TSLS estimator has a Normal distribution
- With weak instruments, this no longer holds
  - The estimator has the distribution of the ratio of two Normal variables
- In this case the t-statistic no longer has a t-distribution
- Hypothesis tests tend to reject the null more often than they should

Instrument Relevance

There are several tests available to check for weak instruments
The null hypothesis of these tests is that the instruments are weak

Test with one endogenous regressor
- Based on F-statistic on excluded instruments in first stage
- Critical values for set amount of bias or size distortion
- Ex: when $F=10$ maximum TSLS bias is about 10% of OLS bias
- Also when $F=10$, the rejection rate of a 5% test is no more than 20%
- As $F$ gets larger, relative bias shrinks and test size distortion falls
- $F=10$ is often called the “Staiger-Stock rule of thumb”, after the authors who developed it
- Generally thought that when $F>10$, instruments are not weak
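
The first-stage F-statistic on the excluded instruments can be computed from restricted and unrestricted sums of squared residuals. This sketch assumes homoskedastic errors; the function name `first_stage_F` and the simulated data are assumptions for illustration.

```python
import numpy as np

def first_stage_F(xk, W, Zex):
    """F-statistic for H0: all coefficients on the excluded instruments are zero.

    xk:  endogenous variable, shape (n,)
    W:   included exogenous variables with a constant, shape (n, p)
    Zex: excluded instruments, shape (n,) or (n, q)
    """
    n = len(xk)
    Zex = np.asarray(Zex).reshape(n, -1)
    Z = np.column_stack([W, Zex])

    def ssr(A):
        resid = xk - A @ np.linalg.lstsq(A, xk, rcond=None)[0]
        return resid @ resid

    q = Zex.shape[1]
    ssr_restricted, ssr_unrestricted = ssr(W), ssr(Z)
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - Z.shape[1]))

rng = np.random.default_rng(5)
n = 1_000
W = np.ones((n, 1))  # constant only
z1 = rng.normal(size=n)
noise = 0.8 * rng.normal(size=n) + rng.normal(size=n)

F_strong = first_stage_F(0.5 * z1 + noise, W, z1)
F_weak = first_stage_F(0.02 * z1 + noise, W, z1)
print("strong instrument F:", round(F_strong, 1))
print("weak instrument F:  ", round(F_weak, 1))
```

With a single excluded instrument this F-statistic equals the squared t-statistic on that instrument in the first stage.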

Instrument Relevance

Test with more than one endogenous regressor
- Suppose you have two endogenous variables and two instruments
- You could compute a first stage $F$ for each endogenous variable
- A problem arises when one instrument is strong in both first stages and the other is weak in both
- In this case you effectively have one strong and one weak instrument, even though both first-stage $F$-statistics can be large
- The F-statistic is modified in this case for doing this test jointly
- It is sometimes called the Cragg-Donald F-Statistic
- With one endogenous variable, the Cragg-Donald F-stat is the first-stage F-Stat
Tests that are heteroskedasticity-robust
- Both tests described above assume homoskedastic errors
- Kleibergen-Paap adjusted the test for heteroskedasticity
- Otherwise, works in the same way as described above

Instrument Relevance

Finally, what if we fail the weak instrument tests?

Find better instruments or drop weak ones
- If your model is overidentified, you can drop the weak instruments
- If it is exactly identified, you could try to find better instruments
  - This is not easy since instruments generally come from natural experiments
Use weak instrument robust inference
- As noted, t-tests will reject the null too often with weak instruments
- You can adjust testing to have the correct size
- Several such tests are available
  - Anderson Rubin (AR)
  - Conditional Likelihood Ratio (CLR)
  - We will not cover the details of these, but all are available in Stata

Interpretation of Slope in IV Models

Introduction

The Rubin model tells us how to interpret the slope in a regression model
Specific interpretation depends on assumptions
- If we make strong assumption that treatment is randomly assigned, regression slope is the average treatment effect (ATE)
- Weaker assumption of mean independence or conditional mean independence of $y_{0}$ gives us average treatment effect on the treated (ATT)
But in an instrumental variables model, these assumptions are violated
- The error term is correlated with treatment
- So we cannot assume mean independence of the error
In this section, we alter the Rubin model to interpret the slope in an IV model

Local Average Treatment Effects

Start with the observed outcomes from the Rubin model \[y = y_{0} + (y_{1} -y_{0})w\]
Now suppose we have a binary instrument $z$
- This variable measures your assignment to treatment
Define observed treatment as \[w = w_{0} + (w_{1} -w_{0})z\]
- $w_{1}$ is potential treatment when assigned to treatment
- $w_{0}$ is potential treatment when not assigned to treatment
- We observe $w=w_{0}$ when $z=0$ and $w=w_{1}$ when $z=1$
Substituting observed treatment into the outcome equation gives \[y_{i} = y_{i0} + (y_{i1} -y_{i0})w_{i0} + (y_{i1} -y_{i0})(w_{i1} -w_{i0})z_{i}\]
Assume that the instrument is independent of potential treatments and potential outcomes \[(y_{i0}, y_{i1}, w_{i0}, w_{i1}) \perp z_{i}\]

Local Average Treatment Effects

We saw that the TSLS estimator is the reduced form divided by the first stage
In this setup with binary $z$, the reduced form is

\[E(y|z=1) - E(y|z=0)\]
and, again with binary $z$, the first stage is

\[E(w|z=1) - E(w|z=0)\]
So that the IV estimator is

\[\frac{E(y|z=1) - E(y|z=0)}{E(w|z=1) - E(w|z=0)}\]
We now derive the numerator and denominator

Local Average Treatment Effects

To get the numerator, first substitute $w$ equation into $y$ equation \[y = y_{0} + (y_{1} -y_{0})w_{0} + (y_{1} -y_{0})(w_{1} -w_{0})z\]
Assume the instrument $z$ is independent of potential treatment and outcomes \[(y_{0}, y_{1}, w_{0}, w_{1}) \perp z\]
This allows us to write the following conditional expectations \[E(y|z=1) = E(y_{0}) + E((y_{1} -y_{0})w_{0}) + E((y_{1} -y_{0})(w_{1} -w_{0}))\] \[E(y|z=0) = E(y_{0}) + E((y_{1} -y_{0})w_{0})\]
So, \[E(y|z=1) - E(y|z=0) = E((y_{1} -y_{0})(w_{1} -w_{0}))\]

Local Average Treatment Effects

Note that \[E((y_{1} -y_{0})(w_{1} -w_{0}))\] \[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)P(w_{1} -w_{0} = 1)\] \[- E(y_{1} -y_{0}|w_{1} -w_{0} = -1)P(w_{1} -w_{0} = -1)\] \[+0\times E(y_{1} -y_{0}|w_{1} -w_{0} = 0)P(w_{1} -w_{0} = 0)\]
The last term drops out since it is zero
Reduced form is the difference in treatment effect between two groups
- $E(y_{1} -y_{0}|w_{1} -w_{0} = 1)$ is the treatment effect for “compliers”
  - Compliers have $w_{1}=1$ and $w_{0} = 0$, so that $w_{1} -w_{0} = 1$
- $E(y_{1} -y_{0}|w_{1} -w_{0} = -1)$ is the treatment effect for “defiers”
  - Defiers have $w_{1}=0$ and $w_{0} = 1$, so that $w_{1} -w_{0} = -1$

Local Average Treatment Effects

Problem: reduced form might be 0 even with positive treatment effects for both groups
- Suppose $E(y_{1} -y_{0}|w_{1} -w_{0} = 1) = E(y_{1} -y_{0}|w_{1} -w_{0} = -1)$ and $P(w_{1} -w_{0} = 1) = P(w_{1} -w_{0} = -1)$
- Then $E(y|z=1) - E(y|z=0) = 0$ and reduced form effect is zero
- Makes it difficult to measure treatment effects
To correct for this, we must assume monotonicity \[w_{1} \ge w_{0}\]
- Says effect of instrument goes one way
- If assigned to treatment you take it, if not you don’t take it
- Precludes defiers who don’t take treatment when assigned, take it when not assigned

Local Average Treatment Effects

Monotonicity implies $P(w_{1} -w_{0} = -1) = 0$, so \[E(y|z=1) - E(y|z=0)\] \[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)P(w_{1} -w_{0} = 1)\]
With monotonicity $(w_{1} -w_{0})$ can equal 0 or 1, so

\[E(w_{1} - w_{0}) = 1\times P(w_{1} -w_{0} = 1) + 0\times P(w_{1} -w_{0} = 0)\] \[=P(w_{1} -w_{0} = 1)\]
Using this in the reduced form gives us

\[E(y|z=1) - E(y|z=0)\] \[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)E(w_{1} -w_{0})\]

Local Average Treatment Effects

To derive the denominator, remember that potential treatments are independent of $z$ \[E(w |z=1) -E(w|z = 0) = E(w_{1} -w_{0})\]
Combining the numerator and denominator \[\frac{E(y|z=1) - E(y|z=0)}{E(w|z=1) - E(w|z=0)} = \frac{E(y_{1} -y_{0}|w_{1} -w_{0} = 1)E(w_{1} -w_{0}) }{E(w_{1} -w_{0}) }\]

\[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)\]
Says the IV estimator equals $E(y_{1} -y_{0}|w_{1} -w_{0} = 1)$
- This is known as the Local Average Treatment Effect (LATE)
- The average treatment effect among compliers
- Can interpret as treatment effect among people influenced by instrument
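
The LATE result can be illustrated by simulating potential outcomes directly. In this sketch the population shares, the treatment effects by type, and all names are assumptions chosen for the example; there are no defiers, so monotonicity holds by construction.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Types: 0 = never-taker, 1 = complier, 2 = always-taker (no defiers)
types = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
w0 = (types == 2).astype(int)  # treatment taken when z = 0
w1 = (types >= 1).astype(int)  # treatment taken when z = 1

# Heterogeneous treatment effects: 1 for never-takers, 2 for compliers, 5 for always-takers
effect = np.where(types == 1, 2.0, np.where(types == 2, 5.0, 1.0))
y0 = rng.normal(size=n) + 0.5 * types  # baseline outcome also differs by type
y1 = y0 + effect

z = rng.integers(0, 2, size=n)  # randomly assigned binary instrument
w = w0 + (w1 - w0) * z          # observed treatment
y = y0 + (y1 - y0) * w          # observed outcome

reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = w[z == 1].mean() - w[z == 0].mean()
wald = reduced_form / first_stage
ate = effect.mean()

print("first stage (share of compliers):", round(first_stage, 3))
print("IV (Wald) estimate:", round(wald, 3))
print("ATE in the full population:", round(ate, 3))
```

The Wald ratio should be close to the complier effect of 2, not to the population ATE of roughly 2.3, because always-takers and never-takers do not change treatment status with $z$.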

Examples of LATE

Smith (2009)
- Effect of school entry age on test scores
- Uses “assigned entry age” as instrument for actual entry age
- LATE: measures effect of entry age on test scores for people who follow rules
Angrist (1990)
- Effect of military service on earnings
- Uses draft eligibility as determined by lottery as instrument for military service
- LATE: measures effect of military service for those who complied with the draft lottery
Key: often the “complier” subpopulation is different from the average population
- So LATE $\neq$ ATE in general

LATE in Regression

An instrumental variables model without covariates is \[y = \beta_{0} + \beta_{1}w + u\] \[w = \pi_{0} + \pi_{1}z + r\]
The reduced form for $y$ is therefore

\[y = \gamma_{0} + \gamma_{1}z + \epsilon\]
Taking expectations, \[E(y|z = 1) = \gamma_{0} + E(\gamma_{1}|z = 1) + E[\epsilon|z=1]\] \[E(y|z = 0) = \gamma_{0} + E[\epsilon|z=0]\]

LATE in Regression

Under the LATE assumptions, \[E(y|z = 1) - E(y|z = 0) = E(\gamma_{1})\]
From the reduced form for $w$

\[E(w|z = 1) = \pi_{0} + E(\pi_{1}|z = 1) + E[r|z=1]\] \[E(w|z = 0) = \pi_{0} + E[r|z=0]\]
Taking the difference \[E(w|z = 1) - E(w|z = 0) = E(\pi_{1})\]

LATE in Regression

Taking the ratio \[\frac{E(y|z = 1) - E(y|z = 0)}{E(w|z = 1) - E(w|z = 0)} = \frac{E(\gamma_{1})}{E(\pi_{1})}\]
Relating this back to LATE, note that $\gamma_{1} = \beta_{1}\pi_{1}$

\[\frac{E(\gamma_{1})}{E(\pi_{1})} = \frac{E(\beta_{1}\pi_{1})}{E(\pi_{1})}\]
Using the monotonicity assumption

\[\frac{E(\beta_{1}\pi_{1})}{E(\pi_{1})} = \frac{E(\beta_{1}|\pi_{1}>0)P(\pi_{1}>0)}{E(\pi_{1})} = E(\beta_{1}|\pi_{1}>0)\]
Because $\pi_{1}$ equals either 0 or 1 under monotonicity, so that $P(\pi_{1}>0)=E(\pi_{1})$
