EC655
Justin Smith
Wilfrid Laurier University
Fall 2022
In many empirical applications we want to estimate a causal effect
The slope in a structural model
The independent effect of a particular variable on the outcome
OLS is inconsistent for this parameter when the error is correlated with \(\mathbf{x}\)
This happens often with omitted variables
There are several other situations
Simultaneous equations
Measurement error in \(\mathbf{x}\)
If we want to consistently estimate a causal parameter, we need a different method
First, some important terminology
Variables in a regression model that are correlated with the error are endogenous
Those that are uncorrelated with the error are exogenous
We will learn several methods to uncover a causal effect with endogenous variables
Instrumental Variables
Difference in Differences
Regression Discontinuity
Each takes a different approach to the problem
One commonality is that they are usually based on natural experiments
Individuals are exposed to experimental conditions by outside factors
Some examples
Vietnam draft lottery and effect of military service on wages
2008 Beijing air pollution reduction and birthweight
Maimonides’ rule and effect of class size on achievement
Start with the population regression function
\[y = \mathbf{x}\boldsymbol{\beta} + u\]
\[= \mathbf{x_{1}}\boldsymbol{\beta_{1}} + x_{k}\beta_{k} + u\]
Suppose that all \(\mathbf{x_{1}}\) are uncorrelated with \(u\)
But we are worried that \(x_{k}\) is correlated with \(u\)
We saw that OLS leads to inconsistent estimates of \(\boldsymbol{\beta}\)
Now imagine we have a variable \(z_{1}\) that is
Uncorrelated with \(u\)
Correlated with \(x_{k}\)
Define the vector \(\mathbf{z}\) to be
\[\mathbf{z} = [1,x_{1}, x_{2}, ...,x_{k-1},z_{1}]\]
It is the \(\mathbf{x}\) vector with \(z_{1}\) in place of \(x_{k}\)
This is the vector of all exogenous factors
\(\mathbf{x_{1}}\) are included instruments
\(z_{1}\) is the excluded instrument because it comes from outside the model
Since we assume all factors in \(\mathbf{z}\) are exogenous, we can write
\[\mathbf{E}(\mathbf{z}'u) = 0\]
If we take our model
\[y = \mathbf{x}\boldsymbol{\beta} + u\]
Pre-multiply by \(\mathbf{z}'\) and take expectations
\[\mathbf{E}(\mathbf{z}'y) = \mathbf{E}(\mathbf{z'x}) \boldsymbol{\beta} + \mathbf{E}(\mathbf{z}'u)\]
Using the fact that \(\mathbf{E}(\mathbf{z}'u) = 0\)
\[\boldsymbol{\beta} = [\mathbf{E}(\mathbf{z'x})] ^{-1}\mathbf{E}(\mathbf{z}'y)\]
To be able to identify \(\boldsymbol{\beta}\) as we did above, we need to assume
\(\text{rank }\mathbf{E}(\mathbf{z'z}) = K+1\)
The variables in \(\mathbf{z}\) are linearly independent
If this is not true we have perfect multicollinearity, and cannot compute \(\boldsymbol{\beta}\)
\(\text{rank }\mathbf{E}(\mathbf{z'x}) = K+1\)
The instrument is correlated with the endogenous variable
This allows us to invert \(\mathbf{E}(\mathbf{z'x})\)
Combined with the orthogonality condition this means an instrument must be
Uncorrelated with the error term
Correlated with the endogenous variable
If either assumption fails, we cannot use IV
Applying the method of moments to the population slope
\[\boldsymbol{\hat{\beta}}=\left ( \frac{1}{n}\sum_{i=1}^{n}\mathbf{z_{i}'x_{i}} \right )^{-1} \left ( \frac{1}{n} \sum_{i=1}^{n}\mathbf{z}_{i}'y_{i}\right ) = \left ( \sum_{i=1}^{n}\mathbf{z_{i}'x_{i}} \right )^{-1} \left ( \sum_{i=1}^{n}\mathbf{z}_{i}'y_{i}\right )\]
\[=\left ( \mathbf{Z'X} \right )^{-1} \left ( \mathbf{Z'y}\right )\]
The IV estimator is very similar to the OLS estimator \((\mathbf{X'X})^{-1}\mathbf{X'y}\), except that \(\mathbf{Z}'\) replaces \(\mathbf{X}'\)
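A small simulation can make the sample formula concrete. The sketch below is not from the course materials: the data-generating process, the coefficient values, and all variable names are assumptions chosen for illustration. It builds \(\mathbf{X}\) and \(\mathbf{Z}\) as defined above and compares \((\mathbf{X'X})^{-1}\mathbf{X'y}\) with \((\mathbf{Z'X})^{-1}\mathbf{Z'y}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated data; every number here is an illustrative assumption
x1 = rng.normal(size=n)                      # exogenous regressor
z1 = rng.normal(size=n)                      # excluded instrument
u = rng.normal(size=n)                       # structural error
# x_k depends on u, so it is endogenous; it also depends on z1 (relevance)
xk = 0.5 * x1 + 1.0 * z1 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xk + u            # true beta = (1, 2, 3)

X = np.column_stack([np.ones(n), x1, xk])    # regressors
Z = np.column_stack([np.ones(n), x1, z1])    # x vector with z1 in place of x_k

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^{-1} Z'y

print("OLS:", beta_ols.round(3))
print("IV: ", beta_iv.round(3))
```

In this setup the OLS coefficient on \(x_{k}\) should sit noticeably above 3, while the IV coefficient should be close to 3, because \(z_{1}\) is constructed to be uncorrelated with \(u\).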
In some cases we have more than one instrument for an endogenous variable
Two-Stage Least Squares (TSLS) generalizes the IV model for this situation
The TSLS procedure is
First Stage: regress endogenous variable on all exogenous variables
Second Stage: regress dependent variable on exogenous \(\mathbf{x}\) variables and predicted value of endogenous variable from first stage
This process keeps only exogenous part of endogenous variable in the second stage
To see this mathematically, imagine as before that
\[y = \mathbf{x_{1}}\boldsymbol{\beta_{1}} + x_{k}\beta_{k} + u\]
\(x_{k}\) is endogenous, and \(\mathbf{x_{1}}\) are exogenous
This is a structural equation for \(y\), because it measures causal effects
The vector \(\mathbf{z}\) is now
\[\mathbf{z} = [1,x_{1}, x_{2}, ...,x_{k-1},z_{1}, z_{2},...,z_{m}]\]
Contains \(K\) exogenous \(\mathbf{x}\) variables and \(M\) instruments
Total dimension of this vector is \(L=K+M\)
There are at least as many instruments as endogenous variables
When \(\mathbf{z}\) is exogenous, \(\mathbf{E}(\mathbf{z}'u) = 0\)
If \(\mathbf{z}\) is uncorrelated with \(u\), then so is any linear combination of \(\mathbf{z}\)
The reduced form linear relationship between \(\mathbf{x}\) and \(\mathbf{z}\) is
\[\mathbf{x} = \mathbf{z}\boldsymbol{\Pi} + r\]
This is a regression of each element of \(\mathbf{x}\) on \(\mathbf{z}\)
\(\boldsymbol{\Pi}\) is an \(L\times (K+1)\) matrix
Each column is the coefficients from regressing an element of \(\mathbf{x}\) on \(\mathbf{z}\)
Like doing a separate regression of each element in \(\mathbf{x}\) on \(\mathbf{z}\) and collecting all coefficient vectors into a big matrix
Reduced form means all the regressors are exogenous, but it does not necessarily measure causal effects
The population regression function in this relationship is
\[\mathbf{x}^{*} = \mathbf{z}\boldsymbol{\Pi}\]
When you regress a variable on itself, the predicted value is itself
So the vector \(\mathbf{x}^{*}\) is
\[\mathbf{x}^{*} = [1,x_{1}, x_{2}, ...,x_{k-1},x_{k}^{*}]\]
where \(x_{k}^{*} = \pi_{0} + \pi_{1}x_{1} + \pi_{2}x_{2} + ...+\pi_{k-1}x_{k-1} + \pi_{k+1}z_{1} + ... +\pi_{k+m}z_{m}\)
For the first stage, we only need to regress the endogenous variables on \(\mathbf{z}\)
As noted, a linear combination of \(\mathbf{z}\) is uncorrelated with \(u\), so
\[\mathbf{E}(\mathbf{x}^{*'}u) = 0\]
Premultiplying our structural equation by \(\mathbf{x}^{*'}\) and taking expectations
\[\mathbf{E}(\mathbf{x}^{*'}y) = \mathbf{E}(\mathbf{x^{*'}x}) \boldsymbol{\beta} + \mathbf{E}(\mathbf{x}^{*'}u)\]
Using \(\mathbf{E}(\mathbf{x}^{*'}u) = 0\)
\[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{x^{*'}x})\right)^{-1}\mathbf{E}(\mathbf{x}^{*'}y)\]
This is the population TSLS slope
You can also write this estimator as
\[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{x^{*'}x^{*}})\right)^{-1}\mathbf{E}(\mathbf{x}^{*'}y)\]
This is the slope from a regression of \(y\) on \(\mathbf{x^{*}}\)
It is the slope from the second stage
To see how, recall that \[\mathbf{x}= \mathbf{z}\boldsymbol{\Pi} + r= \mathbf{x}^{*} + r\]
Pre-multiply by \(\mathbf{x}^{*'}\) and take expectations \[\mathbf{E}(\mathbf{x}^{*'} \mathbf{x})= \mathbf{E}(\mathbf{x}^{*'} \mathbf{x}^{*}) + \mathbf{E}(\mathbf{x}^{*'} r)\]
Since by definition \(\mathbf{E}(\mathbf{x}^{*'} r) = 0\) \[\mathbf{E}(\mathbf{x}^{*'} \mathbf{x})= \mathbf{E}(\mathbf{x}^{*'} \mathbf{x}^{*})\]
Finally, note that the IV estimator is a special case of TSLS
Assume we have one endogenous variable and one excluded instrument
This means that \(\mathbf{x}\) and \(\mathbf{z}\) are both \(1 \times (K+1)\) vectors
The matrix \(\mathbf{\Pi}\) is \((K+1) \times (K+1)\), so it is square and can be inverted
Substitute the definition of \(\mathbf{x}^{*'}\) into the formula for \(\boldsymbol{\beta}\) \[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{(z\Pi)'x})\right)^{-1}\mathbf{E}((\mathbf{z\Pi)'}y)\]
Distributing the transpose and pulling \(\mathbf{\Pi}\) out of the expectation \[\boldsymbol{\beta} = \left(\mathbf{\Pi'}\mathbf{E}\mathbf{(z'x})\right)^{-1}\mathbf{\Pi'}\mathbf{E}(\mathbf{z'}y) = \left(\mathbf{E}\mathbf{(z'x})\right)^{-1}(\mathbf{\Pi'})^{-1}\mathbf{\Pi'}\mathbf{E}(\mathbf{z'}y)\] \[\boldsymbol{\beta} =\left(\mathbf{E}(\mathbf{z}'\mathbf{x})\right)^{-1} \mathbf{E}(\mathbf{z}'y)\]
As before, the estimator substitutes sample versions for the population moments
Also recall that this is a two-step process
The first stage is the regression of \(\mathbf{x}\) on \(\mathbf{z}\)
The population regression function is
\[\mathbf{x}^{*} = \mathbf{z}\boldsymbol{\Pi}\] \[\boldsymbol{\Pi} = \left(\mathbf{E}(\mathbf{z'z})\right)^{-1}\mathbf{E}(\mathbf{z'x})\]
The sample version is \[\mathbf{\hat{X}} = \mathbf{Z}\boldsymbol{\hat{\Pi}}\] \[\boldsymbol{\hat{\Pi}} = \left( \mathbf{Z'Z} \right)^{-1}\mathbf{Z'X}\]
When there is one endogenous variable, the matrix \(\mathbf{\hat{X}}\) is
\[\mathbf{\hat{X}} = \begin{bmatrix} 1 & x_{11} & x_{12} &\cdots&x_{1,k-1} & \hat{x}_{1k}\\ 1 & x_{21} & x_{22} &\cdots&x_{2,k-1} & \hat{x}_{2k}\\ \vdots & \vdots & \vdots & \ddots &\vdots &\vdots \\ 1 & x_{n1} & x_{n2} &\cdots&x_{n,k-1} & \hat{x}_{nk} \end{bmatrix} = \begin{bmatrix} \mathbf{X_{1}} & \mathbf{\hat{x}_{k}} \end{bmatrix}\]
For the first stage we only regress \(\mathbf{x}_{k}\) on \(\mathbf{Z}\)
We do not do first stage regressions in practice for the included instruments
The second stage is the regression of \(\mathbf{y}\) on \(\mathbf{\hat{X}}\)
The population slope we derived is
\[\boldsymbol{\beta} = \left(\mathbf{E}(\mathbf{x^{*'}x^{*}})\right)^{-1}\mathbf{E}(\mathbf{x}^{*'}y)\]
The sample version is
\[\boldsymbol{\hat{\beta}} = \left(\mathbf{\hat{X}'\hat{X}}\right)^{-1}\mathbf{\hat{X}'y}\]
The TSLS estimator is simply an OLS estimator in two stages
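The two stages can be sketched directly in NumPy. This is an illustrative sketch under assumed simulated data (two excluded instruments, one endogenous regressor, made-up coefficients), not the course's own code. It also checks numerically that \((\mathbf{\hat{X}'\hat{X}})^{-1}\mathbf{\hat{X}'y}\) and \((\mathbf{\hat{X}'X})^{-1}\mathbf{\hat{X}'y}\) coincide, and that the first stage reproduces the included instruments exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulated data; all coefficients are illustrative assumptions
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)                      # two excluded instruments (M = 2)
u = rng.normal(size=n)
xk = 0.5 * x1 + 0.7 * z1 + 0.4 * z2 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xk + u            # true beta = (1, 2, 3)

X = np.column_stack([np.ones(n), x1, xk])
Z = np.column_stack([np.ones(n), x1, z1, z2])

# First stage: Pi_hat = (Z'Z)^{-1} Z'X and X_hat = Z Pi_hat
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Pi_hat

# Second stage: OLS of y on X_hat
beta_second_stage = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Equivalent form using X_hat'X in place of X_hat'X_hat
beta_alt = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print("TSLS (second stage):", beta_second_stage.round(3))
print("TSLS (X_hat'X form):", beta_alt.round(3))
```

Regressing every column of \(\mathbf{X}\) on \(\mathbf{Z}\) is wasteful but harmless: the constant and \(x_{1}\) are in \(\mathbf{Z}\), so their fitted values equal the original columns and only \(x_{k}\) is actually changed.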
The starting point for TSLS is an endogenous variable
A variable that is correlated with \(u\)
Usually because of omitted variables bias
OLS will not consistently identify the slope vector \(\boldsymbol{\beta}\)
We have access to at least one variable from outside the model that is
Uncorrelated with the error term \(u\)
Correlated with the endogenous variable \(x_{k}\)
In the first stage, we regress \(x_{k}\) on all exogenous variables
Separates \(x_{k}\) into two pieces
The exogenous part: the piece that is correlated with the exogenous variables
The endogenous part: the residual from this regression, which is uncorrelated with the exogenous piece
The first stage purges the endogenous component from \(x_{k}\)
Keeps only the exogenous component, \(\hat{x}_{k}\)
We use only the exogenous piece \(\hat{x}_{k}\) in the second stage regression
Like we did with the OLS estimator, we cover the statistical properties of TSLS
We first need to outline the set of assumptions required for consistency
\(\mathbf{E}(\mathbf{z}'u) = 0\)
The vector of exogenous variables is uncorrelated with \(u\)
This is sometimes called the exclusion restriction
\(\text{rank } \mathbf{E}(\mathbf{z'z}) = L\) and \(\text{rank } \mathbf{E}(\mathbf{z'x}) = K+1\)
First part says that none of the variables in \(\mathbf{z}\) are perfectly collinear
Second part is the rank condition
The instruments must be sufficiently correlated with the endogenous variable
We will come back to this when we talk about weak instruments
For this assumption to hold we also need to meet the order condition
There are at least as many instruments as endogenous variables
Mathematically, we need \(L\ge K+1\)
\(\{(\mathbf{x}_{i},\mathbf{z}_{i}, y_{i}): i=1,2,...,n\}\) are a random sample
If all of the assumptions are met, the TSLS estimator is consistent for \(\boldsymbol{\beta}\)
We will not do the proof, but it is very similar to OLS
If any of the assumptions fail, the TSLS estimator is inconsistent
In small samples, the TSLS estimator is generally biased
We will not cover the proof
You should only use TSLS with large samples
We again appeal to the Central Limit Theorem
With large \(n\), the TSLS estimator is approximately Normally distributed with mean \(\boldsymbol{\beta}\) and variance
\[\text{var}(\boldsymbol{\hat{\beta}}) = n^{-1}[\mathbf{E}(\mathbf{x^{*'}x^{*}})^{-1}]\mathbf{E}(u^2\mathbf{x^{*'}x^{*}})[\mathbf{E}(\mathbf{x^{*'}x^{*}})^{-1}]\]
If we assume homoskedasticity \(\mathbf{E}(u^2|\mathbf{z}) = \sigma^2\) this reduces to
\[\text{var}(\boldsymbol{\hat{\beta}}) = \sigma^2 n^{-1}\mathbf{E}(\mathbf{x^{*'}x^{*}})^{-1}\]
To estimate the variance, substitute sample versions of the population moments
The Heteroskedasticity-Robust variance estimator is
\[\hat{\text{var}}(\boldsymbol{\hat{\beta}}) = \left (\mathbf{\hat{X}'\hat{X}}\right )^{-1}\left ( \sum_{i=1}^{n}\hat{u}_{i}^2\mathbf{\hat{x}_{i}'\hat{x}_{i}}\right )\left ( \mathbf{\hat{X}'\hat{X}}\right )^{-1}\]
If we assume homoskedasticity, it is
\[\hat{\text{var}}(\boldsymbol{\hat{\beta}}) = s_{\hat{u}}^2 \left (\mathbf{\hat{X}'\hat{X}}\right )^{-1}\]
In both cases, we use the TSLS residuals, which are
\[\mathbf{\hat{u}} = \mathbf{y} - \mathbf{X}\boldsymbol{\hat{\beta}}\]
These are not the residuals from the second stage regression \(\mathbf{y} - \mathbf{\hat{X}}\boldsymbol{\hat{\beta}}\)
Use \(\mathbf{X}\) – not \(\mathbf{\hat{X}}\) – in this calculation
This is a common mistake
Using the wrong residuals will lead to incorrect estimates of the standard errors
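The difference between the two sets of residuals is easy to see numerically. The sketch below uses assumed simulated data with a homoskedastic error of variance 1, and it assumes a degrees-of-freedom correction of \(n\) minus the number of columns of \(\mathbf{X}\) for \(s_{\hat{u}}^2\) (the slides do not specify one). It computes the homoskedastic and robust variance estimators with the correct residuals, then repeats the homoskedastic formula with the second-stage residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Simulated data; coefficients are illustrative assumptions
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
u = rng.normal(size=n)                       # homoskedastic, variance 1
xk = 0.5 * x1 + 1.0 * z1 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * xk + u

X = np.column_stack([np.ones(n), x1, xk])
Z = np.column_stack([np.ones(n), x1, z1])

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_hat = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

u_hat = y - X @ beta_hat          # correct TSLS residuals (uses X)
u_wrong = y - X_hat @ beta_hat    # second-stage residuals (uses X_hat)

XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
dof = n - X.shape[1]              # assumed degrees-of-freedom correction

# Homoskedastic estimator: s^2 (X_hat'X_hat)^{-1}
s2 = u_hat @ u_hat / dof
se_homo = np.sqrt(np.diag(s2 * XhXh_inv))

# Robust estimator: (X_hat'X_hat)^{-1} (sum u_i^2 x_hat_i'x_hat_i) (X_hat'X_hat)^{-1}
meat = (X_hat * (u_hat ** 2)[:, None]).T @ X_hat
se_robust = np.sqrt(np.diag(XhXh_inv @ meat @ XhXh_inv))

# Same homoskedastic formula with the wrong residuals
s2_wrong = u_wrong @ u_wrong / dof
se_wrong = np.sqrt(np.diag(s2_wrong * XhXh_inv))

print("s^2 correct:", round(s2, 3), " s^2 wrong:", round(s2_wrong, 3))
print("SE homoskedastic:", se_homo.round(4))
print("SE robust:       ", se_robust.round(4))
print("SE wrong resid:  ", se_wrong.round(4))
```

In this design the second-stage residuals also contain \(\beta_{k}\) times the first-stage residual, so their variance is far above the true error variance and the resulting standard errors are too large.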
In this section we present details that are useful for using TSLS in practice
This will help you understand when you read papers using TSLS
Suppose we have a model with one endogenous variable and one instrument
The instrumental variables model in scalar notation is
\[y = \beta_{0} + x_{1}\beta_{1} +...+ x_{k-1}\beta_{k-1} + x_{k}\beta_{k} + u\] \[x_{k} = \pi_{0} + x_{1}\pi_{1}+...+x_{k-1}\pi_{k-1} +z_{1}\delta_{1} + r\]
The first equation is the structural equation
The second equation is the first stage
It is a reduced form equation
All of the regressors are exogenous
The parameters do not necessarily represent causal effects
Researchers often also estimate the reduced form for \(y\)
Sub the second equation into the first to get \[y = (\beta_{0} + \beta_{k}\pi_{0}) + x_{1}(\beta_{1} + \beta_{k}\pi_{1}) +...+ x_{k-1} (\beta_{k-1}+ \beta_{k}\pi_{k-1}) + z_{1}\beta_{k}\delta_{1} + u + \beta_{k}r\] \[= \gamma_{0}+ x_{1}\gamma_{1}+...+ x_{k-1} \gamma_{k-1}+ z_{1}\theta_{1}+ \epsilon\]
This is the regression of \(y\) on all the exogenous variables
In the reduced form for \(y\), the slope \(\theta_{1} = \beta_{k}\delta_{1}\)
You can get the slope \(\beta_{k}\) by dividing the reduced form by the first stage
\[\beta_{k} = \frac{\theta_{1}}{\delta_{1}}\]
\(\beta_{k}\) is the reduced form effect scaled by the first stage
Ex: returns to schooling
Imagine \(y\) is income, \(x_{k}\) is years of schooling, \(z_{1}\) is proximity to the nearest university (in kms closer)
If \(\delta_{1} = .5\), being 1km closer to school leads to .5 more years of schooling
If \(\theta_{1} = 10000\), being 1km closer to school leads to $10000 more income
Then \(\beta_{k} = \frac{10000}{.5} = 20000\) is the effect of an additional year of schooling
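The ratio result can be checked on simulated data. In this sketch (all numbers and names are assumptions for illustration) the first stage and the reduced form for \(y\) are each estimated by OLS on the exogenous variables, and the ratio \(\hat{\theta}_{1}/\hat{\delta}_{1}\) is compared with the coefficient on \(x_{k}\) from the IV formula.

```python
import numpy as np

def ols(A, b):
    """OLS coefficients (A'A)^{-1} A'b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

rng = np.random.default_rng(5)
n = 50_000

# Simulated data; coefficients are illustrative assumptions
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
u = rng.normal(size=n)
xk = 0.5 * x1 + 0.5 * z1 + 0.8 * u + rng.normal(size=n)   # delta_1 = 0.5
y = 1.0 + 2.0 * x1 + 3.0 * xk + u                          # beta_k = 3

X = np.column_stack([np.ones(n), x1, xk])
Z = np.column_stack([np.ones(n), x1, z1])

delta_1 = ols(Z, xk)[2]     # first stage: coefficient on z1
theta_1 = ols(Z, y)[2]      # reduced form for y: coefficient on z1
beta_k_ratio = theta_1 / delta_1

beta_k_iv = np.linalg.solve(Z.T @ X, Z.T @ y)[2]   # IV estimate of beta_k

print("first stage delta_1: ", round(delta_1, 4))
print("reduced form theta_1:", round(theta_1, 4))
print("ratio:", round(beta_k_ratio, 4), " IV:", round(beta_k_iv, 4))
```

With one endogenous variable and one excluded instrument, the ratio and the IV coefficient agree up to floating-point error, not just approximately.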
There are two main assumptions for instrumental variables
Exclusion restriction: instruments come from outside the model and are uncorrelated with \(u\)
Rank condition: excluded instruments are sufficiently related to the endogenous variable
Failure of either of these creates problems
To understand these issues, it is useful to study the following model
\[y = \beta_{0} + \beta_{1}x_{1} + u\] \[x_{1} = \pi_{0} + \pi_{1}z_{1} + r\]
In this model, you can show that
\[\text{plim }\hat{\beta}_{1} = \beta_{1} + \frac{cov(z_{1},u)}{cov(z_{1},x_{1})}\] \[= \beta_{1} + \frac{\sigma_{u}}{\sigma_{x_{1}}}\frac{corr(z_{1},u)}{corr(z_{1},x_{1})}\]
We will use this result to inform ourselves about the failure of our assumptions
The key idea is that instruments must be relevant and exogenous
Consider the plim of the TSLS slope estimator
\[\text{plim }\hat{\beta}_{1} = \beta_{1} + \frac{cov(z_{1},u)}{cov(z_{1},x_{1})}\]
If the exclusion restriction holds, then \(cov(z_{1},u) = 0\) and \(\text{plim }\hat{\beta}_{1} = \beta_{1}\)
If not then \(\text{plim }\hat{\beta}_{1} \neq \beta_{1}\) and the TSLS estimator is inconsistent
Can we check to see if this assumption is true?
If we have one excluded instrument for each endogenous variable, we cannot
This model is exactly identified
There are not enough degrees of freedom to test the exclusion restriction
We have to rely only on our assumptions in this case
In a model with multiple excluded instruments for the endogenous variable, we can
The model is overidentified
We use the extra instruments to test whether the others are endogenous
A useful test when errors are homoskedastic is the Sargan Test
Estimate the model using TSLS
Compute the TSLS residuals, \(\mathbf{\hat{u}}\)
Regress \(\mathbf{\hat{u}}\) on all instruments \(\mathbf{z}\)
Save \(R^2_u\) from this regression
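The test statistic is \(nR^2_u\)
Under the null it is asymptotically \(\chi^2\) with degrees of freedom equal to the number of overidentifying restrictions (excluded instruments minus endogenous variables)
The null hypothesis is that all instruments are exogenous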
There is a more complicated test with heteroskedastic errors
You can easily execute this in Stata
We can discuss the process separately
If you fail this test (reject the null), then you need to find other instruments
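The Sargan steps listed above can be sketched as follows. This is an illustration on assumed simulated data with one endogenous variable and two excluded instruments (so one overidentifying restriction); the function names are hypothetical. The second scenario deliberately makes \(z_{2}\) correlated with \(u\) to show the statistic reacting.

```python
import math
import numpy as np

def tsls_residuals(y, X, Z):
    """TSLS residuals y - X beta_hat (uses X, not X_hat)."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta_hat = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return y - X @ beta_hat

def sargan_stat(y, X, Z):
    """n * R^2 from regressing the TSLS residuals on all instruments."""
    u_hat = tsls_residuals(y, X, Z)
    fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    ssr = np.sum((u_hat - fitted) ** 2)
    sst = np.sum((u_hat - u_hat.mean()) ** 2)
    return len(y) * (1.0 - ssr / sst)

def simulate(z2_loading_on_u, n=20_000, seed=3):
    """Illustrative data; z2 is invalid when z2_loading_on_u != 0."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n) + z2_loading_on_u * u
    x = 0.7 * z1 + 0.4 * z2 + 0.8 * u + rng.normal(size=n)
    y = 1.0 + 3.0 * x + u
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z1, z2])
    return sargan_stat(y, X, Z)

def chi2_df1_pvalue(stat):
    """P(chi2 with 1 df > stat), valid only for one overidentifying restriction."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

stat_valid = simulate(0.0)      # both instruments exogenous
stat_invalid = simulate(0.5)    # z2 correlated with u

print("valid instruments:  ", round(stat_valid, 3), chi2_df1_pvalue(stat_valid))
print("invalid instrument: ", round(stat_invalid, 3), chi2_df1_pvalue(stat_invalid))
```

The 5% critical value of a \(\chi^2\) with one degree of freedom is about 3.84. The p-value helper here only covers that one-restriction case; with more overidentifying restrictions a general \(\chi^2\) distribution function would be needed.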
For TSLS to work, the instrument must be relevant
An instrument that is not sufficiently relevant is a weak instrument
Weak instruments can cause several problems
Inconsistency when the instrument and error are not exactly uncorrelated
Consider the plim of the TSLS slope estimator
\[\text{plim }\hat{\beta}_{1} = \beta_{1} + \frac{\sigma_{u}}{\sigma_{x_{1}}}\frac{corr(z_{1},u)}{corr(z_{1},x_{1})}\]
When \(corr(z_{1},x_{1}) \rightarrow 0\) and \(corr(z_{1},u) \neq 0\), then \(\left|\frac{corr(z_{1},u)}{corr(z_{1},x_{1})}\right| \rightarrow \infty\)
In practice, we can never be sure \(corr(z_{1},u)\) is exactly zero, so this is an issue
Also note that the plim of the OLS estimator is
\[\text{plim }\hat{\beta}_{1}^{ols} = \beta_{1} + \frac{\sigma_{u}}{\sigma_{x_{1}}}corr(x_{1},u)\]
Taking the ratio of the two inconsistencies \[\frac{\text{plim }\hat{\beta}_{1}-\beta_{1}}{\text{plim }\hat{\beta}_{1}^{ols}-\beta_{1}} = \frac{corr(z_{1},u)/corr(x_{1},u)}{corr(z_{1},x_{1})}\]
As \(corr(z_{1},x_{1}) \rightarrow 0\), the inconsistency in TSLS becomes larger than the inconsistency in OLS
Finite sample bias in TSLS becomes large
The bias of the TSLS estimator can be approximated as
\[\mathbf{E}(\hat{\beta}_{1} - \beta_{1}) \approx \frac{\sigma_{ur}}{\sigma_{r}^2}\frac{1}{F+1}\]
Where \(F\) is the F-statistic from the test of joint significance of the excluded instruments in the first stage
Weak instruments means that \(F\) is close to zero
As \(F\) gets small, the bias tends to \(\frac{\sigma_{ur}}{\sigma_{r}^2}\)
If \(F\) is exactly zero, that means the instrument is irrelevant \((\pi_{1}=0)\)
In this case \(\frac{\sigma_{ur}}{\sigma_{r}^2}\) is also the OLS bias
So TSLS and OLS produce the same biased result
When instruments are weak, TSLS is biased towards OLS
Standard inference is no longer valid
We rely on the central limit theorem to show the TSLS estimator has a Normal distribution
With weak instruments, this no longer holds
In this case the t-statistic no longer has a t-distribution
Hypothesis tests tend to reject the null more often than they should
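The claim that TSLS is biased towards OLS when instruments are weak can be illustrated with a small Monte Carlo. This sketch makes several assumptions not in the slides: 20 excluded instruments, \(n = 200\), no intercept (all variables have mean zero), and medians across replications as the summary. Setting `pi=0.0` makes every instrument irrelevant; `pi=0.5` makes them jointly strong.

```python
import numpy as np

def simulate(pi, n=200, m=20, reps=500, beta=1.0, seed=0):
    """Median OLS and TSLS slope estimates across replications.

    pi is the (assumed common) first-stage coefficient on each of the
    m excluded instruments; pi = 0 means the instruments are irrelevant.
    """
    rng = np.random.default_rng(seed)
    ols_draws, tsls_draws = [], []
    for _ in range(reps):
        Z = rng.normal(size=(n, m))
        u = rng.normal(size=n)
        x = Z @ np.full(m, pi) + 0.8 * u + rng.normal(size=n)  # endogenous
        y = beta * x + u
        x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]       # first stage
        ols_draws.append((x @ y) / (x @ x))
        tsls_draws.append((x_hat @ y) / (x_hat @ x))
    return float(np.median(ols_draws)), float(np.median(tsls_draws))

ols_weak, tsls_weak = simulate(pi=0.0)
ols_strong, tsls_strong = simulate(pi=0.5)

print("irrelevant instruments: OLS", round(ols_weak, 3), "TSLS", round(tsls_weak, 3))
print("strong instruments:     OLS", round(ols_strong, 3), "TSLS", round(tsls_strong, 3))
```

With irrelevant instruments the median TSLS estimate should land near the OLS estimate, well above the true slope of 1, while with strong instruments it should be close to 1.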
There are several tests available to check for weak instruments
The null hypothesis of these tests is that the instruments are weak
Test with one endogenous regressor
Based on F-statistic on excluded instruments in first stage
Critical values for set amount of bias or size distortion
Ex: when \(F=10\) maximum TSLS bias is about 10% of OLS bias
Also when \(F=10\), the rejection rate of a 5% test is no more than 20%
As \(F\) gets larger, relative bias shrinks and test size distortion falls
\(F=10\) is often called the “Staiger-Stock rule of thumb”, after the authors who developed it
Generally thought that when \(F>10\), instruments are not weak
Test with more than one endogenous regressor
Suppose you have two endogenous variables and two instruments
You could compute a first stage \(F\) for each endogenous variable
Problem arises when you have one instrument that is strong in both first stages
In this case you have one strong and one weak instrument
The F-statistic is modified in this case for doing this test jointly
It is sometimes called the Cragg-Donald F-Statistic
With one endogenous variable, the Cragg-Donald F-stat is the first-stage F-Stat
Tests that are heteroskedasticity-robust
Both tests described above assume homoskedastic errors
Kleibergen-Paap adjusted the test for heteroskedasticity
Otherwise, works in the same way as described above
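For the case of one endogenous regressor, the first-stage F-statistic on the excluded instruments can be computed by comparing restricted and unrestricted first-stage regressions. The sketch below assumes homoskedastic errors and simulated data; `first_stage_F` is a hypothetical helper name, not a library function.

```python
import numpy as np

def ssr(A, b):
    """Sum of squared residuals from an OLS regression of b on A."""
    resid = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return resid @ resid

def first_stage_F(xk, included, excluded):
    """F-statistic for the joint significance of the excluded instruments.

    Restricted model: xk on included instruments only.
    Unrestricted model: xk on included and excluded instruments.
    """
    Z_u = np.column_stack([included, excluded])
    q = Z_u.shape[1] - included.shape[1]          # number of excluded instruments
    ssr_r, ssr_u = ssr(included, xk), ssr(Z_u, xk)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (len(xk) - Z_u.shape[1]))

rng = np.random.default_rng(4)
n = 1_000

# Simulated data; coefficients are illustrative assumptions
x1 = rng.normal(size=n)
z1 = rng.normal(size=n)
u = rng.normal(size=n)
e = rng.normal(size=n)
included = np.column_stack([np.ones(n), x1])

xk_strong = 0.5 * x1 + 0.50 * z1 + 0.8 * u + e   # relevant instrument
xk_weak = 0.5 * x1 + 0.01 * z1 + 0.8 * u + e     # nearly irrelevant instrument

F_strong = first_stage_F(xk_strong, included, z1)
F_weak = first_stage_F(xk_weak, included, z1)

print("first-stage F, strong instrument:", round(F_strong, 2))
print("first-stage F, weak instrument:  ", round(F_weak, 2))
```

The strong design should give an F far above 10, and the weak design an F that is much smaller. A heteroskedasticity-robust version would replace this F with a robust Wald statistic, which is not shown here.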
Find better instruments or drop weak ones
If your model is overidentified, you can drop the weak instruments
If it is exactly identified, you could try to find better instruments
Use weak instrument robust inference
As noted, t-tests will reject the null too often with weak instruments
You can adjust testing to have the correct size
Several such tests are available
Anderson Rubin (AR)
Conditional Likelihood Ratio (CLR)
We will not cover the details of these, but all are available in Stata
The Rubin model tells us how to interpret the slope in a regression model
Specific interpretation depends on assumptions
If we make strong assumption that treatment is randomly assigned, regression slope is the average treatment effect (ATE)
Weaker assumption of mean independence or conditional mean independence of \(y_{0}\) gives us average treatment effect on the treated (ATT)
But in an instrumental variables model, these assumptions are violated
The error term is correlated with treatment
So we cannot assume mean independence of the error
In this section, we alter the Rubin model to interpret the slope in an IV model
Start with the observed outcomes from the Rubin model \[y = y_{0} + (y_{1} -y_{0})w\]
Now suppose we have a binary instrument \(z\)
Define observed treatment as \[w = w_{0} + (w_{1} -w_{0})z\]
\(w_{1}\) is potential treatment when assigned to treatment
\(w_{0}\) is potential treatment when not assigned to treatment
We observe \(w=w_{0}\) when \(z=0\) and \(w=w_{1}\) when \(z=1\)
\[y_{i} = y_{i0} + (y_{i1} -y_{i0})w_{i0} + (y_{i1} -y_{i0})(w_{i1} -w_{i0})z_{i}\]
Assume that the instrument is independent of potential treatments and potential outcomes \[(y_{i0}, y_{i1}, w_{i0}, w_{i1}) \perp z_{i}\]
We saw that the TSLS estimator is the reduced form divided by the first stage
In this setup with binary \(z\), the reduced form is
\[E(y|z=1) - E(y|z=0)\]
and the first stage is
\[E(w|z=1) - E(w|z=0)\]
So that the IV estimator is
\[\frac{E(y|z=1) - E(y|z=0)}{E(w|z=1) - E(w|z=0)}\]
We now derive the numerator and denominator
To get the numerator, first substitute \(w\) equation into \(y\) equation \[y = y_{0} + (y_{1} -y_{0})w_{0} + (y_{1} -y_{0})(w_{1} -w_{0})z\]
Assume the instrument \(z\) is independent of potential treatment and outcomes \[(y_{0}, y_{1}, w_{0}, w_{1}) \perp z\]
This allows us to write the following conditional expectations \[E(y|z=1) = E(y_{0}) + E((y_{1} -y_{0})w_{0}) + E((y_{1} -y_{0})(w_{1} -w_{0}))\] \[E(y|z=0) = E(y_{0}) + E((y_{1} -y_{0})w_{0})\]
So, \[E(y|z=1) - E(y|z=0) = E((y_{1} -y_{0})(w_{1} -w_{0}))\]
Note that \[E((y_{1} -y_{0})(w_{1} -w_{0}))\] \[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)P(w_{1} -w_{0} = 1)\] \[- E(y_{1} -y_{0}|w_{1} -w_{0} = -1)P(w_{1} -w_{0} = -1)\] \[+0\times E(y_{1} -y_{0}|w_{1} -w_{0} = 0)P(w_{1} -w_{0} = 0)\]
The last term drops out since it is zero
Reduced form is the probability-weighted difference in treatment effects between two groups
\(E(y_{1} -y_{0}|w_{1} -w_{0} = 1)\) is the treatment effect for “compliers”
\(E(y_{1} -y_{0}|w_{1} -w_{0} = -1)\) is the treatment effect for “defiers”
Problem: reduced form might be 0 even with positive treatment effects for both groups
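Suppose \(E(y_{1} -y_{0}|w_{1} -w_{0} = 1)P(w_{1} -w_{0} = 1) = E(y_{1} -y_{0}|w_{1} -w_{0} = -1)P(w_{1} -w_{0} = -1)\), for example equal effects and equal shares of compliers and defiers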
Then \(E(y|z=1) - E(y|z=0) = 0\) and reduced form effect is zero
Makes it difficult to measure treatment effects
To correct for this, we must assume monotonicity \[w_{1} \ge w_{0}\]
Says effect of instrument goes one way
Being assigned to treatment never makes anyone less likely to take it
Precludes defiers who don’t take treatment when assigned, take it when not assigned
Monotonicity implies \(P(w_{i1} -w_{i0} = -1) = 0\), so \[E(y|z=1) - E(y|z=0)\] \[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)P(w_{1} -w_{0} = 1)\]
With monotonicity \((w_{1} -w_{0})\) can equal 0 or 1, so
\[E(w_{1} - w_{0}) = 1\times P(w_{1} -w_{0} = 1) + 0\times P(w_{1} -w_{0} = 0)\] \[=P(w_{1} -w_{0} = 1)\]
Using this in the reduced form gives us
\[E(y|z=1) - E(y|z=0)\] \[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)E(w_{1} -w_{0})\]
To derive the denominator, remember potential treatments are independent of \(z\) \[E(w |z=1) -E(w|z = 0) = E(w_{1} -w_{0})\]
Combining the numerator and denominator \[\frac{E(y|z=1) - E(y|z=0)}{E(w|z=1) - E(w|z=0)} = \frac{E(y_{1} -y_{0}|w_{1} -w_{0} = 1)E(w_{1} -w_{0}) }{E(w_{1} -w_{0}) }\]
\[= E(y_{1} -y_{0}|w_{1} -w_{0} = 1)\]
Says the IV estimator equals \(E(y_{1} -y_{0}|w_{1} -w_{0} = 1)\)
This is known as the Local Average Treatment Effect (LATE)
The average treatment effect among compliers
Can interpret as treatment effect among people influenced by instrument
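The LATE result can be checked in a simulated population where the complier effect is known. Everything in this sketch is an assumption made for illustration: the shares of compliers, always-takers and never-takers, and the treatment effect assigned to each group. There are no defiers, so monotonicity holds, and \(z\) is assigned at random, so independence holds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed population: 50% compliers, 20% always-takers, 30% never-takers
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
w0 = (types == "always").astype(int)     # treatment taken when z = 0
w1 = (types != "never").astype(int)      # treatment taken when z = 1

# Assumed heterogeneous treatment effects y1 - y0 by type
effect = np.where(types == "complier", 2.0,
                  np.where(types == "always", 5.0, 1.0))
y0 = rng.normal(size=n)
y1 = y0 + effect

z = rng.integers(0, 2, size=n)           # randomly assigned binary instrument
w = w0 + (w1 - w0) * z                   # observed treatment
y = y0 + (y1 - y0) * w                   # observed outcome

reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = w[z == 1].mean() - w[z == 0].mean()
wald = reduced_form / first_stage        # IV estimator with a binary instrument

ate = effect.mean()                      # average effect in the whole population

print("first stage (share of compliers):", round(first_stage, 3))
print("IV / Wald estimate:", round(wald, 3))
print("population ATE:    ", round(ate, 3))
```

The IV estimate should be close to 2, the effect assigned to compliers, and not to the population average effect of about 2.3. The first stage should be close to 0.5, the complier share.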
Smith (2009)
Effect of school entry age on test scores
Uses “assigned entry age” as instrument for actual entry age
LATE: measures effect of entry age on test scores for people who follow rules
Angrist (1990)
Effect of military service on earnings
Uses draft eligibility as determined by lottery as instrument for military service
LATE: measures effect of military service for those who complied with the draft lottery
Key: often the “complier” subpopulation is different from the average population
An instrumental variables model without covariates is \[y = \beta_{0} + \beta_{1}w + u\] \[w = \pi_{0} + \pi_{1}z + r\]
The reduced form for \(y\) is therefore
\[y = \gamma_{0} + \gamma_{1}z + \epsilon\]
Taking expectations, \[E(y|z = 1) = \gamma_{0} + E(\gamma_{1}|z = 1) + E[\epsilon|z=1]\] \[E(y|z = 0) = \gamma_{0} + E[\epsilon|z=0]\]
Under the LATE assumptions, \[E(y|z = 1) - E(y|z = 0) = E(\gamma_{1})\]
From the reduced form for \(w\)
\[E(w|z = 1) = \pi_{0} + E(\pi_{1}|z = 1) + E[r|z=1]\] \[E(w|z = 0) = \pi_{0} + E[r|z=0]\]
Taking the difference \[E(w|z = 1) - E(w|z = 0) = E(\pi_{1})\]
Taking the ratio \[\frac{E(y|z = 1) - E(y|z = 0)}{E(w|z = 1) - E(w|z = 0)} = \frac{E(\gamma_{1})}{E(\pi_{1})}\]
Relating this back to LATE, note that \(\gamma_{1} = \beta_{1}\pi_{1}\)
\[\frac{E(\gamma_{1})}{E(\pi_{1})} = \frac{E(\beta_{1}\pi_{1})}{E(\pi_{1})}\]
Using the monotonicity assumption
\[\frac{E(\beta_{1}\pi_{1})}{E(\pi_{1})} = \frac{E(\beta_{1}|\pi_{1}>0)P(\pi_{1}>0)}{E(\pi_{1})} = E(\beta_{1}|\pi_{1}>0)\]
Because \(\pi_{1} = w_{1} - w_{0}\) is either 0 or 1 under monotonicity, \(P(\pi_{1}>0)=E(\pi_{1})\)