EC655
Justin Smith
Wilfrid Laurier University
Fall 2022
Repeated observations of some individual unit along some dimension
Typically, observing same people/firms/countries over time
Second dimension does not have to be time
Panel data can be used in several ways
Deal with individual heterogeneity
Increase variation (reduce standard errors)
Study dynamics
In microeconometrics, panel data is mostly used to control for individual heterogeneity
We will study
Basic panel data methods
Using panel data to identify parameters
Panels have at least 2 dimensions
A balanced panel is one where all individuals are observed in every time period
An unbalanced panel has at least one person not observed in a time period
Balanced Panel
ID | Year | Income |
1 | 1990 | 60000 |
1 | 1991 | 65000 |
1 | 1992 | 90000 |
2 | 1990 | 20000 |
2 | 1991 | 21000 |
2 | 1992 | 24000 |
Unbalanced Panel
ID | Year | Income |
1 | 1990 | 60000 |
1 | 1991 | 90000 |
2 | 1990 | 20000 |
2 | 1991 | 21000 |
2 | 1992 | 24000 |
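The balanced/unbalanced distinction can be checked mechanically. The sketch below is only an illustration: the helper name `is_balanced` is hypothetical, and the rows are the (ID, Year) pairs from the two example tables.

```python
from collections import defaultdict

def is_balanced(rows):
    """Return True if every unit is observed in every period that appears in the data."""
    periods_by_unit = defaultdict(set)
    for unit, period in rows:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    return all(p == all_periods for p in periods_by_unit.values())

# (ID, Year) pairs taken from the two example tables
balanced = [(1, 1990), (1, 1991), (1, 1992), (2, 1990), (2, 1991), (2, 1992)]
unbalanced = [(1, 1990), (1, 1991), (2, 1990), (2, 1991), (2, 1992)]

print(is_balanced(balanced))
print(is_balanced(unbalanced))
```

In the second list, person 1 is missing 1992, which is what makes the panel unbalanced.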
Like regular regression model, but with unobserved variable that only varies over individuals
The regression model is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
\(\mathbf{x_{ij}}\) contains a constant
\(a_{i}\) is the unobserved effect
Assume we are not interested in the effect of \(a_{i}\) on \(y_{ij}\)
Usually it is not observed, or unmeasurable anyway
This is why it is not written with a parameter
Several models we can use to estimate the parameters \(\boldsymbol{\beta}\)
We learned regression model was structural when error was mean independent
Panel model is structural when we assume Strict Exogeneity
Mathematically, strict exogeneity is written as \[E[u_{ij} | \mathbf{x_{i1}}, \mathbf{x_{i2}}, \ldots, \mathbf{x_{iJ}}, a_{i}] =0\]
It implies that \(u_{ij}\) in each time period is uncorrelated with \(\mathbf{x_{is}}\) in every time period, not just the same one \[E[\mathbf{x_{is}^{'}}u_{ij}] =0, \forall s,j = 1,2,\ldots, J\]
It also implies that the unobserved effect is uncorrelated with \(u_{ij}\) in every time period \[E[a_{i}u_{is}] =0, \forall s = 1,2,\ldots, J\]
Strict exogeneity assumptions are necessary for consistency of estimators we discuss below
Note that strict exogeneity says nothing about the correlation between \(\mathbf{x_{ij}}\) and \(a_{i}\)
We must make an additional assumption
It is this assumption that determines what model we use
If we assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are correlated, then fixed effects is the appropriate method \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
This is the most popular panel data method
Main use of panel data is to account for unobserved factors correlated with \(\mathbf{x_{ij}}\)
In this case, we assume the unobserved factor is time constant
Fixed effects is a method to remove the influence of \(a_{i}\) to get estimates of \(\boldsymbol{\beta}\)
There are two main ways to estimate a fixed effects model: the within estimator and dummy variable regression (DVR)
Find the average of each variable within the cross-sectional unit \[\bar{y}_{i} = \mathbf{\bar{x}_{i}}\boldsymbol{\beta} +a_{i} + \bar{u}_{i}\]
Then subtract the within unit mean from each observation \[y_{ij} - \bar{y}_{i} = (\mathbf{x_{ij}} - \mathbf{\bar{x}_{i}})\boldsymbol{\beta} + a_{i} - a_{i} + u_{ij} - \bar{u}_{i}\] \[y_{ij} - \bar{y}_{i} = (\mathbf{x_{ij}} - \mathbf{\bar{x}_{i}})\boldsymbol{\beta} + u_{ij} - \bar{u}_{i}\] \[y_{ij}^{*} = \mathbf{x_{ij}}^{*}\boldsymbol{\beta} + u_{ij}^{*}\]
Since \(a_{i}\) does not vary across \(j\), \(a_{i}\) is eliminated when we subtract the means
If we estimate the equation above by OLS, we get the Fixed Effects Estimator \[\boldsymbol{\hat{\beta}_{fe}} = \mathbf{(X^{*'} X^{*})^{-1}X^{*'} Y^{*}}\]
The matrix \(\mathbf{X^{*}}\) contains all observations over time and persons and is \((N\times J)\) by \(K\)
It is easiest to think of them as stacked cross-sectional observations
For each cross-sectional observation we have \[\mathbf{X_{i}} = \begin{bmatrix} x_{i1}^{1} &x_{i1}^{2}&\ldots &x_{i1}^{K}\\ x_{i2}^{1} &x_{i2}^{2}&\ldots &x_{i2}^{K} \\ \vdots &\ddots & \ldots & \vdots \\ x_{iJ}^{1}&x_{iJ}^{2}&\ldots &x_{iJ}^{K} \\ \end{bmatrix} , \mathbf{\bar{X}_{i}} =\begin{bmatrix} \bar{x_{i}}^{1} &\bar{x_{i}}^{2}&\ldots &\bar{x_{i}}^{K}\\ \bar{x_{i}}^{1} &\bar{x_{i}}^{2}&\ldots &\bar{x_{i}}^{K} \\ \vdots &\ddots & \ldots & \vdots \\ \bar{x_{i}}^{1} &\bar{x_{i}}^{2}&\ldots &\bar{x_{i}}^{K} \\ \end{bmatrix}\] \[\mathbf{X^{*}_{i}} = \begin{bmatrix} (x_{i1}^{1} - \bar{x}_{i}^{1}) &(x_{i1}^{2}-\bar{x}_{i}^{2})&\ldots &(x_{i1}^{K}-\bar{x}_{i}^{K})\\ (x_{i2}^{1}-\bar{x}_{i}^{1}) &(x_{i2}^{2}-\bar{x}_{i}^{2})&\ldots &(x_{i2}^{K}-\bar{x}_{i}^{K}) \\ \vdots &\ddots & \ldots & \vdots \\ (x_{iJ}^{1} -\bar{x}_{i}^{1})&(x_{iJ}^{2}-\bar{x}_{i}^{2})&\ldots &(x_{iJ}^{K}-\bar{x}_{i}^{K}) \\ \end{bmatrix} = \mathbf{X_{i}} - \mathbf{\bar{X}_{i}}\]
These matrices are then stacked on top of each other \[\mathbf{X^{*}} = \begin{bmatrix} \mathbf{X^{*}_{1}} \\ \mathbf{X^{*}_{2}} \\ \vdots \\ \mathbf{X^{*}_{N}} \end{bmatrix}\]
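The within transformation can be sketched on simulated data. Everything in the block is an assumption made for illustration (panel size, a single regressor, true slope of 2, normal errors). The unobserved effect is built to be correlated with the regressor, so pooled OLS should be biased while the within estimator should not. The constant in \(\mathbf{x_{ij}}\) is wiped out by demeaning, so the demeaned regression has no intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, beta = 500, 5, 2.0            # hypothetical sizes and true slope

a = rng.normal(size=N)                               # unobserved effect a_i
x = a[:, None] + rng.normal(size=(N, J))             # x_ij is correlated with a_i
y = beta * x + a[:, None] + rng.normal(size=(N, J))  # y_ij = x_ij*beta + a_i + u_ij

# Within transformation: subtract each unit's own time average
x_star = x - x.mean(axis=1, keepdims=True)
y_star = y - y.mean(axis=1, keepdims=True)

# Stack units on top of each other into an (N*J) x K matrix (K = 1 here)
X_star = x_star.reshape(-1, 1)
Y_star = y_star.reshape(-1)

beta_fe = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)

# Pooled OLS with a constant ignores a_i
X_pool = np.column_stack([np.ones(N * J), x.reshape(-1)])
beta_pool = np.linalg.lstsq(X_pool, y.reshape(-1), rcond=None)[0]

print("fixed effects:", beta_fe[0])
print("pooled OLS:   ", beta_pool[1])
```

Under this design the pooled slope should drift above the true value, since \(a_{i}\) enters both \(x_{ij}\) and \(y_{ij}\) with a positive sign.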
The fixed effects estimator can be derived using the time-demeaning matrix
Let \(\mathbf{i}\) be a \(J \times 1\) column of ones
Define the \(J \times J\) matrix \(\mathbf{M_{0}}\) as one that turns the columns of any matrix with \(J\) rows into deviations from means \[\mathbf{M_{0}} = \mathbf{I}_{J} - \frac{1}{J}\mathbf{ii'}\]
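As a small worked case, assume \(J = 2\) (chosen only for illustration). Then \[\mathbf{M_{0}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{bmatrix}\] \[\mathbf{M_{0}}\begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = \begin{bmatrix} x_{i1} - \bar{x}_{i} \\ x_{i2} - \bar{x}_{i} \end{bmatrix}, \quad \bar{x}_{i} = \frac{x_{i1} + x_{i2}}{2}\]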
Next, define the following \(NJ \times NJ\) matrix \(\mathbf{M_{D}}\) \[\mathbf{M_{D}} = \begin{bmatrix} \mathbf{M_{0}} & 0 & \ldots &0\\ 0 & \mathbf{M_{0}}& \ldots &0 \\ \vdots &\ddots & \ldots & \vdots \\ 0 & 0& \ldots & \mathbf{M_{0}} \\ \end{bmatrix}\]
The Fixed Effects estimator can be written as \[\boldsymbol{\hat{\beta}_{fe}} = \mathbf{(X^{'}M_{D}X)^{-1}X^{'}M_{D} Y}\]
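A quick numerical check of the matrix form, under assumed toy dimensions and random data: build \(\mathbf{M_{0}}\) and \(\mathbf{M_{D}}\) explicitly and compare against demeaning unit by unit. The rows of `X` are assumed to be sorted by unit, then by period.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 4, 3, 2                       # small hypothetical panel

X = rng.normal(size=(N * J, K))         # rows ordered unit by unit, period by period
Y = rng.normal(size=N * J)

i = np.ones((J, 1))                     # J x 1 column of ones
M0 = np.eye(J) - (i @ i.T) / J          # demeans the columns of any matrix with J rows
MD = np.kron(np.eye(N), M0)             # block diagonal with N copies of M0

beta_md = np.linalg.solve(X.T @ MD @ X, X.T @ MD @ Y)

# Same estimate from explicit demeaning within each unit
X3 = X.reshape(N, J, K)
X_star = (X3 - X3.mean(axis=1, keepdims=True)).reshape(N * J, K)
Y2 = Y.reshape(N, J)
Y_star = (Y2 - Y2.mean(axis=1, keepdims=True)).reshape(N * J)
beta_within = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)

print(np.allclose(beta_md, beta_within))
print(np.allclose(M0 @ M0, M0))         # M0 is idempotent
```

Because \(\mathbf{M_{D}}\) is symmetric and idempotent, \(\mathbf{X'M_{D}X} = \mathbf{X^{*'}X^{*}}\), which is why the two formulas agree. Forming the full \(NJ \times NJ\) matrix is only practical for small panels.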
The robust estimator for the variance covariance matrix for \(\boldsymbol{\hat{\beta}_{fe}}\) is \[\hat{var}(\boldsymbol{\hat{\beta}_{fe}})= \mathbf{(X^{*'} X^{*})^{-1}} \left ( \sum_{i=1}^{N} \mathbf{X^{*'}_{i} \hat{u}_{i}^{*} \hat{u}_{i}^{*'}X^{*}_{i}}\right ) \mathbf{(X^{*'} X^{*})^{-1}}\]
This estimator is robust to both heteroskedasticity and serial correlation
Serial correlation is an issue because the data have a time element
Heteroskedasticity can happen because the data have a cross-sectional element
If error has no heteroskedasticity and no serial correlation, you can simplify this variance estimator
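The robust variance estimator for fixed effects has a sandwich form that is short to code. The sketch below uses simulated data with heteroskedastic errors. The data-generating process, sizes, and coefficient values are all assumptions for illustration, and no finite-sample correction is applied.

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, K = 500, 4, 2                               # hypothetical panel dimensions

a = rng.normal(size=(N, 1, 1))                    # unobserved effect a_i
X = rng.normal(size=(N, J, K)) + a                # regressors correlated with a_i
beta = np.array([1.0, -0.5])
u = rng.normal(size=(N, J)) * (0.5 + 0.5 * np.abs(X[:, :, 0]))  # heteroskedastic errors
Y = X @ beta + a[:, :, 0] + u

Xs = X - X.mean(axis=1, keepdims=True)            # X*_i for each unit, shape (N, J, K)
Ys = Y - Y.mean(axis=1, keepdims=True)

XtX = np.einsum("ijk,ijl->kl", Xs, Xs)            # X*'X* over all units and periods
beta_fe = np.linalg.solve(XtX, np.einsum("ijk,ij->k", Xs, Ys))

resid = Ys - Xs @ beta_fe                         # demeaned residuals, shape (N, J)
score = np.einsum("ijk,ij->ik", Xs, resid)        # X*_i' u*_i for each unit, shape (N, K)
meat = score.T @ score                            # sum over i of X*_i' u*_i u*_i' X*_i
bread = np.linalg.inv(XtX)
V_robust = bread @ meat @ bread

print("beta_fe:", beta_fe)
print("robust s.e.:", np.sqrt(np.diag(V_robust)))
```

Summing the outer products unit by unit, rather than observation by observation, is what allows arbitrary correlation of the errors within a unit over time.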
In the dummy variable regression (DVR) model, we include a dummy variable for each cross-sectional unit
The interpretation of \(a_{i}\) changes
It is now considered a parameter and not a variable
In practice it does not matter because it produces the same results as the within estimator
The regression is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} + \mathbf{D_{i}}\boldsymbol{\alpha} + u_{ij}\]
Where \(\mathbf{D_{i}}\) is a vector of dummy variables indicating the cross-sectional unit, and \(\boldsymbol{\alpha}\) is an \((N-1) \times 1\) vector (we exclude 1 of the dummies to identify the model)
The vector \(\boldsymbol{\hat{\beta}_{DVR}}\) will be identical to \(\boldsymbol{\hat{\beta}_{fe}}\)
For a couple of reasons we usually do not use the DVR approach
If \(N\) is large, it takes forever to estimate
We do not care about \(a_{i}\) normally
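The estimates of the fixed effects \(a_{i}\) themselves are not consistent as \(N \rightarrow \infty\) with \(J\) fixed, because each one is estimated from only \(J\) observations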
With the DVR approach, you would use the variance estimator we discussed in the OLS section
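The equivalence between the dummy variable regression and the within estimator can be verified numerically. The sketch assumes a tiny simulated panel with one regressor (sizes and coefficients are illustrative only) and drops the dummy for the first unit because the regression includes a constant.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 6, 4                                           # hypothetical small panel

a = rng.normal(size=N)
x = a[:, None] + rng.normal(size=(N, J))
y = 1.5 * x + a[:, None] + rng.normal(size=(N, J))

# Dummy variable regression: constant, x, and N-1 unit dummies
unit = np.repeat(np.arange(N), J)                     # unit index for each stacked row
D = (unit[:, None] == np.arange(1, N)).astype(float)  # dummy for unit 0 is excluded
X_dvr = np.column_stack([np.ones(N * J), x.reshape(-1), D])
coef_dvr = np.linalg.lstsq(X_dvr, y.reshape(-1), rcond=None)[0]
beta_dvr = coef_dvr[1]                                # slope on x

# Within estimator on demeaned data
xs = (x - x.mean(axis=1, keepdims=True)).reshape(-1)
ys = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_fe = (xs @ ys) / (xs @ xs)

print(beta_dvr, beta_fe)
```

The design matrix here has \(N + 1\) columns, which grows with the number of units. That is the practical cost noted above when \(N\) is large.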
Fixed effects is frequently used to infer causality
Unobserved variables are “controlled” with the fixed effect
This is only appropriate if all unobserved heterogeneity is constant over time
In first differencing, we still assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are correlated
The estimating equation is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
Imagine lagging this equation by 1 time period \[y_{ij-1} = \mathbf{x_{ij-1}}\boldsymbol{\beta} +a_{i} + u_{ij-1}\]
Then difference the equations \[y_{ij} - y_{ij-1} = \mathbf{(x_{ij} -x_{ij-1}) }\boldsymbol{\beta} +a_{i}-a_{i} + u_{ij} -u_{ij-1}\] \[\Delta y_{ij}= \mathbf{(\Delta x_{ij} ) }\boldsymbol{\beta} + \Delta u_{ij}\]
Since \(a_{i}\) is constant over time for each cross-sectional unit, it is eliminated when we difference
The amount of data we have left after differencing depends on the number of time periods
If \(J = 2\), then we are left with 1 observation per person
If \(J = 3\), then we are left with 2 observations per person
etc...
The first difference estimator is the OLS estimator applied to the differenced data
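A minimal sketch of the first difference estimator on simulated data (sizes and the true slope are assumptions). With exactly two periods the first difference and within estimators coincide, which the block uses as a sanity check. No constant is included in the differenced regression because differencing removes it along with \(a_{i}\).

```python
import numpy as np

rng = np.random.default_rng(4)
N, J = 200, 2                          # with J = 2, one differenced row per unit

a = rng.normal(size=N)
x = a[:, None] + rng.normal(size=(N, J))
y = 0.8 * x + a[:, None] + rng.normal(size=(N, J))

# First differences within each unit: z_ij - z_i,j-1
dx = np.diff(x, axis=1).reshape(-1)
dy = np.diff(y, axis=1).reshape(-1)
beta_fd = (dx @ dy) / (dx @ dx)        # OLS on the differenced data

# Within estimator for comparison
xs = (x - x.mean(axis=1, keepdims=True)).reshape(-1)
ys = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_fe = (xs @ ys) / (xs @ xs)

print(beta_fd, beta_fe)
```

With more than two periods the two estimators generally differ in finite samples, although both remove \(a_{i}\).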
The robust variance covariance matrix for \(\boldsymbol{\hat{\beta}_{fd}}\) is
\[\hat{var}(\boldsymbol{\hat{\beta}_{fd}}) = \mathbf{(\Delta X' \Delta X)^{-1}} \left ( \sum_{i=1}^{N} \mathbf{\Delta X'_{i} \Delta\hat{u}_{i} \Delta\hat{u}'_{i}\Delta X_{i}}\right ) \mathbf{(\Delta X' \Delta X)^{-1}}\]
Again this is robust to both heteroskedasticity and serial correlation
When using this method:
There must be variation in a variable over time for it to be included
To infer a causal relationship, the unobserved heterogeneity must be time constant
In random effects we assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are uncorrelated
Thus, random effects is NOT an identification strategy
It forces \(a_{i}\) into the error term
Putting \(a_{i}\) into the error term implies a specific error structure
The model is still \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
Except now we force \(a_{i}\) into the error \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\] \[v_{ij} = a_{i} + u_{ij}\]
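Implies a “block correlation” in errors within each \(i\): \(v_{ij}\) and \(v_{is}\) share the same \(a_{i}\), so they are correlated across time periods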
The random effects model attempts to harness this block correlation
The method imposes certain assumptions on the data (in addition to strict exogeneity)
\(E(u_{ij}^2) = \sigma_{u}^2\) (homoskedasticity)
\(E(u_{ij}u_{is}) = 0, \forall j \neq s\) (no serial correlation)
Strict exogeneity implies \(E(a_{i}u_{ij}) =0, \forall j\)
Under these assumptions
If we group all errors for person \(i\) into the \(J \times 1\) vector \(\mathbf{v_{i}}\), then
\[\boldsymbol{\Sigma} = E(\mathbf{v_{i}v_{i}'}) = \begin{bmatrix} \sigma_{a}^2 + \sigma_{u}^2& \sigma_{a}^2 & \ldots & \sigma_{a}^2\\ \sigma_{a}^2 & \sigma_{a}^2 + \sigma_{u}^2& \ldots & \sigma_{a}^2 \\ \vdots &\ddots & \ldots & \vdots \\ \sigma_{a}^2 & \sigma_{a}^2& \ldots & \sigma_{a}^2 + \sigma_{u}^2 \\ \end{bmatrix}\]
The variance covariance matrix of the errors for the whole data set is \[\boldsymbol{\Omega} = \begin{bmatrix} \boldsymbol{\Sigma}& \mathbf{0} & \ldots & \mathbf{0}\\ \mathbf{0}& \boldsymbol{\Sigma}& \ldots & \mathbf{0} \\ \vdots &\ddots & \ldots & \vdots \\ \mathbf{0}& \mathbf{0}& \ldots & \boldsymbol{\Sigma}\\ \end{bmatrix}\]
This is the “random effects structure”
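The random effects structure is easy to build explicitly for a toy panel. The values below for the variance of \(a_{i}\) and the variance of \(u_{ij}\) are made up for illustration: every diagonal entry of \(\boldsymbol{\Sigma}\) is their sum, and every off-diagonal entry is the variance of \(a_{i}\).

```python
import numpy as np

N, J = 3, 4                        # hypothetical panel dimensions
sigma2_a, sigma2_u = 2.0, 1.0      # hypothetical variances of a_i and u_ij

ones = np.ones((J, 1))
Sigma = sigma2_a * (ones @ ones.T) + sigma2_u * np.eye(J)   # J x J block for one unit
Omega = np.kron(np.eye(N), Sigma)                           # NJ x NJ block diagonal

print(Sigma)
print(Omega.shape)
```

The zero blocks off the diagonal of \(\boldsymbol{\Omega}\) encode the assumption that errors are uncorrelated across different individuals.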
The random effects estimator is GLS applied to the data using the random effects error structure
GLS “transforms” the data, and runs OLS on the transformed data
The random effects estimator is \[\boldsymbol{\hat{\beta}_{re}} = \mathbf{(X' \hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}Y}\]
\(\boldsymbol{\hat{\Omega}}\) is the estimated variance covariance matrix with \(\sigma_{u}^2\) and \(\sigma_{a}^2\) replaced by estimates
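The statement that GLS is OLS on transformed data can be checked directly. In this sketch the variance components are treated as known, which is an assumption made to keep the example short (in practice they are replaced by estimates), and the data are simulated with \(x_{ij}\) independent of \(a_{i}\). The transformation uses a Cholesky factor of \(\boldsymbol{\Sigma}^{-1}\), which is one of several valid choices.

```python
import numpy as np

rng = np.random.default_rng(5)
N, J = 50, 3                       # hypothetical panel dimensions
sigma2_a, sigma2_u = 1.0, 0.5      # treated as known here

a = rng.normal(scale=np.sqrt(sigma2_a), size=N)
x = rng.normal(size=(N, J))                       # uncorrelated with a_i
u = rng.normal(scale=np.sqrt(sigma2_u), size=(N, J))
y = 1.0 + 2.0 * x + a[:, None] + u

X = np.column_stack([np.ones(N * J), x.reshape(-1)])
Y = y.reshape(-1)

Sigma = sigma2_a * np.ones((J, J)) + sigma2_u * np.eye(J)
Sigma_inv = np.linalg.inv(Sigma)
Omega_inv = np.kron(np.eye(N), Sigma_inv)

# Direct GLS formula
beta_re = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)

# Equivalent route: premultiply each unit's data by L', where L L' = Sigma^{-1}, then OLS
L = np.linalg.cholesky(Sigma_inv)
P = np.kron(np.eye(N), L.T)
beta_ols_transformed = np.linalg.lstsq(P @ X, P @ Y, rcond=None)[0]

print(beta_re, beta_ols_transformed)
```

Since \(\mathbf{P'P} = \boldsymbol{\Omega}^{-1}\), the OLS normal equations on the transformed data reproduce the GLS formula exactly.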
Note that for \(\boldsymbol{\hat{\beta}_{re}}\) to be efficient, we must make stronger assumptions than before
\(E(u_{ij}^2|\mathbf{x_{i}}, a_{i}) = \sigma_{u}^2\)
\(E(u_{ij}u_{is}|\mathbf{x_{i}}, a_{i}) = 0, \forall j \neq s\)
\(E(a_{i}^2|\mathbf{x_{i}}) = \sigma_{a}^2\)
We need consistent estimates of \(\sigma_{u}^2\) and \(\sigma_{a}^2\)
To do so, we follow the method of Wooldridge (2002)
Recall that \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\] \[v_{ij} = a_{i} + u_{ij}\]
Because of the second equation \[\sigma_{v}^2 = \sigma_{a}^2 + \sigma_{u}^2\]
We will find \(\sigma_{v}^2\) and \(\sigma_{a}^2\), then deduce \(\sigma_{u}^2\)
Because of our previous assumptions, \[\sigma_{v}^2 = \frac{1}{J}\sum_{j=1}^{J}E(v_{ij}^2)\]
This is true for each individual
Replace \(E(v_{ij}^2)\) with a sample average across \(i\) using consistent estimates of \(v_{ij}\)
Because we have assumed \(\mathbf{x_{ij}}\) and \(a_{i}\) are uncorrelated, we can obtain consistent estimates of \(v_{ij}\) from pooled OLS
Regress \(y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\)
Keep the residuals from this regression, \(\hat{v}_{ij}\)
Then for the estimate of \(\sigma_{v}^2\) \[\hat{\sigma}_{v}^2 = \frac{1}{NJ-K}\sum_{i=1}^{N}\sum_{j=1}^{J}\hat{v}_{ij}^2\]
Now, we obtain an estimate of \(\sigma_{a}^2\) using a similar method \[\sigma_{a}^2 = \frac{1}{J(J-1)/2}\sum_{j=1}^{J-1}\sum_{s = j+1}^{J}E(v_{ij}v_{is})\]
This is true for each individual
Replace \(E(v_{ij}v_{is})\) with a sample average across \(i\) using consistent estimates of \(v_{ij}\)
The estimate of \(\sigma_{a}^2\) is \[\hat{\sigma}_{a}^2 = \frac{1}{NJ(J-1)/2 - K}\sum_{j=1}^{J-1}\sum_{s = j+1}^{J}\sum_{i=1}^{N}\hat{v}_{ij}\hat{v}_{is}\]
The idea is that there are \(J(J-1)/2\) cross-products of errors for each individual
Averaging these cross-products together for each person, then averaging across all people, we get a consistent estimate
Once we have \(\hat{\sigma}_{a}^2\) and \(\hat{\sigma}_{v}^2\), we obtain \(\hat{\sigma}_{u}^2 = \hat{\sigma}_{v}^2-\hat{\sigma}_{a}^2\)
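The variance component formulas above can be sketched on simulated data. The true values (1.0 for the variance of \(a_{i}\), 0.5 for the variance of \(u_{ij}\)), the sample size, and the single regressor are assumptions chosen for illustration. Here `K` counts the constant plus one regressor.

```python
import numpy as np

rng = np.random.default_rng(6)
N, J, K = 2000, 4, 2               # K counts the constant and one regressor
sigma2_a, sigma2_u = 1.0, 0.5      # hypothetical true variance components

a = rng.normal(scale=np.sqrt(sigma2_a), size=N)
x = rng.normal(size=(N, J))        # uncorrelated with a_i, as random effects assumes
u = rng.normal(scale=np.sqrt(sigma2_u), size=(N, J))
y = 1.0 + 2.0 * x + a[:, None] + u

# Pooled OLS residuals v_hat_ij
X = np.column_stack([np.ones(N * J), x.reshape(-1)])
b = np.linalg.lstsq(X, y.reshape(-1), rcond=None)[0]
v_hat = (y.reshape(-1) - X @ b).reshape(N, J)

# sigma_v^2: sum of squared residuals with a degrees-of-freedom correction
s2_v = (v_hat ** 2).sum() / (N * J - K)

# sigma_a^2: sum of within-unit cross-products over all pairs j < s
cross = 0.0
for j in range(J - 1):
    for s in range(j + 1, J):
        cross += (v_hat[:, j] * v_hat[:, s]).sum()
s2_a = cross / (N * J * (J - 1) / 2 - K)

s2_u = s2_v - s2_a
print(s2_v, s2_a, s2_u)
```

In small samples nothing forces the estimated variance of \(a_{i}\) to be positive, so a negative value can occur and is usually read as a sign that the random effects structure fits poorly.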
With all of these assumptions, the variance estimate of the random effects estimator is
\[\hat{var}( \boldsymbol{\hat{\beta}_{re}}) = \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\]
This procedure depends on the assumptions we have made about the random effects structure
We could also avoid those assumptions and use a robust variance estimator \[\hat{var}(\boldsymbol{\hat{\beta}_{re}})= \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\left ( \sum_{i=1}^{N} \mathbf{X^{'}_{i} \hat{\Sigma}^{-1} \hat{v}_{i} \hat{v}_{i}^{'}\hat{\Sigma}^{-1}X_{i}}\right ) \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\] where \(\mathbf{\hat{v}_{i}}\) is the vector of random effects residuals for person \(i\)
But Random Effects is generally all about the structure of the errors