Panel Data

EC655

Justin Smith

Wilfrid Laurier University

Fall 2022

Introduction

Repeated observations of some individual unit along some dimension
- Typically, observing same people/firms/countries over time
- Second dimension does not have to be time
Panel data can be used in several ways
1. Deal with individual heterogeneity
2. Increase variation (reduce standard errors)
3. Study dynamics
In microeconometrics, panel data is mostly used to control for individual heterogeneity
We will study
- Basic panel data methods
- Using panel data to identify parameters

Panel Data Basics

Structure of Panel Data

Panels have at least 2 dimensions
- Variation occurs over \(i=1,\ldots,N\) people and \(j=1,\ldots,J\) time periods
A balanced panel is one where all individuals are observed in every time period
An unbalanced panel has at least one person not observed in a time period

Balanced Panel

ID	Year	Income
1	1990	60000
1	1991	65000
1	1992	90000
2	1990	20000
2	1991	21000
2	1992	24000

Unbalanced Panel

ID	Year	Income
1	1990	60000
1	1991	90000
2	1990	20000
2	1991	21000
2	1992	24000
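
As a rough illustration of the balanced/unbalanced distinction, the two tables above can be checked programmatically. The helper name `is_balanced` and the tuple layout are assumptions made for this sketch, not part of the course material.

```python
from collections import defaultdict

def is_balanced(rows):
    """Return True if every unit is observed in every period found in the data.

    rows: iterable of (unit_id, period, value) tuples.
    """
    periods_by_unit = defaultdict(set)
    for unit_id, period, _ in rows:
        periods_by_unit[unit_id].add(period)
    # All periods that appear anywhere in the data.
    all_periods = set().union(*periods_by_unit.values())
    return all(p == all_periods for p in periods_by_unit.values())

balanced = [(1, 1990, 60000), (1, 1991, 65000), (1, 1992, 90000),
            (2, 1990, 20000), (2, 1991, 21000), (2, 1992, 24000)]
unbalanced = [(1, 1990, 60000), (1, 1991, 90000),
              (2, 1990, 20000), (2, 1991, 21000), (2, 1992, 24000)]

print(is_balanced(balanced), is_balanced(unbalanced))
```

The second table fails the check because person 1 has no 1992 observation.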

Unobserved Effects Model

Like the regular regression model, but with an unobserved variable that varies only across individuals
The regression model is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
- \(\mathbf{x_{ij}}\) contains a constant
- \(a_{i}\) is the unobserved effect
Assume we are not interested in the effect of \(a_{i}\) on \(y_{ij}\)
- Usually it is not observed, or unmeasurable anyway
- This is why it is not written with a parameter
Several models we can use to estimate the parameters \(\boldsymbol{\beta}\)
- Depends on assumption about relationship between \(\mathbf{x_{ij}}\) and \(a_{i}\)

Strict Exogeneity

We learned that the regression model is structural when the error is mean independent of the regressors
Panel model is structural when we assume Strict Exogeneity
- Error term \(u_{ij}\) has zero mean conditional on \(a_{i}\) and \(\mathbf{x_{ij}}\) in all time periods.
Mathematically, strict exogeneity is written as \[E[u_{ij} | \mathbf{x_{i1}}, \mathbf{x_{i2}}, \ldots, \mathbf{x_{iJ}}, a_{i}] =0\]
It implies that \(u_{ij}\) in each time period is uncorrelated with \(\mathbf{x_{ij}}\) in each time period \[E[\mathbf{x_{is}^{'}}u_{ij}] =0, \forall s,j = 1,2,\ldots, J\]

Strict Exogeneity

It also implies that the unobserved effect is uncorrelated with \(u_{ij}\) in every time period \[E[a_{i}u_{is}] =0, \forall s = 1,2,\ldots, J\]
Strict exogeneity assumptions are necessary for consistency of estimators we discuss below
Note that strict exogeneity says nothing about the correlation between \(\mathbf{x_{ij}}\) and \(a_{i}\)
- We must make an additional assumption
- It is this assumption that determines what model we use

Panel Data Estimation Techniques

Fixed Effects

If we assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are correlated, then fixed effects is the appropriate method \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
This is the most popular panel data method
- Main use of panel data is to account for unobserved factors correlated with \(\mathbf{x_{ij}}\)
- In this case, we assume the unobserved factor is time constant
Fixed effects is a method to remove the influence of \(a_{i}\) to get estimates of \(\boldsymbol{\beta}\)
There are two main ways to estimate a fixed effects model

Fixed Effects

“Within” Transformation

Find the average of each variable within the cross-sectional unit \[\bar{y}_{i} = \mathbf{\bar{x}_{i}}\boldsymbol{\beta} +a_{i} + \bar{u}_{i}\]
Then subtract the within unit mean from each observation \[y_{ij} - \bar{y}_{i} = (\mathbf{x_{ij}} - \mathbf{\bar{x}_{i}})\boldsymbol{\beta} + a_{i} - a_{i} + u_{ij} - \bar{u}_{i}\] \[y_{ij} - \bar{y}_{i} = (\mathbf{x_{ij}} - \mathbf{\bar{x}_{i}})\boldsymbol{\beta} + u_{ij} - \bar{u}_{i}\] \[y_{ij}^{*} = \mathbf{x_{ij}}^{*}\boldsymbol{\beta} + u_{ij}^{*}\]
Since \(a_{i}\) does not vary across \(j\), \(a_{i}\) is eliminated when we subtract the means
If we estimate the equation above by OLS, we get the Fixed Effects Estimator \[\boldsymbol{\hat{\beta}_{fe}} = \mathbf{(X^{*'} X^{*})^{-1}X^{*'} Y^{*}}\]
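
A minimal simulation sketch of the within transformation, assuming a single regressor, a true slope of 2, and a data generating process in which \(x_{ij}\) is correlated with \(a_{i}\). All sample sizes and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, beta = 500, 4, 2.0                    # assumed sizes and true slope

a = rng.normal(size=(N, 1))                 # unobserved effect a_i
x = 0.8 * a + rng.normal(size=(N, J))       # x_ij is correlated with a_i
u = rng.normal(size=(N, J))                 # idiosyncratic error u_ij
y = beta * x + a + u

# Within transformation: subtract each unit's time average.
x_star = x - x.mean(axis=1, keepdims=True)
y_star = y - y.mean(axis=1, keepdims=True)

# With one regressor, (X*'X*)^{-1} X*'Y* reduces to a ratio of sums.
beta_fe = (x_star * y_star).sum() / (x_star ** 2).sum()

# Pooled OLS with an intercept leaves a_i in the error and is biased here.
xc, yc = x - x.mean(), y - y.mean()
beta_pooled = (xc * yc).sum() / (xc ** 2).sum()

print(f"FE: {beta_fe:.3f}  pooled OLS: {beta_pooled:.3f}")
```

Under these assumptions the fixed effects estimate should land near 2, while pooled OLS is pushed upward by the positive correlation between \(x_{ij}\) and \(a_{i}\).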

Fixed Effects

The matrix \(\mathbf{X^{*}}\) contains all observations over time and persons and is \((N\times J)\) by \(K\)
It is easiest to think of them as stacked cross-sectional observations
For each cross-sectional observation we have \[\mathbf{X_{i}} = \begin{bmatrix} x_{i1}^{1} &x_{i1}^{2}&\ldots &x_{i1}^{K}\\ x_{i2}^{1} &x_{i2}^{2}&\ldots &x_{i2}^{K} \\ \vdots &\ddots & \ldots & \vdots \\ x_{iJ}^{1}&x_{iJ}^{2}&\ldots &x_{iJ}^{K} \\ \end{bmatrix} , \mathbf{\bar{X}_{i}} =\begin{bmatrix} \bar{x_{i}}^{1} &\bar{x_{i}}^{2}&\ldots &\bar{x_{i}}^{K}\\ \bar{x_{i}}^{1} &\bar{x_{i}}^{2}&\ldots &\bar{x_{i}}^{K} \\ \vdots &\ddots & \ldots & \vdots \\ \bar{x_{i}}^{1} &\bar{x_{i}}^{2}&\ldots &\bar{x_{i}}^{K} \\ \end{bmatrix}\] \[\mathbf{X^{*}_{i}} = \begin{bmatrix} (x_{i1}^{1} - \bar{x}_{i}^{1}) &(x_{i1}^{2}-\bar{x}_{i}^{2})&\ldots &(x_{i1}^{K}-\bar{x}_{i}^{K})\\ (x_{i2}^{1}-\bar{x}_{i}^{1}) &(x_{i2}^{2}-\bar{x}_{i}^{2})&\ldots &(x_{i2}^{K}-\bar{x}_{i}^{K}) \\ \vdots &\ddots & \ldots & \vdots \\ (x_{iJ}^{1} -\bar{x}_{i}^{1})&(x_{iJ}^{2}-\bar{x}_{i}^{2})&\ldots &(x_{iJ}^{K}-\bar{x}_{i}^{K}) \\ \end{bmatrix} = \mathbf{X_{i}} - \mathbf{\bar{X}_{i}}\]

Fixed Effects

These matrices are then stacked on top of each other \[\mathbf{X^{*}} = \begin{bmatrix} \mathbf{X^{*}_{1}} \\ \mathbf{X^{*}_{2}} \\ \vdots \\ \mathbf{X^{*}_{N}} \end{bmatrix}\]
The fixed effects estimator can be derived using the time-demeaning matrix
Let \(\mathbf{i}\) be a \(J \times 1\) column of ones
Define the \(J \times J\) matrix \(\mathbf{M_{0}}\) as one that turns the columns of any matrix with \(J\) rows into deviations from means \[\mathbf{M_{0}} = \mathbf{I}_{J} - \frac{1}{J}\mathbf{ii'}\]

Fixed Effects

Next, define the following \(NJ \times NJ\) matrix \(\mathbf{M_{D}}\) \[\mathbf{M_{D}} = \begin{bmatrix} \mathbf{M_{0}} & 0 & \ldots &0\\ 0 & \mathbf{M_{0}}& \ldots &0 \\ \vdots &\ddots & \ldots & \vdots \\ 0 & 0& \ldots & \mathbf{M_{0}} \\ \end{bmatrix}\]
The Fixed Effects estimator can be written as \[\boldsymbol{\hat{\beta}_{fe}} = \mathbf{(X^{'}M_{D}X)^{-1}X^{'}M_{D} Y}\]
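
The equivalence between the \(\mathbf{M_{D}}\) form and the explicitly demeaned form can be checked numerically. The sketch below assumes a small balanced panel stacked unit by unit, with random data and no constant column (a constant would be wiped out by demeaning).

```python
import numpy as np

N, J, K = 3, 4, 2                        # small assumed dimensions
rng = np.random.default_rng(1)
X = rng.normal(size=(N * J, K))          # rows stacked unit by unit
Y = rng.normal(size=(N * J, 1))

ones = np.ones((J, 1))
M0 = np.eye(J) - ones @ ones.T / J       # J x J time-demeaning matrix
MD = np.kron(np.eye(N), M0)              # NJ x NJ block-diagonal matrix

# Estimator written with M_D.
beta_md = np.linalg.solve(X.T @ MD @ X, X.T @ MD @ Y)

# Same estimator from explicitly demeaned data.
X3 = X.reshape(N, J, K)
X_star = (X3 - X3.mean(axis=1, keepdims=True)).reshape(N * J, K)
Y3 = Y.reshape(N, J, 1)
Y_star = (Y3 - Y3.mean(axis=1, keepdims=True)).reshape(N * J, 1)
beta_within = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)

print(np.allclose(MD @ X, X_star), np.allclose(beta_md, beta_within))
```

The two forms agree because \(\mathbf{M_{D}}\) is symmetric and idempotent, so \(\mathbf{X'M_{D}X} = \mathbf{X^{*'}X^{*}}\). In practice the \(NJ \times NJ\) matrix is rarely built explicitly, since demeaning directly is much cheaper.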

Fixed Effects

The robust estimator for the variance covariance matrix for \(\boldsymbol{\hat{\beta}_{fe}}\) is \[\hat{var}(\boldsymbol{\hat{\beta}_{fe}})= \mathbf{(X^{*'} X^{*})^{-1}} \left ( \sum_{i=1}^{N} \mathbf{X^{*'}_{i} \hat{u}_{i}^{*} \hat{u}_{i}^{*'}X^{*}_{i}}\right ) \mathbf{(X^{*'} X^{*})^{-1}}\]
This estimator is robust to both heteroskedasticity and serial correlation
- Serial correlation is an issue because the data have a time element
- Heteroskedasticity can happen because the data have a cross-sectional element
If the errors have no heteroskedasticity and no serial correlation, this variance estimator can be simplified
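
A sketch of how the robust (clustered by unit) sandwich estimator might be computed on simulated data. The outer matrices use the stacked demeaned regressors and the middle term sums \(\mathbf{X^{*'}_{i} \hat{u}_{i}^{*} \hat{u}_{i}^{*'}X^{*}_{i}}\) over units. The data generating process is an assumption for illustration, and no small-sample correction is applied.

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, K = 200, 5, 2                              # assumed dimensions
a = rng.normal(size=(N, 1, 1))                   # unobserved effect a_i
X = rng.normal(size=(N, J, K)) + a               # regressors correlated with a_i
beta = np.array([1.0, -0.5])                     # assumed true coefficients
u = rng.normal(size=(N, J)) * (1 + np.abs(X[:, :, 0]))  # heteroskedastic errors
y = X @ beta + a[:, :, 0] + u

Xs = X - X.mean(axis=1, keepdims=True)           # X*_i for each unit, shape (N, J, K)
ys = y - y.mean(axis=1, keepdims=True)           # y*_i, shape (N, J)

XtX = np.einsum("ijk,ijl->kl", Xs, Xs)           # X*'X* over all units and periods
Xty = np.einsum("ijk,ij->k", Xs, ys)             # X*'Y*
beta_fe = np.linalg.solve(XtX, Xty)

resid = ys - Xs @ beta_fe                        # demeaned residuals, shape (N, J)
scores = np.einsum("ijk,ij->ik", Xs, resid)      # X*_i' u*_i, one K-vector per unit
meat = scores.T @ scores                         # sum over i of X*_i' u u' X*_i
bread = np.linalg.inv(XtX)
V_robust = bread @ meat @ bread

se = np.sqrt(np.diag(V_robust))
print(beta_fe, se)
```

Because the middle term keeps each unit's residual vector together, arbitrary correlation across time periods within a unit is allowed.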

Fixed Effects

A second estimation method is the Dummy Variable Regression

In this model, we include a dummy variable for each cross-sectional unit
- The interpretation of \(a_{i}\) changes
- It is now considered a parameter and not a variable
- In practice it does not matter because it produces the same results as within estimator
The regression is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} + \mathbf{D_{i}}\boldsymbol{\alpha} + u_{ij}\]
Where \(\mathbf{D_{i}}\) is a vector of dummy variables indicating the cross-sectional unit, and \(\boldsymbol{\alpha}\) is an \((N-1) \times 1\) vector (because \(\mathbf{x_{ij}}\) contains a constant, we exclude one of the dummies to identify the model)
The vector \(\boldsymbol{\hat{\beta}_{DVR}}\) will be identical to \(\boldsymbol{\hat{\beta}_{fe}}\)
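
The numerical equality of the two estimators can be illustrated on a small simulated panel. This is a sketch with one regressor and an assumed data generating process. The dummy matrix omits unit 0 because a constant is included.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 6, 3                                     # assumed dimensions
unit = np.repeat(np.arange(N), J)               # unit index for each stacked row
a = rng.normal(size=N)
x = a[unit] + rng.normal(size=N * J)
y = 1.5 * x + a[unit] + rng.normal(size=N * J)

# Dummy variable regression: constant, x, and N-1 unit dummies.
D = (unit[:, None] == np.arange(1, N)[None, :]).astype(float)
Z = np.column_stack([np.ones(N * J), x, D])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_dvr = coef[1]

# Within estimator on the same data.
x_bar = np.bincount(unit, weights=x) / J
y_bar = np.bincount(unit, weights=y) / J
x_star, y_star = x - x_bar[unit], y - y_bar[unit]
beta_fe = (x_star @ y_star) / (x_star @ x_star)

print(np.isclose(beta_dvr, beta_fe))
```

The slope coefficients coincide up to floating point error. Only the treatment of \(a_{i}\) differs between the two approaches.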

Fixed Effects

For several reasons we usually do not use the DVR approach
1. If \(N\) is large, it takes forever to estimate
2. We do not care about \(a_{i}\) normally
3. The estimator of the fixed effects \(a_{i}\) is not consistent as \(N \rightarrow \infty\) with \(J\) fixed
With the DVR approach, you would use the variance estimator we discussed in the OLS section

Fixed Effects

Fixed effects is frequently used to infer causality
- Unobserved variables are “controlled” with the fixed effect
- This is only appropriate if all unobserved heterogeneity is constant over time
  - If unobserved variables correlated with \(\mathbf{x_{ij}}\) vary over time, the estimates cannot be given a causal interpretation

First Differencing

In this method, we still assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are correlated
The estimating equation is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
Imagine lagging this equation by 1 time period \[y_{ij-1} = \mathbf{x_{ij-1}}\boldsymbol{\beta} +a_{i} + u_{ij-1}\]
Then difference the equations \[y_{ij} - y_{ij-1} = \mathbf{(x_{ij} -x_{ij-1}) }\boldsymbol{\beta} +a_{i}-a_{i} + u_{ij} -u_{ij-1}\] \[\Delta y_{ij}= \mathbf{(\Delta x_{ij} ) }\boldsymbol{\beta} + \Delta u_{ij}\]
Since \(a_{i}\) is constant over time for each cross-sectional unit, it is eliminated when we difference

First Differencing

The amount of data we have left after differencing depends on the number of time periods
- If \(J = 2\), then we are left with 1 observation per person
- If \(J = 3\), then we are left with 2 observations per person
- etc...
The first difference estimator is the OLS estimator applied to the differenced data
- Collecting \(y_{ij}\) and \(\mathbf{x_{ij}}\) into matrices, we get \[\boldsymbol{\hat{\beta}_{fd}} = \mathbf{((\Delta X)^{'}(\Delta X))^{-1}(\Delta X)^{'}(\Delta Y)}\]
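
A short sketch of the first difference estimator on simulated data with two time periods and one regressor (all values are assumptions for illustration). With exactly two periods, first differencing and the within transformation are known to give the same slope, which the code checks.

```python
import numpy as np

rng = np.random.default_rng(4)
N, J = 300, 2                            # assumed dimensions
a = rng.normal(size=(N, 1))              # unobserved effect a_i
x = a + rng.normal(size=(N, J))          # x_ij is correlated with a_i
y = 0.7 * x + a + rng.normal(size=(N, J))

# First differences: J - 1 observations per person remain.
dx = np.diff(x, axis=1)
dy = np.diff(y, axis=1)
beta_fd = (dx * dy).sum() / (dx ** 2).sum()

# Within estimator for comparison.
x_star = x - x.mean(axis=1, keepdims=True)
y_star = y - y.mean(axis=1, keepdims=True)
beta_fe = (x_star * y_star).sum() / (x_star ** 2).sum()

print(dx.shape, np.isclose(beta_fd, beta_fe))
```

With more than two periods the two estimators generally differ in finite samples, although both remove \(a_{i}\).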

First Differencing

The robust variance covariance matrix for \(\boldsymbol{\hat{\beta}_{fd}}\) is

\[\hat{var}(\boldsymbol{\hat{\beta}_{fd}}) = \mathbf{(\Delta X' \Delta X)^{-1}} \left ( \sum_{i=1}^{N} \mathbf{\Delta X'_{i} \Delta\hat{u}_{i} \Delta\hat{u}'_{i}\Delta X_{i}}\right ) \mathbf{(\Delta X' \Delta X)^{-1}}\]
Again this is robust to both heteroskedasticity and serial correlation
When using this method:
- There must be variation in a variable over time for it to be included
- To infer a causal relationship, the unobserved heterogeneity must be time constant

Random Effects

In random effects we assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are uncorrelated
- Thus, random effects is NOT an identification strategy
- It forces \(a_{i}\) into the error term
- Putting \(a_{i}\) into the error term implies a specific error structure
The model is still \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]
Except now we force \(a_{i}\) into the error \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\] \[v_{ij} = a_{i} + u_{ij}\]
Implies a “block correlation” in errors within each \(i\) across time periods

Random Effects

The random effects model attempts to harness this block correlation
The method imposes certain assumptions on the data (in addition to strict exogeneity)
1. \(E(u_{ij}^2) = \sigma_{u}^2\) (homoskedasticity)
2. \(E(u_{ij}u_{is}) = 0, \forall j \neq s\) (no serial correlation)
Strict exogeneity implies \(E(a_{i}u_{ij}) =0, \forall j\)
Under these assumptions
- The diagonal elements of the variance covariance matrix for person \(i\) are \[E(v_{ij}^{2}) = E(a_{i}^{2}) + 2E(a_{i}u_{ij}) + E(u_{ij}^{2}) = \sigma_{a}^2 + \sigma_{u}^2\]
- The off-diagonal elements of the variance covariance matrix for person \(i\) are \[E(v_{ij}v_{is}) = E[(a_{i} + u_{ij})(a_{i} + u_{is})] = E(a_{i}^{2}) = \sigma_{a}^2\]
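
These moments can be checked by simulation. The sketch below assumes normal draws with variance 0.5 for \(a_{i}\) and 1.0 for \(u_{ij}\), so the diagonal of the sample moment matrix should be near 1.5 and the off-diagonal near 0.5.

```python
import numpy as np

rng = np.random.default_rng(6)
N, J = 200_000, 3                                # many units so moments are precise
sigma2_a, sigma2_u = 0.5, 1.0                    # assumed variance components
a = rng.normal(scale=np.sqrt(sigma2_a), size=(N, 1))
u = rng.normal(scale=np.sqrt(sigma2_u), size=(N, J))
v = a + u                                        # composite error v_ij

# Sample analogue of E(v_i v_i'): a J x J matrix.
Sigma_hat = v.T @ v / N
print(np.round(Sigma_hat, 2))
```

Every pair of periods shares the same covariance because the only common component is \(a_{i}\).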

Random Effects

If we group all observations for person \(i\) into a matrix, then

\[\boldsymbol{\Sigma} = E(\mathbf{v_{i}v_{i}'}) = \begin{bmatrix} \sigma_{a}^2 + \sigma_{u}^2& \sigma_{a}^2 & \ldots & \sigma_{a}^2\\ \sigma_{a}^2 & \sigma_{a}^2 + \sigma_{u}^2& \ldots & \sigma_{a}^2 \\ \vdots &\ddots & \ldots & \vdots \\ \sigma_{a}^2 & \sigma_{a}^2& \ldots & \sigma_{a}^2 + \sigma_{u}^2 \\ \end{bmatrix}\]
The variance covariance matrix of the errors for the whole data set is \[\boldsymbol{\Omega} = \begin{bmatrix} \boldsymbol{\Sigma}& \mathbf{0} & \ldots & \mathbf{0}\\ \mathbf{0}& \boldsymbol{\Sigma}& \ldots & \mathbf{0} \\ \vdots &\ddots & \ldots & \vdots \\ \mathbf{0}& \mathbf{0}& \ldots & \boldsymbol{\Sigma}\\ \end{bmatrix}\]
This is the “random effects structure”
- Errors are correlated within \(i\), but not across \(i\)
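
A small sketch of how \(\boldsymbol{\Sigma}\) and \(\boldsymbol{\Omega}\) could be assembled for a balanced panel, using assumed variance components and tiny dimensions so the block structure is easy to inspect.

```python
import numpy as np

J, N = 3, 2                              # assumed dimensions
sigma2_a, sigma2_u = 0.5, 1.0            # assumed variance components

# Sigma: variance of u on the diagonal, plus variance of a in every cell.
Sigma = sigma2_u * np.eye(J) + sigma2_a * np.ones((J, J))

# Omega: one copy of Sigma per unit on the block diagonal, zeros elsewhere.
Omega = np.kron(np.eye(N), Sigma)

print(Sigma)
print(Omega.shape)
```

The zero blocks off the diagonal of \(\boldsymbol{\Omega}\) encode the assumption that errors are uncorrelated across different units.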

Random Effects

The random effects estimator is GLS applied to the data using the random effects error structure
GLS “transforms” the data, and runs OLS on the transformed data
The random effects estimator is \[\boldsymbol{\hat{\beta}_{re}} = \mathbf{(X' \hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}Y}\]
\(\boldsymbol{\hat{\Omega}}\) is the estimated variance covariance matrix with \(\sigma_{u}^2\) and \(\sigma_{a}^2\) replaced by estimates
- Note that for \(\boldsymbol{\hat{\beta}_{re}}\) to be efficient, we must make stronger assumptions than before
  - \(E(u_{ij}^2|\mathbf{x_{i}}, a_{i}) = \sigma_{u}^2\)
  - \(E(u_{ij}u_{is}|\mathbf{x_{i}}, a_{i}) = 0, \forall j \neq s\)
  - \(E(a_{i}^2|\mathbf{x_{i}}) = \sigma_{a}^2\)

Random Effects

We need consistent estimates of \(\sigma_{u}^2\) and \(\sigma_{a}^2\)
To do so, we follow the method of Wooldridge (2002)
- Recall that \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\] \[v_{ij} = a_{i} + u_{ij}\]
- Because of the second equation \[\sigma_{v}^2 = \sigma_{a}^2 + \sigma_{u}^2\]
- We will find \(\sigma_{v}^2\) and \(\sigma_{a}^2\), then deduce \(\sigma_{u}^2\)

Random Effects

Because of our previous assumptions, \[\sigma_{v}^2 = \frac{1}{J}\sum_{j=1}^{J}E(v_{ij}^2)\]
- This is true for each individual
- Replace \(E(v_{ij}^2)\) with a sample average across \(i\) using consistent estimates of \(v_{ij}\)
Because we have assumed \(\mathbf{x_{ij}}\) and \(a_{i}\) are uncorrelated, we can obtain consistent estimates of \(v_{ij}\) from pooled OLS
- Regress \(y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\)
- Keep the residuals from this regression, \(\hat{v}_{ij}\)
- Then for the estimate of \(\sigma_{v}^2\) \[\hat{\sigma}_{v}^2 = \frac{1}{NJ-K}\sum_{i=1}^{N}\sum_{j=1}^{J}\hat{v}_{ij}^2\]

Random Effects

Now, we obtain an estimate of \(\sigma_{a}^2\) using a similar method \[\sigma_{a}^2 = \frac{1}{J(J-1)/2}\sum_{j=1}^{J-1}\sum_{s = j+1}^{J}E(v_{ij}v_{is})\]
- This is true for each individual
- Replace \(E(v_{ij}v_{is})\) with a sample average across \(i\) using consistent estimates of \(v_{ij}\)

Random Effects

The estimate of \(\sigma_{a}^2\) is \[\hat{\sigma}_{a}^2 = \frac{1}{NJ(J-1)/2 - K}\sum_{j=1}^{J-1}\sum_{s = j+1}^{J}\sum_{i=1}^{N}\hat{v}_{ij}\hat{v}_{is}\]
The idea is that there are \(J(J-1)/2\) cross-products of errors for each individual
Averaging these cross-products for each person, then averaging across all people, we get a consistent estimate
Once we have \(\hat{\sigma}_{a}^2\) and \(\hat{\sigma}_{v}^2\), we obtain \(\hat{\sigma}_{u}^2 = \hat{\sigma}_{v}^2-\hat{\sigma}_{a}^2\)
- Use these values in \(\boldsymbol{\hat{\Omega}}\), and we have all that is required for the Random Effects Estimator
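
The steps above can be sketched end to end on simulated data: pooled OLS residuals, the two variance component estimates, and then feasible GLS. The data generating process (variance 1 for \(a_{i}\), variance 2 for \(u_{ij}\), slope 0.5) is an assumption for illustration, and the block-diagonal structure of \(\boldsymbol{\hat{\Omega}}\) is exploited by summing over units rather than inverting an \(NJ \times NJ\) matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
N, J = 400, 4                                      # assumed dimensions
K = 2                                              # constant plus one regressor
a = rng.normal(scale=1.0, size=(N, 1))             # true variance of a_i is 1
x = rng.normal(size=(N, J))                        # uncorrelated with a_i
u = rng.normal(scale=np.sqrt(2.0), size=(N, J))    # true variance of u_ij is 2
y = 1.0 + 0.5 * x + a + u

# Step 1: pooled OLS residuals v_hat_ij.
X = np.stack([np.ones((N, J)), x], axis=2)         # shape (N, J, K)
Xf, yf = X.reshape(N * J, K), y.reshape(N * J)
b_pols = np.linalg.lstsq(Xf, yf, rcond=None)[0]
v = y - X @ b_pols                                 # shape (N, J)

# Step 2: variance components.
s2_v = (v ** 2).sum() / (N * J - K)
iu = np.triu_indices(J, k=1)                       # index pairs (j, s) with s > j
cross = (v[:, iu[0]] * v[:, iu[1]]).sum()          # sum of residual cross-products
s2_a = cross / (N * J * (J - 1) / 2 - K)
s2_u = s2_v - s2_a

# Step 3: feasible GLS using the estimated Sigma for every unit.
Sigma_inv = np.linalg.inv(s2_u * np.eye(J) + s2_a * np.ones((J, J)))
A = np.einsum("ijk,js,isl->kl", X, Sigma_inv, X)   # X' Omega^{-1} X
c = np.einsum("ijk,js,is->k", X, Sigma_inv, y)     # X' Omega^{-1} Y
b_re = np.linalg.solve(A, c)

print(f"s2_a={s2_a:.2f}  s2_u={s2_u:.2f}  b_re={b_re}")
```

In real samples the estimate of \(\sigma_{a}^2\) can come out negative, in which case the resulting \(\boldsymbol{\hat{\Sigma}}\) is not a valid covariance matrix. The sketch does not guard against that case.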

Random Effects

With all of these assumptions, the variance estimate of the random effects estimator is

\[\hat{var}( \boldsymbol{\hat{\beta}_{re}}) = \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\]
This procedure depends on the assumptions we have made about the random effects structure
We could also avoid those assumptions and use a robust variance estimator \[\hat{var}(\boldsymbol{\hat{\beta}_{re}})= \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\left ( \sum_{i=1}^{N} \mathbf{X^{'}_{i} \hat{\Sigma}^{-1} \hat{v}_{i} \hat{v}_{i}^{'}\hat{\Sigma}^{-1}X_{i}}\right ) \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\]
But Random Effects is generally all about the structure of the errors
- So if you are going to avoid making those assumptions, you can instead use pooled OLS with a robust variance estimator