Panel Data

EC655

Justin Smith

Wilfrid Laurier University

Fall 2022

Introduction

Introduction

  • Repeated observations of some individual unit along some dimension

    • Typically, observing same people/firms/countries over time

    • Second dimension does not have to be time

  • Panel data can be used in several ways

    1. Deal with individual heterogeneity

    2. Increase variation (reduce standard errors)

    3. Study dynamics

  • In microeconometrics, panel data is mostly used to control for individual heterogeneity

  • We will study

    • Basic panel data methods

    • Using panel data to identify parameters

Panel Data Basics

Structure of Panel Data

  • Panels have at least 2 dimensions

    • Variation occurs over \(i=1,\ldots,N\) people and \(j=1,\ldots,J\) time periods
  • A balanced panel is one where all individuals are observed in every time period

  • An unbalanced panel has at least one person not observed in a time period

Balanced Panel

ID   Year   Income
 1   1990   60000
 1   1991   65000
 1   1992   90000
 2   1990   20000
 2   1991   21000
 2   1992   24000

Unbalanced Panel

ID   Year   Income
 1   1990   60000
 1   1991   90000
 2   1990   20000
 2   1991   21000
 2   1992   24000

Unobserved Effects Model

  • Like the regular regression model, but with an unobserved variable that varies only across individuals

  • The regression model is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]

    • \(\mathbf{x_{ij}}\) contains a constant

    • \(a_{i}\) is the unobserved effect

  • Assume we are not interested in the effect of \(a_{i}\) on \(y_{ij}\)

    • Usually it is not observed, or unmeasurable anyway

    • This is why it is not written with a parameter

  • Several models we can use to estimate the parameters \(\boldsymbol{\beta}\)

    • Depends on assumption about relationship between \(\mathbf{x_{ij}}\) and \(a_{i}\)

Strict Exogeneity

  • We learned that the regression model was structural when the error was mean independent of the regressors

  • Panel model is structural when we assume Strict Exogeneity

    • Error term \(u_{ij}\) has zero mean conditional on \(a_{i}\) and \(\mathbf{x_{ij}}\) in all time periods.
  • Mathematically, strict exogeneity is written as \[E[u_{ij} | \mathbf{x_{i1}}, \mathbf{x_{i2}}, \ldots, \mathbf{x_{iJ}}, a_{i}] =0\]

  • It implies that \(u_{ij}\) in each time period is uncorrelated with \(\mathbf{x_{is}}\) in every time period \[E[\mathbf{x_{is}^{'}}u_{ij}] =0, \forall s,j = 1,2,\ldots, J\]

Strict Exogeneity

  • It also implies that the unobserved effect is uncorrelated with \(u_{ij}\) in every time period \[E[a_{i}u_{is}] =0, \forall s = 1,2,\ldots, J\]

  • The strict exogeneity assumption is necessary for consistency of the estimators we discuss below

  • Note that strict exogeneity says nothing about the correlation between \(\mathbf{x_{ij}}\) and \(a_{i}\)

    • We must make an additional assumption

    • It is this assumption that determines what model we use

Panel Data Estimation Techniques

Fixed Effects

  • If we assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are correlated, then fixed effects is the appropriate method \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]

  • This is the most popular panel data method

    • The main use of panel data is to account for unobserved factors correlated with \(\mathbf{x_{ij}}\)

    • In this case, we assume the unobserved factor is time constant

  • Fixed effects is a method to remove the influence of \(a_{i}\) to get estimates of \(\boldsymbol{\beta}\)

  • There are two main ways to estimate a fixed effects model

Fixed Effects

  1. “Within” Transformation
  • Find the average of each variable within the cross-sectional unit \[\bar{y}_{i} = \mathbf{\bar{x}_{i}}\boldsymbol{\beta} +a_{i} + \bar{u}_{i}\]

  • Then subtract the within unit mean from each observation \[y_{ij} - \bar{y}_{i} = (\mathbf{x_{ij}} - \mathbf{\bar{x}_{i}})\boldsymbol{\beta} + a_{i} - a_{i} + u_{ij} - \bar{u}_{i}\] \[y_{ij} - \bar{y}_{i} = (\mathbf{x_{ij}} - \mathbf{\bar{x}_{i}})\boldsymbol{\beta} + u_{ij} - \bar{u}_{i}\] \[y_{ij}^{*} = \mathbf{x_{ij}}^{*}\boldsymbol{\beta} + u_{ij}^{*}\]

  • Since \(a_{i}\) does not vary across \(j\), \(a_{i}\) is eliminated when we subtract the means

  • If we estimate the equation above by OLS, we get the Fixed Effects Estimator \[\boldsymbol{\hat{\beta}_{fe}} = \mathbf{(X^{*'} X^{*})^{-1}X^{*'} Y^{*}}\]
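To make the algebra concrete, below is a minimal numpy sketch of the within estimator on simulated data. The data-generating process, seed, and variable names are illustrative assumptions, not part of the notes.

```python
# A sketch of the within (fixed effects) estimator; all values illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 500, 4, 2                                 # people, periods, regressors

a = rng.normal(size=N)                              # unobserved effect a_i
x = rng.normal(size=(N, J, K)) + a[:, None, None]   # x correlated with a_i
beta = np.array([1.0, -0.5])
y = x @ beta + a[:, None] + rng.normal(size=(N, J)) # y_ij

# Within transformation: subtract the person-specific means
x_star = x - x.mean(axis=1, keepdims=True)
y_star = y - y.mean(axis=1, keepdims=True)

# Stack to (N*J) x K and apply the OLS formula to the demeaned data
X = x_star.reshape(N * J, K)
Y = y_star.reshape(N * J)
beta_fe = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_fe)                                      # close to (1.0, -0.5)
```

Pooled OLS of \(y\) on \(x\) would be biased in this setup because \(x\) is built to be correlated with \(a_{i}\); the demeaned regression recovers \(\boldsymbol{\beta}\).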

Fixed Effects

  • The matrix \(\mathbf{X^{*}}\) contains all observations over time and persons and is \(NJ \times K\)

  • It is easiest to think of them as stacked cross-sectional observations

  • For each cross-sectional observation we have \[\mathbf{X_{i}} = \begin{bmatrix} x_{i1}^{1} & x_{i1}^{2} & \ldots & x_{i1}^{K}\\ x_{i2}^{1} & x_{i2}^{2} & \ldots & x_{i2}^{K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{iJ}^{1} & x_{iJ}^{2} & \ldots & x_{iJ}^{K} \end{bmatrix}, \quad \mathbf{\bar{X}_{i}} = \begin{bmatrix} \bar{x}_{i}^{1} & \bar{x}_{i}^{2} & \ldots & \bar{x}_{i}^{K}\\ \bar{x}_{i}^{1} & \bar{x}_{i}^{2} & \ldots & \bar{x}_{i}^{K} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{x}_{i}^{1} & \bar{x}_{i}^{2} & \ldots & \bar{x}_{i}^{K} \end{bmatrix}\] \[\mathbf{X^{*}_{i}} = \begin{bmatrix} (x_{i1}^{1} - \bar{x}_{i}^{1}) & (x_{i1}^{2} - \bar{x}_{i}^{2}) & \ldots & (x_{i1}^{K} - \bar{x}_{i}^{K})\\ (x_{i2}^{1} - \bar{x}_{i}^{1}) & (x_{i2}^{2} - \bar{x}_{i}^{2}) & \ldots & (x_{i2}^{K} - \bar{x}_{i}^{K}) \\ \vdots & \vdots & \ddots & \vdots \\ (x_{iJ}^{1} - \bar{x}_{i}^{1}) & (x_{iJ}^{2} - \bar{x}_{i}^{2}) & \ldots & (x_{iJ}^{K} - \bar{x}_{i}^{K}) \end{bmatrix} = \mathbf{X_{i}} - \mathbf{\bar{X}_{i}}\]

Fixed Effects

  • These matrices are then stacked on top of each other \[\mathbf{X^{*}} = \begin{bmatrix} \mathbf{X^{*}_{1}} \\ \mathbf{X^{*}_{2}} \\ \vdots \\ \mathbf{X^{*}_{N}} \end{bmatrix}\]

  • The fixed effects estimator can be derived using the time-demeaning matrix

  • Let \(\mathbf{i}\) be a \(J \times 1\) column of ones

  • Define the \(J \times J\) matrix \(\mathbf{M_{0}}\) as one that turns the columns of any matrix with \(J\) rows into deviations from means \[\mathbf{M_{0}} = \mathbf{I}_{J} - \frac{1}{J}\mathbf{ii'}\]

Fixed Effects

  • Next, define the following \(NJ \times NJ\) matrix \(\mathbf{M_{D}}\) \[\mathbf{M_{D}} = \begin{bmatrix} \mathbf{M_{0}} & \mathbf{0} & \ldots & \mathbf{0}\\ \mathbf{0} & \mathbf{M_{0}} & \ldots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \ldots & \mathbf{M_{0}} \end{bmatrix}\]

  • The Fixed Effects estimator can be written as \[\boldsymbol{\hat{\beta}_{fe}} = \mathbf{(X^{'}M_{D}X)^{-1}X^{'}M_{D} Y}\]
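A small sketch, with assumed illustrative values, checking that \(\mathbf{M_{0}}\) turns columns into deviations from means and that \(\mathbf{M_{D}}\) is just \(\mathbf{M_{0}}\) repeated down the block diagonal:

```python
# Demeaning matrix M0 = I_J - (1/J) ii' applied to one unit's J x K block.
import numpy as np

J = 4
ones = np.ones((J, 1))                              # the column i of ones
M0 = np.eye(J) - (1.0 / J) * (ones @ ones.T)

Xi = np.arange(J * 3, dtype=float).reshape(J, 3)    # illustrative J x K block
print(np.allclose(M0 @ Xi, Xi - Xi.mean(axis=0)))   # True: M0 demeans columns

# M0 is symmetric and idempotent, so X' M_D X = (M_D X)' (M_D X)
print(np.allclose(M0, M0.T), np.allclose(M0 @ M0, M0))

# The full NJ x NJ demeaning matrix is block diagonal: M_D = I_N kron M0
N = 2
MD = np.kron(np.eye(N), M0)
print(MD.shape)                                     # (8, 8)
```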

Fixed Effects

  • The robust estimator of the variance covariance matrix for \(\boldsymbol{\hat{\beta}_{fe}}\) is \[\hat{var}(\boldsymbol{\hat{\beta}_{fe}})= \mathbf{(X^{*'} X^{*})^{-1}} \left ( \sum_{i=1}^{N} \mathbf{X^{*'}_{i} \hat{u}_{i}^{*} \hat{u}_{i}^{*'}X^{*}_{i}}\right ) \mathbf{(X^{*'} X^{*})^{-1}}\]

  • This estimator is robust to both heteroskedasticity and serial correlation

    • Serial correlation is an issue because the data have a time element

    • Heteroskedasticity can happen because the data have a cross-sectional element

  • If the error has no heteroskedasticity and no serial correlation, you can simplify this variance estimator
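A sketch of this robust (clustered by \(i\)) variance estimator, using the same kind of simulated setup as above; everything here is an illustrative assumption:

```python
# Cluster-robust (heteroskedasticity- and serial-correlation-robust)
# variance for the within estimator; simulated data, illustrative values.
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 500, 4, 2
a = rng.normal(size=N)
x = rng.normal(size=(N, J, K)) + a[:, None, None]
y = x @ np.array([1.0, -0.5]) + a[:, None] + rng.normal(size=(N, J))

x_star = x - x.mean(axis=1, keepdims=True)          # the X*_i blocks
y_star = y - y.mean(axis=1, keepdims=True)
X = x_star.reshape(N * J, K)
Y = y_star.reshape(N * J)
beta_fe = np.linalg.solve(X.T @ X, X.T @ Y)
u_star = y_star - x_star @ beta_fe                  # residuals u*_ij, N x J

bread = np.linalg.inv(X.T @ X)
meat = np.zeros((K, K))
for i in range(N):                                  # sum_i X*'_i u*_i u*'_i X*_i
    Xi, ui = x_star[i], u_star[i]
    meat += Xi.T @ np.outer(ui, ui) @ Xi
vcov = bread @ meat @ bread
print(np.sqrt(np.diag(vcov)))                       # robust standard errors
```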

Fixed Effects

  2. A second estimation method is the Dummy Variable Regression
  • In this model, we include a dummy variable for each cross-sectional unit

    • The interpretation of \(a_{i}\) changes

    • It is now considered a parameter and not a variable

    • In practice it does not matter, because it produces the same results as the within estimator

  • The regression is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} + \mathbf{D_{i}}\boldsymbol{\alpha} + u_{ij}\]

  • where \(\mathbf{D_{i}}\) is a vector of dummy variables indicating the cross-sectional unit, and \(\boldsymbol{\alpha}\) is an \((N-1) \times 1\) vector (we exclude 1 of the dummies to identify the model, since \(\mathbf{x_{ij}}\) contains a constant)

  • The vector \(\boldsymbol{\hat{\beta}_{DVR}}\) will be identical to \(\boldsymbol{\hat{\beta}_{fe}}\)
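A quick numerical check on simulated data that the dummy variable regression reproduces the within estimator; the setup is an illustrative assumption (here \(x\) has no constant column, so all \(N\) dummies can be included):

```python
# Dummy variable regression vs. within estimator on a small simulated panel.
import numpy as np

rng = np.random.default_rng(2)
N, J, K = 50, 4, 2
a = rng.normal(size=N)
x = rng.normal(size=(N, J, K)) + a[:, None, None]
y = x @ np.array([1.0, -0.5]) + a[:, None] + rng.normal(size=(N, J))

# Within estimator
xs = x - x.mean(axis=1, keepdims=True)
ys = y - y.mean(axis=1, keepdims=True)
b_within = np.linalg.lstsq(xs.reshape(-1, K), ys.ravel(), rcond=None)[0]

# Dummy variable regression: one dummy per person, no separate constant
D = np.kron(np.eye(N), np.ones((J, 1)))             # (N*J) x N dummy matrix
Xdvr = np.hstack([x.reshape(-1, K), D])
b_dvr = np.linalg.lstsq(Xdvr, y.ravel(), rcond=None)[0][:K]

print(np.allclose(b_within, b_dvr))                 # True
```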

Fixed Effects

  • We usually do not use the DVR approach, for a few reasons

    1. If \(N\) is large, it takes forever to estimate

    2. We do not care about \(a_{i}\) normally

    3. The estimators of the fixed effects \(a_{i}\) themselves are not consistent as \(N \rightarrow \infty\) with \(J\) fixed (each \(a_{i}\) is estimated from only \(J\) observations)

  • With the DVR approach, you would use the variance estimator we discussed in the OLS section

Fixed Effects

  • Fixed effects is frequently used to infer causality

    • Unobserved variables are “controlled” with the fixed effect

    • This is only appropriate if all unobserved heterogeneity is constant over time

      • If unobserved variables differ over time, no causality can be inferred

First Differencing

  • In this method, we still assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are correlated

  • The estimating equation is \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]

  • Imagine lagging this equation by 1 time period \[y_{ij-1} = \mathbf{x_{ij-1}}\boldsymbol{\beta} +a_{i} + u_{ij-1}\]

  • Then difference the equations \[y_{ij} - y_{ij-1} = \mathbf{(x_{ij} -x_{ij-1}) }\boldsymbol{\beta} +a_{i}-a_{i} + u_{ij} -u_{ij-1}\] \[\Delta y_{ij}= \mathbf{(\Delta x_{ij} ) }\boldsymbol{\beta} + \Delta u_{ij}\]

  • Since \(a_{i}\) is constant over time for each cross-sectional unit, it is eliminated when we difference

First Differencing

  • The amount of data we have left after differencing depends on the number of time periods

    • If \(J = 2\), then we are left with 1 observation per person

    • If \(J = 3\), then we are left with 2 observations per person

    • etc...

  • The first difference estimator is the OLS estimator applied to the differenced data

    • Collecting \(y_{ij}\) and \(\mathbf{x_{ij}}\) into matrices, we get \[\boldsymbol{\hat{\beta}_{fd}} = \mathbf{((\Delta X)^{'}(\Delta X))^{-1}(\Delta X)^{'}(\Delta Y)}\]
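A minimal sketch of the first-difference estimator on simulated data; as elsewhere, the data-generating process and names are assumptions for illustration:

```python
# First-difference estimator: OLS on differenced data.
import numpy as np

rng = np.random.default_rng(3)
N, J, K = 500, 4, 2
a = rng.normal(size=N)
x = rng.normal(size=(N, J, K)) + a[:, None, None]   # x correlated with a_i
y = x @ np.array([1.0, -0.5]) + a[:, None] + rng.normal(size=(N, J))

dx = np.diff(x, axis=1).reshape(N * (J - 1), K)     # delta x_ij, J-1 obs per i
dy = np.diff(y, axis=1).reshape(N * (J - 1))        # delta y_ij
beta_fd = np.linalg.solve(dx.T @ dx, dx.T @ dy)
print(beta_fd)                                      # close to (1.0, -0.5)
```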

First Differencing

  • The robust variance covariance matrix for \(\boldsymbol{\hat{\beta}_{fd}}\) is

    \[\hat{var}(\boldsymbol{\hat{\beta}_{fd}}) = \mathbf{(\Delta X' \Delta X)^{-1}} \left ( \sum_{i=1}^{N} \mathbf{\Delta X'_{i} \Delta\hat{u}_{i} \Delta\hat{u}'_{i}\Delta X_{i}}\right ) \mathbf{(\Delta X' \Delta X)^{-1}}\]

  • Again this is robust to both heteroskedasticity and serial correlation

  • When using this method:

    • There must be variation in a variable over time for it to be included

    • To infer a causal relationship, the unobserved heterogeneity must be time constant

Random Effects

  • In random effects we assume \(\mathbf{x_{ij}}\) and \(a_{i}\) are uncorrelated

    • Thus, random effects is NOT an identification strategy

    • It forces \(a_{i}\) into the error term

    • Putting \(a_{i}\) into the error term implies a specific error structure

  • The model is still \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +a_{i} + u_{ij}\]

  • Except now we force \(a_{i}\) into the error \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\] \[v_{ij} = a_{i} + u_{ij}\]

  • Implies a “block correlation” in the errors within each \(i\)

Random Effects

  • The random effects model attempts to harness this block correlation

  • The method imposes certain assumptions on the data (in addition to strict exogeneity)

    1. \(E(u_{ij}^2) = \sigma_{u}^{2}\) (homoskedasticity)

    2. \(E(u_{ij}u_{is}) = 0, \forall j \neq s\) (no serial correlation)

  • Strict exogeneity implies \(E(a_{i}u_{ij}) =0, \forall j\)

  • Under these assumptions

    • The diagonal elements of the variance covariance matrix for person \(i\) are \[E(v_{ij}^{2}) = E(a_{i}^{2}) + 2E(a_{i}u_{ij}) + E(u_{ij}^{2}) = \sigma_{a}^{2} + \sigma_{u}^{2}\]
    • The off-diagonal elements of the variance covariance matrix for person \(i\) are \[E(v_{ij}v_{is}) = E[(a_{i} + u_{ij})(a_{i} + u_{is})] = E(a_{i}^{2}) = \sigma_{a}^{2}\]

Random Effects

  • If we group all observations for person \(i\) into a matrix, then

    \[\boldsymbol{\Sigma} = E(\mathbf{v_{i}v_{i}'}) = \begin{bmatrix} \sigma_{a}^{2} + \sigma_{u}^{2} & \sigma_{a}^{2} & \ldots & \sigma_{a}^{2}\\ \sigma_{a}^{2} & \sigma_{a}^{2} + \sigma_{u}^{2} & \ldots & \sigma_{a}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{a}^{2} & \sigma_{a}^{2} & \ldots & \sigma_{a}^{2} + \sigma_{u}^{2} \end{bmatrix}\]

  • The variance covariance matrix of the errors for the whole data set is \[\boldsymbol{\Omega} = \begin{bmatrix} \boldsymbol{\Sigma} & \mathbf{0} & \ldots & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma} & \ldots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \ldots & \boldsymbol{\Sigma} \end{bmatrix}\]

  • This is the “random effects structure”

    • Errors are correlated within \(i\), but not across \(i\)
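A sketch constructing \(\boldsymbol{\Sigma}\) and \(\boldsymbol{\Omega}\) for small assumed values of \(N\), \(J\), and the variance components:

```python
# Random effects error structure: Sigma for one person, block-diagonal Omega.
import numpy as np

J, N = 3, 2
sigma_a2, sigma_u2 = 0.5, 1.0                       # assumed variance components

# sigma_a^2 everywhere, plus sigma_u^2 on the diagonal
Sigma = sigma_a2 * np.ones((J, J)) + sigma_u2 * np.eye(J)
Omega = np.kron(np.eye(N), Sigma)                   # correlation within i only
print(Sigma)
print(Omega.shape)                                  # (6, 6)
```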

Random Effects

  • The random effects estimator is GLS applied to the data using the random effects error structure

  • GLS “transforms” the data, and runs OLS on the transformed data

  • The random effects estimator is \[\boldsymbol{\hat{\beta}_{re}} = \mathbf{(X' \hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}Y}\]

  • \(\boldsymbol{\hat{\Omega}}\) is the estimated variance covariance matrix, with \(\sigma_{u}^{2}\) and \(\sigma_{a}^{2}\) replaced by estimates

    • Note that for \(\boldsymbol{\hat{\beta}_{re}}\) to be efficient, we must make stronger assumptions than before

      • \(E(u_{ij}^2|\mathbf{x_{i}}, a_{i}) = \sigma_{u}^{2}\)

      • \(E(u_{ij}u_{is}|\mathbf{x_{i}}, a_{i}) = 0, \forall j \neq s\)

      • \(E(a_{i}^2|\mathbf{x_{i}}) = \sigma_{a}^{2}\)
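Below is a sketch of the GLS step with the variance components treated as known (feasible GLS replaces them with the estimates derived on the next slides). The simulated setup is an assumption, and the code exploits the block-diagonal structure of \(\boldsymbol{\Omega}\) rather than inverting the full \(NJ \times NJ\) matrix:

```python
# Random effects GLS with known variance components; illustrative setup.
import numpy as np

rng = np.random.default_rng(4)
N, J, K = 500, 4, 2
sigma_a2, sigma_u2 = 0.5, 1.0
a = rng.normal(scale=np.sqrt(sigma_a2), size=N)     # a_i uncorrelated with x
x = rng.normal(size=(N, J, K))
y = (x @ np.array([1.0, -0.5]) + a[:, None]
     + rng.normal(scale=np.sqrt(sigma_u2), size=(N, J)))

Sigma = sigma_a2 * np.ones((J, J)) + sigma_u2 * np.eye(J)
Sinv = np.linalg.inv(Sigma)

# Because Omega is block diagonal, X' Omega^{-1} X = sum_i X_i' Sigma^{-1} X_i
XtOX = np.zeros((K, K))
XtOY = np.zeros(K)
for i in range(N):
    Xi, yi = x[i], y[i]                             # J x K block, J vector
    XtOX += Xi.T @ Sinv @ Xi
    XtOY += Xi.T @ Sinv @ yi
beta_re = np.linalg.solve(XtOX, XtOY)
print(beta_re)                                      # close to (1.0, -0.5)
```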

Random Effects

  • We need consistent estimates of \(\sigma_{u}^{2}\) and \(\sigma_{a}^{2}\)

  • To do so, we follow the method of Wooldridge (2002)

    • Recall that \[y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\] \[v_{ij} = a_{i} + u_{ij}\]

    • Because of the second equation \[\sigma_{v}^2 = \sigma_{a}^2 + \sigma_{u}^2\]

    • We will find \(\sigma_{v}^2\) and \(\sigma_{a}^2\), then deduce \(\sigma_{u}^2\)

Random Effects

  • Because of our previous assumptions, \[\sigma_{v}^2 = \frac{1}{J}\sum_{j=1}^{J}E(v_{ij}^2)\]

    • This is true for each individual

    • Replace \(E(v_{ij}^2)\) with a sample average across \(i\) using consistent estimates of \(v_{ij}\)

  • Because we have assumed \(\mathbf{x_{ij}}\) and \(a_{i}\) are uncorrelated, we can obtain consistent estimates of \(v_{ij}\) from pooled OLS

    • Regress \(y_{ij} = \mathbf{x_{ij}}\boldsymbol{\beta} +v_{ij}\)

    • Keep the residuals from this regression, \(\hat{v}_{ij}\)

    • Then for the estimate of \(\sigma_{v}^2\) \[\hat{\sigma}_{v}^2 = \frac{1}{NJ-K}\sum_{i=1}^{N}\sum_{j=1}^{J}\hat{v}_{ij}^2\]
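A sketch of this first step on simulated data (the data-generating process is an illustrative assumption):

```python
# Step 1: pooled OLS residuals and the estimate of sigma_v^2.
import numpy as np

rng = np.random.default_rng(5)
N, J, K = 500, 4, 2
a = rng.normal(scale=np.sqrt(0.5), size=N)          # a_i uncorrelated with x
x = rng.normal(size=(N, J, K))
y = x @ np.array([1.0, -0.5]) + a[:, None] + rng.normal(size=(N, J))

X, Y = x.reshape(N * J, K), y.reshape(N * J)
b_pols = np.linalg.lstsq(X, Y, rcond=None)[0]       # pooled OLS is consistent
v_hat = (Y - X @ b_pols).reshape(N, J)              # residuals v_ij

sigma_v2 = (v_hat ** 2).sum() / (N * J - K)
print(sigma_v2)                                     # approx 0.5 + 1.0 = 1.5
```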

Random Effects

  • Now, we obtain an estimate of \(\sigma_{a}^{2}\) using a similar method \[\sigma_{a}^2 = \frac{1}{J(J-1)/2}\sum_{j=1}^{J-1}\sum_{s = j+1}^{J}E(v_{ij}v_{is})\]

    • This is true for each individual

    • Replace \(E(v_{ij}v_{is})\) with a sample average across \(i\) using consistent estimates of \(v_{ij}\)

Random Effects

  • The estimate of \(\sigma_{a}^2\) is \[\hat{\sigma}_{a}^2 = \frac{1}{NJ(J-1)/2 - K}\sum_{j=1}^{J-1}\sum_{s = j+1}^{J}\sum_{i=1}^{N}\hat{v}_{ij}\hat{v}_{is}\]

  • The idea is that there are \(J(J-1)/2\) cross-products of errors for each individual

  • Averaging these cross-products within each person, then across all people, gives a consistent estimate

  • Once we have \(\hat{\sigma}_{a}^2\) and \(\hat{\sigma}_{v}^2\), we obtain \(\hat{\sigma}_{u}^2 = \hat{\sigma}_{v}^2-\hat{\sigma}_{a}^2\)

    • Use these values in \(\boldsymbol{\hat{\Omega}}\), and we have all that is required for the Random Effects Estimator
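A sketch of the second step; the pooled OLS step is repeated so the block is self-contained (same illustrative setup and seed as above):

```python
# Step 2: sigma_a^2 from off-diagonal cross-products, then sigma_u^2.
import numpy as np

rng = np.random.default_rng(5)
N, J, K = 500, 4, 2
a = rng.normal(scale=np.sqrt(0.5), size=N)
x = rng.normal(size=(N, J, K))
y = x @ np.array([1.0, -0.5]) + a[:, None] + rng.normal(size=(N, J))

X, Y = x.reshape(N * J, K), y.reshape(N * J)
b_pols = np.linalg.lstsq(X, Y, rcond=None)[0]
v_hat = (Y - X @ b_pols).reshape(N, J)
sigma_v2 = (v_hat ** 2).sum() / (N * J - K)

# J(J-1)/2 cross-products v_ij * v_is per person, summed over people
num = sum((v_hat[:, j] * v_hat[:, s]).sum()
          for j in range(J - 1) for s in range(j + 1, J))
sigma_a2 = num / (N * J * (J - 1) / 2 - K)
sigma_u2 = sigma_v2 - sigma_a2                      # deduce sigma_u^2
print(sigma_a2, sigma_u2)                           # approx 0.5 and 1.0
```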

Random Effects

  • With all of these assumptions, the variance estimate of the random effects estimator is

    \[\hat{var}( \boldsymbol{\hat{\beta}_{re}}) = \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\]

  • This procedure depends on the assumptions we have made about the random effects structure

  • We could also avoid those assumptions and use a robust variance estimator \[\hat{var}(\boldsymbol{\hat{\beta}_{re}})= \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\left ( \sum_{i=1}^{N} \mathbf{X^{'}_{i} \hat{\Sigma}^{-1} \hat{v}_{i} \hat{v}_{i}^{'}\hat{\Sigma}^{-1}X_{i}}\right ) \mathbf{(X'\boldsymbol{\hat{\Omega}}^{-1}X)^{-1} }\]

  • But random effects is attractive mainly because of the efficiency gained by exploiting the error structure

    • So if you are unwilling to make those assumptions, you may as well use pooled OLS with a robust variance estimator