Zhentao Shi
Nov 15, 2021
Time series data: (y_t, x_t)_{t=1}^T
Cross sectional data: (y_i, x_i)_{i=1}^N
Panel data
In statistics, panel data is often called longitudinal data
fama-french.csv
familyfirms.csv
: 2254 firm-year observations during 1992–1999
library(plm)
d0 <- read.csv("familyfirms.csv", header = TRUE)
dp <- pdata.frame(d0, index = c("company", "year") )
head(dp)
## year company agefirm meanagef assets bs_volatility founderCEO Q
## 1045-1992 1992 1045 58 95 18706 0 0 1
## 1045-1993 1993 1045 59 95 19326 0 0 1
## 1045-1994 1994 1045 60 95 19486 0 0 1
## 1045-1995 1995 1045 61 95 19556 0 0 1
## 1045-1996 1996 1045 62 95 20497 0 0 1
## 1045-1997 1997 1045 63 95 20915 0 0 1
## digit2_in
## 1045-1992 45
## 1045-1993 45
## 1045-1994 45
## 1045-1995 45
## 1045-1996 45
## 1045-1997 45
Historically, panel data was economists’ first encounter with big data
Statistical efficiency
Model heterogeneity
y_{it} = x_{it}'\beta_i + u_{it}
in which \beta_i cannot be consistently estimated with small T
y_{it} = x_{it}'\beta + u_{it}
in which the pooled OLS is consistent with convergence rate \sqrt{NT}
The null hypothesis: \beta_1 = \beta_2 = \cdots = \beta_N = \beta
If the dimension of \beta_i is p, then there are in total p(N-1) linear restrictions
The F-statistic
F = \frac{(RSS_R - RSS_U)/[p(N-1)]}{RSS_U/[N(T-p)]}
Under the null, F \sim F(p(N-1),\, N(T-p))
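As a sketch, the F-statistic above can be computed by hand on simulated data (the sample sizes and coefficient values here are made up; base R only). The true slopes are common across individuals, so the pooling restriction should not be rejected.

```r
set.seed(1)
N <- 50; T <- 20; p <- 2
id <- rep(1:N, each = T)
x <- matrix(rnorm(N * T * p), N * T, p)
y <- drop(x %*% c(1, -0.5)) + rnorm(N * T)   # common beta for all i

# restricted model: one common beta (pooled OLS)
RSS_R <- sum(resid(lm(y ~ x - 1))^2)

# unrestricted model: a separate beta_i for each individual
RSS_U <- sum(sapply(split(seq_len(N * T), id), function(rows)
  sum(resid(lm(y[rows] ~ x[rows, ] - 1))^2)))

df1 <- p * (N - 1); df2 <- N * (T - p)
F_stat <- ((RSS_R - RSS_U) / df1) / (RSS_U / df2)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)
```

The unrestricted fit always has the smaller residual sum of squares; the F-statistic measures whether the improvement is larger than sampling noise would produce.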
y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \quad u_{it} \sim (0, \sigma^2)
Within-group demean
For each i, average over the T observations to obtain
\bar{y}_i = \alpha_i + \bar{x}_i'\beta + \bar{u}_i
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (u_{it} - \bar{u}_i)
to get rid of αi, the source of endogeneity
Regress (y_{it} - \bar{y}_i) on (x_{it} - \bar{x}_i) to obtain \hat{\beta}
Recover the individual intercept \hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}
Necessary condition for consistency: E[(x_{it} - \bar{x}_i)(u_{it} - \bar{u}_i)] = 0
Sufficient condition for consistency and unbiasedness: E[u_{it} \mid (x_{it})_{t=1}^T] = 0 (strict exogeneity)
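The within-group steps above can be checked numerically. A minimal base-R sketch on simulated data (all parameter values made up): demeaned OLS must return the same slope as the least-squares dummy-variable (LSDV) regression with one dummy per individual.

```r
set.seed(1)
N <- 30; T <- 10
id <- rep(1:N, each = T)
alpha <- rep(rnorm(N), each = T)       # individual fixed effects
x <- rnorm(N * T) + 0.5 * alpha        # regressor correlated with alpha
y <- alpha + 1.5 * x + rnorm(N * T)

# demean y and x within each individual (ave() computes group means)
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
beta_within <- coef(lm(y_dm ~ x_dm - 1))

# recover per-observation copies of the individual intercepts
alpha_hat <- ave(y, id) - ave(x, id) * beta_within

# LSDV: one dummy per individual gives the identical slope
beta_lsdv <- coef(lm(y ~ x + factor(id) - 1))["x"]
all.equal(unname(beta_within), unname(beta_lsdv))  # TRUE
```

The equivalence is the Frisch–Waugh–Lovell theorem in action: partialling out the dummies is exactly within-group demeaning.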
\begin{align} y_{it} & = \sum_{j=1}^N \alpha_j \mathbb{I}\{j = i\} + x_{it}'\beta + u_{it} \\ &= \boldsymbol{\alpha}^{\prime} \mathbf{D}_i + x_{it}'\beta + u_{it} \end{align} where \mathbf{D}_i is an N-vector of dummy variables (0,\ldots,0,1,0,\ldots,0)'
\begin{align} y_{it} & = \alpha + x_{it}'\beta + (v_i + u_{it}) \\ & = \alpha + x_{it}'\beta + w_{it}, \end{align}
where w_{it} := v_i + u_{it} is the composite error
For each i, the common v_i induces correlation between w_{it} and w_{is} for t\neq s
Let \mathbf{w}_{i} = (w_{i1},\ldots,w_{iT})'
For simplicity, we assume
\Omega := E[\mathbf{w}_i \mathbf{w}_i'] = \sigma_v^2 \mathbf{1}_T \mathbf{1}_T' + \sigma^2 \mathbf{I}_T
\Omega^{-1/2} \mathbf{y}_i = \Omega^{-1/2} \alpha \mathbf{1}_T + \Omega^{-1/2} \mathbf{X}_{i}\beta + \Omega^{-1/2} \mathbf{w}_{i},
where \mathbf{X}_i = (x_{i1},\ldots,x_{iT})' stacks the regressors, and notice \Omega^{-1/2} E[\mathbf{w}_i \mathbf{w}_i'] \Omega^{-1/2} = \mathbf{I}_T
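A quick numerical check of this covariance structure (T, \sigma_v^2 and \sigma^2 are made-up values): build \Omega, form \Omega^{-1/2} from its spectral decomposition, and verify that the transformed errors have identity covariance.

```r
T <- 5; sigma_v2 <- 0.5; sigma2 <- 1
# Omega = sigma_v^2 * 1 1' + sigma^2 * I  (T x T)
Omega <- sigma_v2 * matrix(1, T, T) + sigma2 * diag(T)

# inverse square root via the spectral decomposition
es <- eigen(Omega, symmetric = TRUE)
Omega_inv_half <- es$vectors %*% diag(1 / sqrt(es$values)) %*% t(es$vectors)

# transformed covariance is the identity (up to numerical error)
max(abs(Omega_inv_half %*% Omega %*% Omega_inv_half - diag(T)))  # ~ 0
```

In practice the GLS transformation is implemented as quasi-demeaning rather than an explicit matrix square root, but the spectral construction makes the algebra transparent.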
Generalized least squares (GLS) estimator
Feasible estimation:
pooling
eq <- log(Q) ~ founderCEO + log(assets) + log(agefirm+1) + bs_volatility
OLS.lm <- lm( eq, data = dp ) # some agefirm = 0. add one to take log
OLS.plm <- plm( eq, data = dp, effect = "individual", model = "pooling" )
print(OLS.lm)
##
## Call:
## lm(formula = eq, data = dp)
##
## Coefficients:
## (Intercept) founderCEO log(assets) log(agefirm + 1)
## 0.598758 0.196479 -0.001051 -0.022578
## bs_volatility
## -0.093353
print(OLS.plm)

##
## Model Formula: log(Q) ~ founderCEO + log(assets) + log(agefirm + 1) + bs_volatility
##
## Coefficients:
## (Intercept) founderCEO log(assets) log(agefirm + 1)
## 0.5987582 0.1964793 -0.0010507 -0.0225779
## bs_volatility
## -0.0933531
FE <- plm( eq, data = dp, effect = "individual", model = "within" )
RE <- plm( eq, data = dp, effect = "individual", model = "random" )
print(FE)
##
## Model Formula: log(Q) ~ founderCEO + log(assets) + log(agefirm + 1) + bs_volatility
##
## Coefficients:
## founderCEO log(assets) log(agefirm + 1) bs_volatility
## 0.03713317 0.00043818 0.36230453 -0.21626849
print(RE)

##
## Model Formula: log(Q) ~ founderCEO + log(assets) + log(agefirm + 1) + bs_volatility
##
## Coefficients:
## (Intercept) founderCEO log(assets) log(agefirm + 1)
## 0.207554 0.104535 0.030767 0.011166
## bs_volatility
## -0.170891
The random effects model is a special case of the fixed effects model
Whether the RE specification is adequate can be tested against FE
Idea of testing:
W = (\hat{\beta}_{FE} - \hat{\beta}_{RE} )' (var(\hat{\beta}_{FE}) - var(\hat{\beta}_{RE}) )^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE} ) \stackrel{d}{\rightarrow} \chi^2 (p)
phtest(FE, RE)

##
## Hausman Test
##
## data: eq
## chisq = 38.205, df = 4, p-value = 1.017e-07
## alternative hypothesis: one model is inconsistent
y_{it} = \alpha_i + x_{it}' \beta + \rho y_{i,t-1} + u_{it}
where u_{it} is uncorrelated with y_{i,t-1}
Let \bar{y}_{i,-1} = T^{-1} \sum_{t=1}^T y_{i,t-1} (assume y_{i,0} is observable in the sample). Within-group transformation produces correlation in (y_{i,t-1}-\bar{y}_{i,-1}) and ( u_{it} - \bar{u}_i)
Demonstration: consider \beta = 0 and T = 2 for simplicity:
0.5(y_{i,2} - y_{i,1}) = 0.5\rho(y_{i,1}-y_{i,0}) + 0.5(u_{i,2}-u_{i,1})
correlation arises because the regressor contains y_{i,1} while the error contains u_{i,1}, and these two are correlated
y_{it} - \bar{y}_i = \rho (y_{i,t-1}-\bar{y}_{i,-1}) +( u_{it} - \bar{u}_i)
Nickell (1981)
The correlation
\begin{align} cov[\bar{y}_{i,-1},u_{it}] & = cov\left[ T^{-1} \sum_{s=1}^T y_{i,s-1}, u_{it} \right] \\ & = cov\left[ T^{-1} \sum_{s = t+1}^T y_{i,s-1}, u_{it} \right] \\ & \approx \frac{\sigma^2}{(1-\rho)T} \end{align}
since y_{i,s-1} depends on u_{it} only when s-1 \geq t
\hat{\rho} - \rho_0 = - \frac{cov[\bar{y}_{i,-1}, u_{it}]}{var[y_{i,t-1} - \bar{y}_{i,-1}]} \approx - \frac{\frac{\sigma^2}{(1-\rho)T}}{\frac{\sigma^2}{1-\rho^2}} = -\frac{1+\rho}{T}
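A small simulation (made-up parameter values, base R only) illustrates the Nickell bias: even with N large, the within estimator of \rho is biased downward by roughly (1+\rho)/T when T is small.

```r
set.seed(1)
N <- 2000; T <- 10; rho <- 0.5
alpha <- rnorm(N)
y <- matrix(0, N, T + 1)
y[, 1] <- alpha / (1 - rho) + rnorm(N) / sqrt(1 - rho^2)  # near-stationary start
for (t in 1:T) y[, t + 1] <- alpha + rho * y[, t] + rnorm(N)

y_now <- y[, 2:(T + 1)]; y_lag <- y[, 1:T]
yn_dm <- y_now - rowMeans(y_now)   # within-group demeaning
yl_dm <- y_lag - rowMeans(y_lag)
rho_hat <- sum(yl_dm * yn_dm) / sum(yl_dm^2)
rho_hat - rho   # negative, on the order of -(1 + rho)/T
```

Increasing N does not shrink the gap: the bias is an incidental-parameter effect that vanishes only as T grows.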
\begin{align} y_{it} & = \alpha + \rho y_{i,t-1} + (v_i + u_{it}) \\ & = \alpha + \rho (\alpha + \rho y_{i,t-2} + v_i + u_{i,t-1})+ (v_i + u_{it}) \\ & = \cdots \\ \end{align}
makes it clear that y_{i,t-1} and w_{it} = v_i + u_{it} are correlated as
cov[y_{i,t-1},w_{it}] = (1 + \rho + \cdots + \rho^{t-2})\sigma^2_v
\Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta u_{it}
where \Delta y_{it} = y_{it} - y_{i,t-1} is the first difference of y_{it}; \Delta y_{i,t-1} and \Delta u_{it} are defined similarly
\begin{align} cov[\Delta y_{i,t-1}, \Delta u_{it}] & = E[(y_{i,t-1} - y_{i,t-2}) (u_{it} - u_{i,t-1})] \\ & = -E[ y_{i,t-1} u_{i,t-1} ] \\ & = -\sigma^2 \end{align}
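A Monte Carlo check of this covariance (made-up values; \rho = 0.5 and \sigma^2 = 1, so the covariance between the differenced lag and the differenced error should be close to -1):

```r
set.seed(1)
N <- 100000; rho <- 0.5
u <- matrix(rnorm(N * 4), N, 4)             # u_{i,t-3}, ..., u_{it}
y <- matrix(0, N, 4)
y[, 1] <- u[, 1] / sqrt(1 - rho^2)          # stationary start
for (t in 2:4) y[, t] <- rho * y[, t - 1] + u[, t]

dy_lag <- y[, 3] - y[, 2]                   # Delta y_{i,t-1}
du     <- u[, 4] - u[, 3]                   # Delta u_{it}
mean(dy_lag * du)                           # approx -1
```

This nonzero covariance is why the first-differenced equation needs an instrument (e.g. a deeper lag of y) rather than plain OLS.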
y_{it} = \alpha_i + \rho y_{i,t-1} + e_{it}
\Delta y_{it} = \alpha_i + \beta y_{i,t-1} + \sum_{k=1}^L \gamma_{ik} \Delta y_{i,t-k} + e_{it}
similar to that for the augmented Dickey-Fuller test
Null hypothesis: \beta = \rho - 1 = 0
Alternative hypothesis: \beta < 0
Asymptotics: Large T and large N
At appropriate rates for N and T, the (modified) t-statistic is asymptotically N(0,1)
\Delta y_{it} = \alpha_i + \beta_i y_{i,t-1} + \sum_{k=1}^L \gamma_{ik} \Delta y_{i,t-k} + e_{it}
which allows heterogeneous \beta_i across individuals
Null hypothesis: \beta_1 = \beta_2 = \cdots = \beta_N = 0
Alternative hypothesis: Some \beta_i < 0
Test statistic: another modified t-statistic which is asymptotically N(0,1) under the null
function purtest in package plm
Syntax
purtest(object, test = c("levinlin", "ips"),
exo = c("none", "intercept", "trend"),
lags = c("SIC", "AIC"), pmax = 10)
object: a T \times N matrix
test:
levinlin for Levin, Lin and Chu (2002)
ips for Im, Pesaran and Shin (2003)
exo: the deterministic component in the time series
lags: lag selection criterion
pmax: the maximum number of lags
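A hedged usage sketch on simulated data (the data and tuning values are made up): N = 20 independent random walks, so the unit-root null should not be rejected by either test.

```r
library(plm)
set.seed(1)
# T = 100 periods in rows, N = 20 individuals in columns
y_mat <- apply(matrix(rnorm(100 * 20), 100, 20), 2, cumsum)
purtest(y_mat, test = "ips", exo = "intercept", lags = "SIC", pmax = 4)
```

Rerunning on stationary series (e.g. columns of iid noise instead of their cumulative sums) should flip the conclusion.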