Panel Data

Zhentao Shi

Nov 15, 2021

Data types

Types of panel data

library(plm)
d0 <- read.csv("familyfirms.csv", header = TRUE)
dp <- pdata.frame(d0, index = c("company", "year") )
head(dp)
##           year company agefirm meanagef assets bs_volatility founderCEO Q
## 1045-1992 1992    1045      58       95  18706             0          0 1
## 1045-1993 1993    1045      59       95  19326             0          0 1
## 1045-1994 1994    1045      60       95  19486             0          0 1
## 1045-1995 1995    1045      61       95  19556             0          0 1
## 1045-1996 1996    1045      62       95  20497             0          0 1
## 1045-1997 1997    1045      63       95  20915             0          0 1
##           digit2_in
## 1045-1992        45
## 1045-1993        45
## 1045-1994        45
## 1045-1995        45
## 1045-1996        45
## 1045-1997        45

Why panel data

Two extreme specifications

\[ y_{it} = x_{it}'\beta_i + u_{it} \]

in which \(\beta_i\) cannot be consistently estimated with small \(T\)

\[ y_{it} = x_{it}'\beta + u_{it} \]

in which the pooled OLS is consistent with convergence rate \(\sqrt{NT}\)

Testing the two extreme cases

\[ F = \frac{(RSS_R - RSS_U)/[p(N-1)]}{RSS_U / [N(T-p)]} \]

\[ F \sim \mbox{F-distribution}(p(N-1),N(T-p)) \]
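
The F statistic above can be computed by hand from two least-squares fits. The following is a minimal sketch on simulated data, not the lecture's dataset; the names `TT`, `fit_r`, `fit_u` and all parameter values are assumptions made for illustration, and \(p = 2\) counts the intercept and one slope.

```r
# Simulated balanced panel; every name and value here is illustrative.
set.seed(1)
N <- 20; TT <- 10; p <- 2                    # p = intercept + one slope
id <- factor(rep(1:N, each = TT))
x  <- rnorm(N * TT)
y  <- 1 + 0.5 * x + rnorm(N * TT)            # generated under the pooled model

fit_r <- lm(y ~ x)                           # restricted: common beta
fit_u <- lm(y ~ 0 + id + id:x)               # unrestricted: separate beta_i per individual

rss_r <- sum(resid(fit_r)^2)
rss_u <- sum(resid(fit_u)^2)
df1 <- p * (N - 1)                           # number of restrictions
df2 <- N * (TT - p)                          # residual df of the unrestricted model
F_stat <- ((rss_r - rss_u) / df1) / (rss_u / df2)
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)
c(F = F_stat, p.value = p_val)
```

The same statistic is what `anova(fit_r, fit_u)` reports for these two nested fits.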

Fixed effects

\[ y_{it} = \alpha_i + x_{it}'\beta + u_{it},\ \ u_{it} \sim (0,\sigma^2) \]

Estimating the fixed effects model

\[ \bar{y}_{i} = \alpha_i + \bar{x}_{i}'\beta + \bar{u}_{i} \]

\[ y_{it} - \bar{y}_i = (x_{it}-\bar{x}_i)'\beta +( u_{it} - \bar{u}_i) \]

Subtracting the time-averaged equation from the original one gets rid of \(\alpha_i\), the source of endogeneity

Estimator and statistical properties

Estimating the fixed effects model (continued)

\[ \begin{align} y_{it} & = \sum_{j=1}^N \alpha_j \mathbb{I}\{j = i\} + x_{it}'\beta + u_{it} \\ &= \boldsymbol{\alpha}^{\prime} \mathbf{D}_i + x_{it}'\beta + u_{it} \end{align} \]

where \(\mathbf{D}_i\) is an \(N\)-vector of dummy variables \((0,\ldots,0,1,0,\ldots,0)'\) with the 1 in the \(i\)-th position
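
Demeaning and the dummy-variable regression give the same slope estimate. The sketch below illustrates this on simulated data in base R; the data-generating process, the true value \(\beta = 2\), and all variable names are assumptions for illustration only.

```r
set.seed(2)
N <- 50; TT <- 5
id    <- rep(1:N, each = TT)
alpha <- rnorm(N)                            # individual effects alpha_i
x     <- alpha[id] + rnorm(N * TT)           # regressor correlated with alpha_i
y     <- alpha[id] + 2 * x + rnorm(N * TT)   # true beta = 2

# Within estimator: subtract individual means, then OLS without intercept
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_within <- coef(lm(y_dm ~ 0 + x_dm))[["x_dm"]]

# LSDV: OLS on x plus a full set of N individual dummies
b_lsdv <- coef(lm(y ~ 0 + factor(id) + x))[["x"]]

# Pooled OLS ignores alpha_i, so it is biased in this design
b_pooled <- coef(lm(y ~ x))[["x"]]

c(within = b_within, lsdv = b_lsdv, pooled = b_pooled)
```

The first two numbers coincide up to floating-point error, while the pooled estimate is pushed away from 2 because \(x_{it}\) is correlated with \(\alpha_i\).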

Random effects

\[ \begin{align} y_{it} & = \alpha + x_{it}'\beta + (v_i + u_{it}) \\ & = \alpha + x_{it}'\beta + w_{it}, \end{align} \]

where \(w_{it} := v_i + u_{it}\) is the composite error

Covariance structure

\[ \Omega := E[\mathbf{w}_i \mathbf{w}_i'] = \sigma_v^2 \mathbf{1}_T \mathbf{1}_T'+ \sigma^2 \mathbf{I}_T \]

Efficient estimation

\[ \Omega^{-1/2} \mathbf{y}_i= \Omega^{-1/2} \alpha \mathbf{1}_T + \Omega^{-1/2} \mathbf{x}_{i}'\beta + \Omega^{-1/2} \mathbf{w}_{i}, \]

and notice that \(\Omega^{-1/2} E[\mathbf{w}_i \mathbf{w}_i'] \Omega^{-1/2} = \mathbf{I}_T\), so the transformed errors are homoskedastic and uncorrelated across \(t\)
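
A small numerical check of the transformation may help. The sketch below builds \(\Omega\) for one individual with \(T\) periods, forms \(\Omega^{-1/2}\) from an eigendecomposition, and verifies the identity. The variance values are arbitrary assumptions, and the quasi-demeaning form with `theta` is a standard equivalent representation that is not stated in the slides.

```r
TT <- 4
sigma_v2 <- 2                                # var(v_i), illustrative value
sigma2   <- 1                                # var(u_it), illustrative value
ones  <- rep(1, TT)
Omega <- sigma_v2 * tcrossprod(ones) + sigma2 * diag(TT)

# Symmetric inverse square root via the eigendecomposition of Omega
e <- eigen(Omega, symmetric = TRUE)
Omega_inv_half <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

# Check: Omega^{-1/2} Omega Omega^{-1/2} = I_T
err_identity <- max(abs(Omega_inv_half %*% Omega %*% Omega_inv_half - diag(TT)))

# Equivalent quasi-demeaning form: sigma * Omega^{-1/2} = I_T - theta * (1 1' / T)
theta <- 1 - sqrt(sigma2 / (sigma2 + TT * sigma_v2))
Q <- diag(TT) - theta * tcrossprod(ones) / TT
err_quasi <- max(abs(sqrt(sigma2) * Omega_inv_half - Q))

c(err_identity = err_identity, err_quasi = err_quasi)
```

Both discrepancies are at the level of floating-point error. In practice \(\sigma_v^2\) and \(\sigma^2\) are unknown and have to be estimated first.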

Real data example

eq <- log(Q) ~ founderCEO + log(assets) + log(agefirm+1) + bs_volatility

OLS.lm <- lm( eq, data = dp ) # some agefirm = 0. add one to take log
OLS.plm <- plm( eq, data = dp, effect = "individual", model = "pooling" )
print(OLS.lm)
## 
## Call:
## lm(formula = eq, data = dp)
## 
## Coefficients:
##      (Intercept)        founderCEO       log(assets)  log(agefirm + 1)  
##         0.598758          0.196479         -0.001051         -0.022578  
##    bs_volatility  
##        -0.093353
print(OLS.plm)
## 
## Model Formula: log(Q) ~ founderCEO + log(assets) + log(agefirm + 1) + bs_volatility
## 
## Coefficients:
##      (Intercept)       founderCEO      log(assets) log(agefirm + 1) 
##        0.5987582        0.1964793       -0.0010507       -0.0225779 
##    bs_volatility 
##       -0.0933531
FE <- plm( eq, data = dp, effect = "individual", model = "within" )
RE <- plm( eq, data = dp, effect = "individual", model = "random" )

print(FE)
## 
## Model Formula: log(Q) ~ founderCEO + log(assets) + log(agefirm + 1) + bs_volatility
## 
## Coefficients:
##       founderCEO      log(assets) log(agefirm + 1)    bs_volatility 
##       0.03713317       0.00043818       0.36230453      -0.21626849
print(RE)
## 
## Model Formula: log(Q) ~ founderCEO + log(assets) + log(agefirm + 1) + bs_volatility
## 
## Coefficients:
##      (Intercept)       founderCEO      log(assets) log(agefirm + 1) 
##         0.207554         0.104535         0.030767         0.011166 
##    bs_volatility 
##        -0.170891

FE vs. RE

Hausman test

\[ W = (\hat{\beta}_{FE} - \hat{\beta}_{RE} )' (var(\hat{\beta}_{FE}) - var(\hat{\beta}_{RE}) )^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE} ) \stackrel{d}{\rightarrow} \chi^2 (p) \]

hausman <- phtest( eq, data = dp, model = c("within", "random")  )
print(hausman)
## 
##  Hausman Test
## 
## data:  eq
## chisq = 38.205, df = 4, p-value = 1.017e-07
## alternative hypothesis: one model is inconsistent
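
The Wald form of the Hausman statistic can also be computed directly from the two sets of estimates. The helper `hausman_stat` below is a hypothetical function written for illustration, not part of `plm`, and the numbers passed to it are made up to show the arithmetic.

```r
# Hypothetical helper: Hausman statistic from FE and RE estimates
# (coefficients restricted to the regressors the two models share).
hausman_stat <- function(b_fe, V_fe, b_re, V_re) {
  d  <- b_fe - b_re
  W  <- drop(t(d) %*% solve(V_fe - V_re) %*% d)
  df <- length(d)
  list(statistic = W, df = df,
       p.value = pchisq(W, df, lower.tail = FALSE))
}

# Made-up inputs, for illustration only
b_fe <- c(0.5, 1.0); V_fe <- diag(c(0.04, 0.09))
b_re <- c(0.4, 1.1); V_re <- diag(c(0.02, 0.05))
res <- hausman_stat(b_fe, V_fe, b_re, V_re)
res

# With fitted plm objects FE and RE, one would presumably call it as
#   k <- names(coef(FE))
#   hausman_stat(coef(FE), vcov(FE), coef(RE)[k], vcov(RE)[k, k])
```

The random effects fit carries an intercept that the within fit does not, which is why the comparison is restricted to the shared slope coefficients.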

Dynamic panel

\[ y_{it} = \alpha_i + x_{it}' \beta + \rho y_{i,t-1} + u_{it} \]

where \(u_{it}\) is uncorrelated with \(y_{i,t-1}\)

\[ 0.5(y_{i,2} - y_{i,1}) = 0.5\rho(y_{i,1}-y_{i,0}) + 0.5(u_{i,2}-u_{i,1}) \]

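With \(T = 2\), the within-transformed regressor contains \(y_{i,1}\) and the transformed error contains \(u_{i,1}\), so correlation arises between \(y_{i,1}\) and \(u_{i,1}\)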

\[ y_{it} - \bar{y}_i = \rho (y_{i,t-1}-\bar{y}_{i,-1}) +( u_{it} - \bar{u}_i) \]

Nickell bias

\[ \begin{align} cov[\bar{y}_{i,-1},u_{it}] & = cov\left[ T^{-1} \sum_{s=1}^T y_{i,s-1}, u_{it} \right] \\ & = cov\left[ T^{-1} \sum_{s = t+1}^T y_{i,s-1}, u_{it} \right] \\ & \approx \frac{\sigma^2}{(1-\rho)T} \end{align} \]

\[ \hat{\rho} - \rho_0 \approx - \frac{cov[\bar{y}_{i,-1}, u_{it}]}{var[y_{i,t-1} - \bar{y}_{i,-1}]} \approx - \frac{\frac{\sigma^2}{(1-\rho)T}}{\frac{\sigma^2}{1-\rho^2}} = -\frac{1+\rho}{T} \]
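
A short simulation can make the size of the bias concrete. The sketch below generates a pure AR(1) panel with individual effects (no \(x_{it}\)), applies the within estimator, and compares the bias with the approximation \(-(1+\rho)/T\). All parameter values, the burn-in length, and the function name `sim_one` are assumptions for illustration.

```r
set.seed(3)
N <- 2000; TT <- 6; rho <- 0.5; burn <- 50
sim_one <- function() {
  alpha <- rnorm(1)
  y <- numeric(burn + TT + 1)
  for (t in 2:length(y)) y[t] <- alpha + rho * y[t - 1] + rnorm(1)
  tail(y, TT + 1)                            # y_{i,0}, ..., y_{i,T}
}
Y <- t(replicate(N, sim_one()))              # N x (T+1) matrix
y_cur <- Y[, -1]                             # y_{it},    t = 1..T
y_lag <- Y[, -(TT + 1)]                      # y_{i,t-1}, t = 1..T

# Within transformation: subtract each individual's time average
y_cur_dm <- y_cur - rowMeans(y_cur)
y_lag_dm <- y_lag - rowMeans(y_lag)
rho_hat <- sum(y_lag_dm * y_cur_dm) / sum(y_lag_dm^2)

c(rho_hat = rho_hat, bias = rho_hat - rho, approx = -(1 + rho) / TT)
```

Increasing `N` does not remove the bias; only a larger `TT` shrinks it, which is the point of the \(1/T\) rate.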

Bias in random effects models

\[ \begin{align} y_{it} & = \alpha + \rho y_{i,t-1} + (v_i + u_{it}) \\ & = \alpha + \rho (\alpha + \rho y_{i,t-2} + v_i + u_{i,t-1})+ (v_i + u_{it}) \\ & = \cdots \\ \end{align} \]

Repeated substitution makes it clear that \(y_{i,t-1}\) and \(w_{it} = v_i + u_{it}\) are correlated through \(v_i\). If \(y_{i,0}\) is uncorrelated with \(v_i\),

\[ cov[y_{i,t-1},w_{it}] = (1 + \rho + \rho^2 + \cdots + \rho^{t-2})\sigma^2_v \]

First differencing

\[ \Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta u_{it} \]

where \(\Delta y_{it} = y_{it} - y_{i,t-1}\) is the first difference of \(y_{it}\); \(\Delta y_{i,t-1}\) and \(\Delta u_{it}\) are defined similarly

\[ \begin{align} cov[\Delta y_{i,t-1}, \Delta u_{it}] & =E[(y_{i,t-1} - y_{i,t-2}) (u_{it} - u_{i,t-1})] \\ & = - E[ y_{i,t-1} u_{i,t-1} ] \\ & = - \sigma^2 \end{align} \]
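
The covariance between the differenced regressor and the differenced error can be checked by simulation, since the errors are observable there. The sketch below is an illustration under assumed values (\(\rho = 0.5\), \(\sigma^2 = 1\)); `sim_one` and the column names are hypothetical.

```r
set.seed(4)
N <- 5000; rho <- 0.5; burn <- 50
sim_one <- function() {
  alpha <- rnorm(1)
  u <- rnorm(burn + 3)                       # sigma^2 = 1
  y <- numeric(burn + 3)
  for (t in 2:length(y)) y[t] <- alpha + rho * y[t - 1] + u[t]
  n <- length(y)
  c(dy     = y[n]     - y[n - 1],            # Delta y_{it}
    dy_lag = y[n - 1] - y[n - 2],            # Delta y_{i,t-1}
    du     = u[n]     - u[n - 1])            # Delta u_{it}
}
D <- as.data.frame(t(replicate(N, sim_one())))

cov_hat <- cov(D$dy_lag, D$du)               # sample analogue of cov[Delta y_{i,t-1}, Delta u_{it}]
rho_fd  <- coef(lm(dy ~ 0 + dy_lag, data = D))[["dy_lag"]]
c(cov = cov_hat, rho_fd_ols = rho_fd, rho_true = rho)
```

Differencing removes \(\alpha_i\), but OLS on the differenced equation stays inconsistent because the regressor and the error share \(u_{i,t-1}\).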

Solution

Unit root in panel data

\[ y_{it} = \alpha_i + \rho y_{i,t-1} + e_{it} \]

Levin, Lin, and Chu (2002)

\[ \Delta y_{it} = \alpha_i + \beta y_{i,t-1} + \sum_{k=1}^L \gamma_{ik} \Delta y_{i,t-k} + e_{it} \]

similar to that for the augmented Dickey-Fuller test

Im, Pesaran and Shin (2003)

\[ \Delta y_{it} = \alpha_i + \beta_i y_{i,t-1} + \sum_{k=1}^L \gamma_{ik} \Delta y_{i,t-k} + e_{it} \]

which allows heterogeneous \(\beta_i\) across individuals

R implementation

purtest(object, test = c("levinlin", "ips"),
exo = c("none", "intercept", "trend"),
lags = c("SIC", "AIC"), pmax = 10)
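
A possible way to call `purtest` on simulated data is sketched below. It assumes the `plm` package is installed and that `purtest` accepts a data frame whose columns are the individual time series, as in the package's own documentation example. The simulated AR(1) design and the choice `pmax = 4` are illustrative assumptions.

```r
# Runs only if plm is available; the simulated series are illustrative.
if (requireNamespace("plm", quietly = TRUE)) {
  set.seed(5)
  N <- 10; TT <- 50
  # Each column is one individual's stationary AR(1) series (rho = 0.5)
  y <- as.data.frame(
    replicate(N, as.numeric(arima.sim(list(ar = 0.5), n = TT)))
  )
  ips <- plm::purtest(y, test = "ips", exo = "intercept",
                      lags = "AIC", pmax = 4)
  print(ips)
}
```

Switching `test = "ips"` to `test = "levinlin"` runs the Levin, Lin, and Chu version on the same data.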

Summary