Zhentao Shi
Oct 6, 2021
\[ \begin{align} y_{1,t } & = y_{1,t-1} + u_{1,t} \\ y_{2,t } & = y_{2,t-1} + u_{2,t} \\ \end{align} \]
where \((u_{1,t},u_{2,t})\) are mutually independent and independent across time
n <- 200
y1 <- cumsum(rnorm(n))
y2 <- cumsum(rnorm(n))
reg <- lm(y1 ~ y2)
summary( reg )$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7582818 0.34242161 10.97560 3.369574e-22
## y2 0.5893765 0.02943782 20.02107 1.798968e-49
AR <- function(b, T) {
  # simulate an AR(1) series y_t = b * y_{t-1} + N(0,1) shock, with y[1] = 0
  y <- rep(0, T)
  for (t in 2:T) {
    y[t] <- b * y[t - 1] + rnorm(1)
  }
  return(ts(y))
}
spurious <- function(i, a, T) {
  # "i" indexes the Monte Carlo replication and is unused inside the function
  y1 <- AR(a, T)
  y2 <- AR(a, T)
  reg <- lm(y1 ~ y2)
  # save the p-value of the estimate of y2's coefficient
  p.val <- summary(reg)[[4]][2, 4]
  return(p.val)
}
## [1] 0.066
## [1] 0.937
##
## Call:
## lm(formula = BTC ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15996 -8115 -5102 -975 55897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 168.50 721.80 0.233 0.815
## y -270.22 16.51 -16.368 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15150 on 2678 degrees of freedom
## Multiple R-squared: 0.09094, Adjusted R-squared: 0.09061
## F-statistic: 267.9 on 1 and 2678 DF, p-value: < 2.2e-16
Two time series \(\{y_{1,t}\}\) and \(\{y_{2,t}\}\) are both I(1)
In general, \((y_{1,t} - \beta y_{2,t})\) is still I(1) for an arbitrary constant \(\beta\)
Recall that I(0) means the time series is weakly dependent. If there happens to be a constant \(\beta\) such that \((y_{1,t} - \beta y_{2,t})\) is I(0), we say these two time series are cointegrated. This relationship is called cointegration.
In the previous example, the two time series are independent; no linear combination can reduce them to a stationary time series.
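By contrast, a cointegrated pair is easy to generate: let \(y_{2,t}\) be a random walk and add a stationary error on top of a multiple of it. A minimal sketch, where the value \(\beta = 2\) is made up for illustration:

```r
set.seed(1)
n  <- 500
y2 <- cumsum(rnorm(n))  # y2 ~ I(1)
e  <- rnorm(n)          # I(0) equilibrium error
y1 <- 2 * y2 + e        # y1 ~ I(1), but y1 - 2 * y2 = e is I(0)

b_hat <- coef(lm(y1 ~ y2 - 1))  # OLS is consistent for beta = 2
b_hat
```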
## [1] "USD/JPY"
## [1] "HKD/JPY"
Start from the cointegration equation \(y_{1,t} = \beta y_{2,t} + e_t\)
Subtract \(y_{1,t-1}\) from both sides
\[ \begin{align} \Delta y_{1,t} & = \beta y_{2,t} - y_{1,t-1} + e_t \\ & = \beta \Delta y_{2,t} + \beta y_{2,t-1} - y_{1,t-1} + e_t \\ & = \beta \Delta y_{2,t} - (y_{1,t-1} - \beta y_{2,t-1}) + e_t \\ & = \beta \Delta y_{2,t} - e_{t-1} + e_t \end{align} \]
\[e_t = \theta_1 e_{t-1} + w_t\]
where \(w_t\) and \(e_{t-1}\) are uncorrelated. Moreover, decompose
\[\Delta y_{2,t} = \theta_2 + \theta_3 e_{t-1} + \eta_t\]
where \(\eta_t\) and \(e_{t-1}\) are uncorrelated.
\[ \begin{align} \Delta y_{1,t} & = \beta (\theta_2 + \theta_3 e_{t-1} + \eta_t) - e_{t-1} + (\theta_1 e_{t-1} + w_t) \\ & = \beta \theta_2 + (\beta \theta_3 + \theta_1 - 1) e_{t-1} + (\beta \eta_t + w_t) \\ & = \mu + \alpha (y_{1,t-1} - \beta y_{2,t-1}) + v_t \end{align} \] where \(v_t = \beta \eta_t + w_t\) is orthogonal to \(e_{t-1}\) by construction
The last equation is called the error correction model (ECM)
\(y_{1,t} = \beta y_{2,t} + e_t\) is the long-run relationship
\(\alpha\) signifies the direction and magnitude of the adjustment of \(y_{1,t}\) back toward its long-run level in response to \(e_{t-1}\), the departure from the long-run relationship in the previous period
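A two-step sketch of estimating this ECM on simulated data. The design values \(\beta = 2\) and i.i.d. errors are my own choices; with i.i.d. \(e_t\) independent of \(\Delta y_{2,t}\), the decomposition above gives \(\theta_1 = \theta_3 = 0\), so \(\alpha = -1\):

```r
set.seed(1)
n  <- 500
y2 <- cumsum(rnorm(n))
y1 <- 2 * y2 + rnorm(n)  # cointegrated with beta = 2, e_t iid

# Step 1: long-run relationship; the residuals estimate e_t
e_hat <- residuals(lm(y1 ~ y2 - 1))

# Step 2: ECM regression of Delta y1 on Delta y2 and the lagged error
d_y1 <- diff(y1)
d_y2 <- diff(y2)
ecm  <- lm(d_y1 ~ d_y2 + head(e_hat, -1))
coef(ecm)  # slope on d_y2 near beta = 2; loading on e_{t-1} near alpha = -1
```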
In a standard regression with a stationary regressor, \[ y_{1,t} = \beta y_{2,t} + e_t, \]
the reverse regression of \(y_{2,t}\) on \(y_{1,t}\) is inconsistent for the coefficient \(1/\beta\).
In cointegration, the reverse regression is consistent for \(1/\beta\).
This implies that the cointegration regression does not speak to a causal relationship
Both variables are viewed as endogenous in general
Similarly to the ECM for \(\Delta y_{1,t}\), we can derive another ECM of \(\Delta y_{2,t}\) on \((y_{1,t-1} - \beta y_{2,t-1})\) to characterize the dynamics of the equilibrium shock.
Stacking multiple ECMs makes a vector error correction model (VECM)
In a nutshell, for a VECM
\[ \begin{align} \Delta y_{1,t} & = \mu_1 + \alpha_1 e_{t-1} + v_{1,t} \\ \Delta y_{2,t} & = \mu_2 + \alpha_2 e_{t-1} + v_{2,t} \end{align} \] where \(e_{t-1} = y_{1,t-1} - \beta y_{2,t-1}\).
\[ \begin{align} \Delta \mathbf{y}_t & = \boldsymbol{\mu} + \boldsymbol{\alpha} \cdot e_{t-1} + \mathbf{v}_{t} \\ & = \boldsymbol{\mu} + \boldsymbol{\alpha} \cdot \boldsymbol{\beta}' \mathbf{y}_{t-1} + \mathbf{v}_{t} \end{align} \]
We call \(\boldsymbol{\beta}\) the cointegration vector
One of the elements in \(\boldsymbol{\beta}\) should be normalized as 1; otherwise \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) cannot be separately identified.
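The scale indeterminacy is easy to see numerically: multiplying \(\boldsymbol{\alpha}\) by a constant and dividing \(\boldsymbol{\beta}\) by the same constant leaves the product \(\boldsymbol{\alpha} \boldsymbol{\beta}'\) unchanged. A small sketch with made-up numbers:

```r
alpha <- c(-0.5, 0.2)  # loadings (illustrative values)
beta  <- c(1, -2)      # cointegration vector, normalized so beta[1] = 1
c0    <- 3             # arbitrary rescaling constant

Pi1 <- alpha %o% beta                # alpha %*% t(beta)
Pi2 <- (c0 * alpha) %o% (beta / c0)  # rescaled pair, same product
all.equal(Pi1, Pi2)                  # TRUE: only the product is identified
```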
More deterministic components can be included in the cointegration relationship
Mean shift and time trend in \(\boldsymbol{\beta}' \mathbf{y}_{t} - \beta_1 - \beta_2 t = e_t\)
Other I(0) variables can be added to the RHS to capture short-run dynamics.
A more general formula for VECM is
\[ \begin{align} \Delta \mathbf{y}_t & = & \boldsymbol{\mu} + \boldsymbol{\alpha} ( \boldsymbol{\beta}' \mathbf{y}_{t-1} - \beta_1 - \beta_2 (t-1)) \\ & & + \sum_{j=1}^{p-1} \Gamma_j (\Delta \mathbf{y}_{t-j}) + \sum_{j=0}^q \Phi_j \mathbf{x}_{t-j} + \mathbf{v}_{t} \end{align} \]
where \(\Gamma_j\) and \(\Phi_j\) are \(K\times K\) coefficient matrices
At face value, cointegration looks similar to a VAR
A cointegration system imposes restriction on the VAR coefficient matrix. It is a special form of nonstationary VAR.
For simplicity, consider the VAR system with no intercept
\[ \begin{align} \mathbf{y}_t & = \mathbf {D} \mathbf{y}_{t-1} + \mathbf{v}_{t} \end{align} \]
If \(\mathbf{y}_t \sim I(1)\), there are unit roots in the coefficient matrix \(\mathbf{D}\)
\[ \begin{align} \Delta \mathbf{y}_t & = \mathbf {A} \mathbf{y}_{t-1} + \mathbf{v}_{t} \end{align} \] where \(\mathbf {A} = \mathbf {D} - \mathbf {I}_K\)
Notice on the LHS \(\Delta \mathbf{y}_t \sim I(0)\), but on the RHS \(\mathbf{y}_{t-1} \sim I(1)\).
There is a potential mismatch between the two sides, which exhibit very different behavior.
What condition on \(\mathbf{A}\) keeps the equality to hold?
The only way to keep the two sides balanced is when \(\mathbf{A}\) is rank deficient.
When \(\mathbf{A}\) is rank deficient, any \(K\times 1\) non-zero vector in the left null space of \(\mathbf{A}\) (i.e., satisfying \(\boldsymbol{\gamma}'\mathbf{A} = \mathbf{0}\)) balances the two sides. Let’s call such a vector \(\boldsymbol{\gamma}\):
\[ \begin{align} \boldsymbol{\gamma}' \Delta \mathbf{y}_t & = \boldsymbol{\gamma}' \mathbf {A} \mathbf{y}_{t-1} + \boldsymbol{\gamma} '\mathbf{v}_{t} = \boldsymbol{\gamma} '\mathbf{v}_{t}. \end{align} \]
\[ \begin{pmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} e_{t-1} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 & -\beta \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} \]
In this case, the \(2 \times 2\) coefficient matrix is of rank 1
In contrast, if \(\mathrm{rank}(\mathbf{A}) = 0\), i.e. \(\mathbf{A} = \mathbf{0}\), there is no cointegration:
\[ \begin{pmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} = \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} \]
In a general \(K\)-equation VECM system, \(\mathrm{rank}(\mathbf{A}) < K\)
Suppose the coefficient on the same variable, say \(y_{1,t-1}\) without loss of generality, is normalized to 1. Then there are at most \(K-1\) distinct cointegration vectors
Back to the level data. Cointegration requires that \(\mathrm{rank}(\mathbf{D} - \mathbf{I}_K ) < K\)
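For the bivariate example above, the rank deficiency can be verified directly: with one cointegration relation, \(\mathbf{A} = \boldsymbol{\alpha} \begin{pmatrix} 1 & -\beta \end{pmatrix}\) is a rank-one \(2 \times 2\) matrix. The numbers below are illustrative:

```r
alpha <- c(-0.5, 0.2)  # loadings (made-up values)
beta  <- 2             # cointegration coefficient (made up)

A <- alpha %o% c(1, -beta)  # A = alpha %*% (1, -beta)
qr(A)$rank                  # rank 1 < K = 2, as cointegration requires

D <- A + diag(2)  # level-VAR coefficient matrix D = A + I
eigen(D)$values   # one eigenvalue equals 1: a unit root in the level VAR
```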
\[ \begin{align} y_{1,t} & = \mu + \beta y_{2,t} + e_t \\ \Delta y_{2,t} & = u_{2,t} \end{align} \]
Phillips and Hansen (1990) modify the OLS estimator to explicitly account for the correlation and autocorrelation in the estimation step
In a similar spirit to the Phillips-Perron test, it deals with serial correlation in a nonparametric manner, instead of trying to correctly specify the dynamics
\[ \hat{\beta}_{FM} = \begin{pmatrix} T & \sum y_{2,t} \\ \sum y_{2,t} & \sum y_{2,t}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_{1,t}^+ \\ \sum( y_{1, t}^+ y_{2,t} - \hat{c}) \end{pmatrix} \]
Some technical details (optional)
The convergence speed is \(T\) (super-consistency)
The t-statistic asymptotically follows the usual \(N(0,1)\)
The R package cointReg is not recommended
\[ \begin{align} \begin{pmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{pmatrix} & = & \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (y_{1,t-1} - \beta_1 - \beta_2 y_{2,t-1}) \\ & & + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} \Delta y_{1,t-1} \\ \Delta y_{2,t-1} \end{pmatrix} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} \end{align} \]
Soren Johansen (1988, 1991, 1995)
Endogeneity in the level equation is solved by transforming it into a VECM
Serial correlation in the errors is handled by the lagged differenced terms which explicitly account for the potential short-run dynamics
Steps (optional)
Need to choose the number of lagged diff variables. Information criteria are helpful.
Both FM-OLS and the reduced-rank estimator must specify the number of cointegration relationships before estimation
Attention!
First estimate the single cointegration vector \(\hat{\beta}\), and then plug it into VECM.
To be detailed in the next revamp.
Based on OLS residual
Main idea: After running OLS, check if the residual is I(1)
Null hypothesis: residual is I(1) (no cointegration)
Alternative hypothesis: residual is I(0) (cointegration is present)
Test statistic: ADF
The critical values must reflect the presence of the estimated regressors
MacKinnon (1996) tabulates the critical values
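A hand-rolled sketch of the residual-based test on simulated cointegrated data (the design values are made up). Note again that the usual Dickey-Fuller critical values do not apply; the tabulated values in MacKinnon (1996) should be consulted:

```r
set.seed(1)
n  <- 500
y2 <- cumsum(rnorm(n))
y1 <- 2 * y2 + rnorm(n)  # cointegrated by construction

e_hat <- residuals(lm(y1 ~ y2))

# ADF-type regression on the residual (no augmentation terms in this sketch)
d_e    <- diff(e_hat)
lag_e  <- head(e_hat, -1)
adf    <- lm(d_e ~ lag_e - 1)
t_stat <- coef(summary(adf))["lag_e", "t value"]
t_stat  # strongly negative here, pointing toward rejecting "no cointegration"
```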
Johansen (1995)
Sequential tests: Let \(K\) be the number of VECM equations
Test statistic
\[ J = -(T-p) \sum_{j= r + 1}^K \log(1-\hat{\lambda}_j) = (T-p) \sum_{j= r + 1}^K \log\left (\frac{1}{1-\hat{\lambda}_j}\right) \]
where \(r\) is the cointegration rank under the null, \(p\) is the order of the lagged difference terms, and \((\hat{\lambda}_j)_{j=1}^K\) are the generalized eigenvalues (ordered from largest to smallest).
A by-product of the MLE estimation
Main idea:
lrm1
: Logarithm of real money

lny
: Logarithm of real income

lnmr
: Interest rate

difp
: Inflation rate

library(urca)
data(finland)
fl <- finland
fl.vecm <- ca.jo(fl, ecdet = "none", spec = "transitory", K = 2)
summary(fl.vecm)
##
## ######################
## # Johansen-Procedure #
## ######################
##
## Test type: maximal eigenvalue statistic (lambda max) , with linear trend
##
## Eigenvalues (lambda):
## [1] 0.31890665 0.24501279 0.07213939 0.02140750
##
## Values of teststatistic and critical values of test:
##
## test 10pct 5pct 1pct
## r <= 3 | 2.25 6.50 8.18 11.65
## r <= 2 | 7.79 12.91 14.90 19.19
## r <= 1 | 29.23 18.90 21.07 25.75
## r = 0 | 39.94 24.78 27.14 32.14
##
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
##
## lrm1.l1 lny.l1 lnmr.l1 difp.l1
## lrm1.l1 1.000000 1.0000000 1.0000000 1.000000
## lny.l1 -1.117163 -1.6206016 -0.9074816 1.507580
## lnmr.l1 -4.682914 0.6434857 0.3116962 -1.535948
## difp.l1 5.467442 38.3345426 -2.0157542 -8.090441
##
## Weights W:
## (This is the loading matrix)
##
## lrm1.l1 lny.l1 lnmr.l1 difp.l1
## lrm1.d 0.056932283 -0.023040222 -0.1421986421 -0.0045795123
## lny.d 0.062776880 0.001260473 0.0008129485 -0.0080064088
## lnmr.d 0.106594339 0.004316392 -0.0134648230 0.0020914177
## difp.d -0.002854965 -0.013111002 0.0172870125 0.0009611353
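As a sanity check, the lambda-max statistics in the table above can be reproduced from the reported eigenvalues via \(-(T-p)\log(1-\hat{\lambda}_j)\); here \(T - p = 104\) is my assumption about the effective sample size for the finland data:

```r
lambda <- c(0.31890665, 0.24501279, 0.07213939, 0.02140750)
Tp <- 104  # assumed effective sample size (T - p)

round(-Tp * log(1 - lambda), 2)
# 39.94 29.23 7.79 2.25: matches the test-statistic column, from r = 0 up to r <= 3
```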
SAS routine

ecdet
: deterministic component in the cointegration

K
: at least 2 (for one lag in the level). A historical relic, as in Johansen’s regression with lagged differenced variables