Zhentao Shi
Oct 6, 2021
\[ \begin{align} y_{1,t } & = y_{1,t-1} + u_{1,t} \\ y_{2,t } & = y_{2,t-1} + u_{2,t} \\ \end{align} \]
where \((u_{1,t},u_{2,t})\) are mutually independent and independent across time
n <- 200
y1 <- cumsum(rnorm(n))
y2 <- cumsum(rnorm(n))
reg <- lm(y1 ~ y2)
summary( reg )$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7582818 0.34242161 10.97560 3.369574e-22
## y2 0.5893765 0.02943782 20.02107 1.798968e-49
AR <- function(b, T) {
  # simulate an AR(1) series y_t = b * y_{t-1} + N(0,1) shock, with y[1] = 0
  y <- rep(0, T)
  for (t in 2:T) {
    y[t] <- b * y[t - 1] + rnorm(1)
  }
  return(ts(y))
}
spurious <- function(i, a, T) {
  # "i" indexes the Monte Carlo replication and is unused inside the function
  y1 <- AR(a, T)
  y2 <- AR(a, T)
  reg <- lm(y1 ~ y2)
  # save the p-value of the estimate of y2's coefficient
  p.val <- summary(reg)[[4]][2, 4]
  return(p.val)
}
## [1] 0.066
## [1] 0.937
##
## Call:
## lm(formula = BTC ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15996 -8115 -5102 -975 55897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 168.50 721.80 0.233 0.815
## y -270.22 16.51 -16.368 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15150 on 2678 degrees of freedom
## Multiple R-squared: 0.09094, Adjusted R-squared: 0.09061
## F-statistic: 267.9 on 1 and 2678 DF, p-value: < 2.2e-16
Two time series \(\{y_{1,t}\}\) and \(\{y_{2,t}\}\) are both I(1)
In general, \((y_{1,t} - \beta y_{2,t})\) is still I(1) for an arbitrary constant \(\beta\)
Recall that I(0) means the time series is weakly dependent. If there happens to be a constant \(\beta\) such that \((y_{1,t} - \beta y_{2,t})\) is I(0), we say these two time series are cointegrated. This relationship is called cointegration.
In the previous example, the two time series are independent; no linear combination can reduce them to a stationary time series.
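By contrast, a cointegrated pair is easy to generate: let \(y_{2,t}\) be a random walk and add a stationary error on top of a multiple of it. A minimal sketch, where the value \(\beta = 2\) is made up for illustration:

```r
set.seed(1)
n  <- 500
y2 <- cumsum(rnorm(n))  # y2 ~ I(1)
e  <- rnorm(n)          # I(0) equilibrium error
y1 <- 2 * y2 + e        # y1 ~ I(1), but y1 - 2 * y2 = e is I(0)

b_hat <- coef(lm(y1 ~ y2 - 1))  # OLS is consistent for beta = 2
b_hat
```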
## [1] "USD/JPY"
## [1] "HKD/JPY"
Start from the cointegration equation \(y_{1,t} = \beta y_{2,t} + e_t\)
Subtract \(y_{1,t-1}\) from both sides
\[ \begin{align} \Delta y_{1,t} & = \beta y_{2,t} - y_{1,t-1} + e_t \\ & = \beta \Delta y_{2,t} + \beta y_{2,t-1} - y_{1,t-1} + e_t \\ & = \beta \Delta y_{2,t} - (y_{1,t-1} - \beta y_{2,t-1}) + e_t \\ & = \beta \Delta y_{2,t} - e_{t-1} + e_t \end{align} \]
\[e_t = \theta_1 e_{t-1} + w_t\]
where \(w_t\) and \(e_{t-1}\) are uncorrelated. Moreover, decompose
\[\Delta y_{2,t} = \theta_2 + \theta_3 e_{t-1} + \eta_t\]
where \(\eta_t\) and \(e_{t-1}\) are uncorrelated.
\[ \begin{align} \Delta y_{1,t} & = \beta (\theta_2 + \theta_3 e_{t-1} + \eta_t) - e_{t-1} + (\theta_1 e_{t-1} + w_t) \\ & = \beta \theta_2 + (\beta \theta_3 + \theta_1 - 1) e_{t-1} + (\beta \eta_t + w_t) \\ & = \mu + \alpha (y_{1,t-1} - \beta y_{2,t-1}) + v_t \end{align} \] where \(v_t = \beta \eta_t + w_t\) is orthogonal to \(e_{t-1}\) by construction
The last equation is called the error correction model (ECM)
\(y_{1,t} = \beta y_{2,t} + e_t\) is the long-run relationship
\(\alpha\) signifies the direction and magnitude of the adjustment of \(y_{1,t}\) back toward its long-run level in response to \(e_{t-1}\), the departure from the long-run relationship in the previous period
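A two-step sketch of estimating this ECM on simulated data. The design values \(\beta = 2\) and i.i.d. errors are my own choices; with i.i.d. \(e_t\) independent of \(\Delta y_{2,t}\), the decomposition above gives \(\theta_1 = \theta_3 = 0\), so \(\alpha = -1\):

```r
set.seed(1)
n  <- 500
y2 <- cumsum(rnorm(n))
y1 <- 2 * y2 + rnorm(n)  # cointegrated with beta = 2, e_t iid

# Step 1: long-run relationship; the residuals estimate e_t
e_hat <- residuals(lm(y1 ~ y2 - 1))

# Step 2: ECM regression of Delta y1 on Delta y2 and the lagged error
d_y1 <- diff(y1)
d_y2 <- diff(y2)
ecm  <- lm(d_y1 ~ d_y2 + head(e_hat, -1))
coef(ecm)  # slope on d_y2 near beta = 2; loading on e_{t-1} near alpha = -1
```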
In a standard regression with a stationary regressor, \[ y_{1,t} = \beta y_{2,t} + e_t, \]
the reverse regression of \(y_{2,t}\) on \(y_{1,t}\) is inconsistent for the coefficient \(1/\beta\).
In cointegration, the reverse regression is consistent for \(1/\beta\).
This implies that the cointegration regression does not speak to a causal relationship
Both variables are viewed as endogenous in general
Similarly to the ECM for \(\Delta y_{1,t}\), we can derive another ECM of \(\Delta y_{2,t}\) on \((y_{1,t-1} - \beta y_{2,t-1})\) to characterize the dynamics of the equilibrium shock.
Stacking multiple ECMs makes a vector error correction model (VECM)
In a nutshell, for a VECM
\[ \begin{align} \Delta y_{1,t} & = \mu_1 + \alpha_1 e_{t-1} + v_{1,t} \\ \Delta y_{2,t} & = \mu_2 + \alpha_2 e_{t-1} + v_{2,t} \end{align} \] where \(e_{t-1} = y_{1,t-1} - \beta y_{2,t-1}\).
\[ \begin{align} \Delta \mathbf{y}_t & = \boldsymbol{\mu} + \boldsymbol{\alpha} \cdot e_{t-1} + \mathbf{v}_{t} \\ & = \boldsymbol{\mu} + \boldsymbol{\alpha} \cdot \boldsymbol{\beta}' \mathbf{y}_{t-1} + \mathbf{v}_{t} \end{align} \]
We call \(\boldsymbol{\beta}\) the cointegration vector
One of the elements in \(\boldsymbol{\beta}\) should be normalized as 1; otherwise \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) cannot be separately identified.
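The scale indeterminacy is easy to see numerically: multiplying \(\boldsymbol{\alpha}\) by a constant and dividing \(\boldsymbol{\beta}\) by the same constant leaves the product \(\boldsymbol{\alpha} \boldsymbol{\beta}'\) unchanged. A small sketch with made-up numbers:

```r
alpha <- c(-0.5, 0.2)  # loadings (illustrative values)
beta  <- c(1, -2)      # cointegration vector, normalized so beta[1] = 1
c0    <- 3             # arbitrary rescaling constant

Pi1 <- alpha %o% beta                # alpha %*% t(beta)
Pi2 <- (c0 * alpha) %o% (beta / c0)  # rescaled pair, same product
all.equal(Pi1, Pi2)                  # TRUE: only the product is identified
```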
More deterministic components can be included in the cointegration relationship
Mean shift and time trend in \(\boldsymbol{\beta}' \mathbf{y}_{t} - \beta_1 - \beta_2 t = e_t\)
Other I(0) variables can be added to the RHS to capture short-run dynamics.
A more general formula for VECM is
\[ \begin{align} \Delta \mathbf{y}_t & = & \boldsymbol{\mu} + \boldsymbol{\alpha} ( \boldsymbol{\beta}' \mathbf{y}_{t-1} - \beta_1 - \beta_2 (t-1)) \\ & & + \sum_{j=1}^{p-1} \Gamma_j (\Delta \mathbf{y}_{t-j}) + \sum_{j=0}^q \Phi_j \mathbf{x}_{t-j} + \mathbf{v}_{t} \end{align} \]
where \(\Gamma_j\) and \(\Phi_j\) are \(K\times K\) coefficient matrices
At face value, cointegration looks similar to a VAR
A cointegration system imposes restriction on the VAR coefficient matrix. It is a special form of nonstationary VAR.
For simplicity, consider the VAR system with no intercept
\[ \begin{align} \mathbf{y}_t & = \mathbf {D} \mathbf{y}_{t-1} + \mathbf{v}_{t} \end{align} \]
If \(\mathbf{y}_t \sim I(1)\), there are unit roots in the coefficient matrix \(\mathbf{D}\)
\[ \begin{align} \Delta \mathbf{y}_t & = \mathbf {A} \mathbf{y}_{t-1} + \mathbf{v}_{t} \end{align} \] where \(\mathbf {A} = \mathbf {D} - \mathbf {I}_K\)
Notice on the LHS \(\Delta \mathbf{y}_t \sim I(0)\), but on the RHS \(\mathbf{y}_{t-1} \sim I(1)\).
There is a potential mismatch between the two sides, which exhibit very different behavior.
What condition on \(\mathbf{A}\) keeps the equality to hold?
The only way to keep the two sides balanced is when \(\mathbf{A}\) is rank deficient.
When \(\mathbf{A}\) is rank deficient, any \(K\times 1\) non-zero vector in the left null space of \(\mathbf{A}\) (i.e., satisfying \(\boldsymbol{\gamma}'\mathbf{A} = \mathbf{0}\)) balances the two sides. Let’s call such a vector \(\boldsymbol{\gamma}\):
\[ \begin{align} \boldsymbol{\gamma}' \Delta \mathbf{y}_t & = \boldsymbol{\gamma}' \mathbf {A} \mathbf{y}_{t-1} + \boldsymbol{\gamma} '\mathbf{v}_{t} = \boldsymbol{\gamma} '\mathbf{v}_{t}. \end{align} \]
\[ \begin{pmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} e_{t-1} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 & -\beta \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} \]
In this case, the \(2 \times 2\) coefficient matrix is of rank 1
In contrast, if \(\mathrm{rank}(\mathbf{A}) = 0\), i.e. \(\mathbf{A} = \mathbf{0}\), there is no cointegration:
\[ \begin{pmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} = \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} \]
In a general \(K\)-equation VECM system, \(\mathrm{rank}(\mathbf{A}) < K\)
Suppose the coefficient on the same variable, say \(y_{1,t-1}\) without loss of generality, is normalized to 1. Then there are at most \(K-1\) distinct cointegration vectors
Back to the level data. Cointegration requires that \(\mathrm{rank}(\mathbf{D} - \mathbf{I}_K ) < K\)
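For the bivariate example above, the rank deficiency can be verified directly: with one cointegration relation, \(\mathbf{A} = \boldsymbol{\alpha} \begin{pmatrix} 1 & -\beta \end{pmatrix}\) is a rank-one \(2 \times 2\) matrix. The numbers below are illustrative:

```r
alpha <- c(-0.5, 0.2)  # loadings (made-up values)
beta  <- 2             # cointegration coefficient (made up)

A <- alpha %o% c(1, -beta)  # A = alpha %*% (1, -beta)
qr(A)$rank                  # rank 1 < K = 2, as cointegration requires

D <- A + diag(2)  # level-VAR coefficient matrix D = A + I
eigen(D)$values   # one eigenvalue equals 1: a unit root in the level VAR
```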
\[ \begin{align} y_{1,t} & = \mu + \beta y_{2,t} + e_t \\ \Delta y_{2,t} & = u_{2,t} \end{align} \]
Phillips and Hansen (1990) modify the OLS estimator to explicitly account for the correlation and autocorrelation in the estimation step
In a similar spirit to the Phillips-Perron test, it deals with serial correlation in a nonparametric manner, instead of trying to correctly specify the dynamics
\[ \hat{\beta}_{FM} = \begin{pmatrix} T & \sum y_{2,t} \\ \sum y_{2,t} & \sum y_{2,t}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_{1,t}^+ \\ \sum( y_{1, t}^+ y_{2,t} - \hat{c}) \end{pmatrix} \]
Some technical details (optional)
The convergence speed is \(T\) (super-consistency)
The t-statistic asymptotically follows the usual \(N(0,1)\)
The R package cointReg is not recommended
\[ \begin{align} \begin{pmatrix} \Delta y_{1,t} \\ \Delta y_{2,t} \end{pmatrix} & = & \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (y_{1,t-1} - \beta_1 - \beta_2 y_{2,t-1}) \\ & & + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} \Delta y_{1,t-1} \\ \Delta y_{2,t-1} \end{pmatrix} + \begin{pmatrix} v_{1,t} \\ v_{2,t} \end{pmatrix} \end{align} \]
Soren Johansen (1988, 1991, 1995)
Endogeneity in the level equation is solved by transforming it into a VECM
Serial correlation in the errors is handled by the lagged differenced terms which explicitly account for the potential short-run dynamics
Steps (optional)
Need to choose the number of lagged diff variables. Information criteria are helpful.
Both FM-OLS and the reduced-rank estimator must specify the number of cointegration relationships before estimation
Attention!
First estimate the single cointegration vector \(\hat{\beta}\), and then plug it into VECM.
To be detailed in the next revamp.
Based on OLS residual
Main idea: After running OLS, check if the residual is I(1)
Null hypothesis: residual is I(1) (no cointegration)
Alternative hypothesis: residual is I(0) (cointegration is present)
Test statistic: ADF
The critical values must reflect the presence of the estimated regressors
MacKinnon (1996) tabulates the critical values
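A hand-rolled sketch of the residual-based test on simulated cointegrated data (the design values are made up). Note again that the usual Dickey-Fuller critical values do not apply; the tabulated values in MacKinnon (1996) should be consulted:

```r
set.seed(1)
n  <- 500
y2 <- cumsum(rnorm(n))
y1 <- 2 * y2 + rnorm(n)  # cointegrated by construction

e_hat <- residuals(lm(y1 ~ y2))

# ADF-type regression on the residual (no augmentation terms in this sketch)
d_e    <- diff(e_hat)
lag_e  <- head(e_hat, -1)
adf    <- lm(d_e ~ lag_e - 1)
t_stat <- coef(summary(adf))["lag_e", "t value"]
t_stat  # strongly negative here, pointing toward rejecting "no cointegration"
```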
Johansen (1995)
Sequential tests: Let \(K\) be the number of VECM equations
Test statistic
\[ J = -(T-p) \sum_{j= r + 1}^K \log(1-\hat{\lambda}_j) = (T-p) \sum_{j= r + 1}^K \log\left (\frac{1}{1-\hat{\lambda}_j}\right) \]
where \(r\) is the cointegration rank under the null, \(p\) is the order of the lagged difference terms, and \((\hat{\lambda}_j)_{j=1}^K\) are the generalized eigenvalues (ordered from largest to smallest).
A by-product of the MLE estimation
Main idea:
lrm1
: Logarithm of real money

lny
: Logarithm of real income

lnmr
: Interest rate

difp
: Inflation rate

library(urca)
data(finland)
fl <- finland
fl.vecm <- ca.jo(fl, ecdet = "none", spec = "transitory", K = 2)
summary(fl.vecm)
##
## ######################
## # Johansen-Procedure #
## ######################
##
## Test type: maximal eigenvalue statistic (lambda max) , with linear trend
##
## Eigenvalues (lambda):
## [1] 0.31890665 0.24501279 0.07213939 0.02140750
##
## Values of teststatistic and critical values of test:
##
## test 10pct 5pct 1pct
## r <= 3 | 2.25 6.50 8.18 11.65
## r <= 2 | 7.79 12.91 14.90 19.19
## r <= 1 | 29.23 18.90 21.07 25.75
## r = 0 | 39.94 24.78 27.14 32.14
##
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
##
## lrm1.l1 lny.l1 lnmr.l1 difp.l1
## lrm1.l1 1.000000 1.0000000 1.0000000 1.000000
## lny.l1 -1.117163 -1.6206016 -0.9074816 1.507580
## lnmr.l1 -4.682914 0.6434857 0.3116962 -1.535948
## difp.l1 5.467442 38.3345426 -2.0157542 -8.090441
##
## Weights W:
## (This is the loading matrix)
##
## lrm1.l1 lny.l1 lnmr.l1 difp.l1
## lrm1.d 0.056932283 -0.023040222 -0.1421986421 -0.0045795123
## lny.d 0.062776880 0.001260473 0.0008129485 -0.0080064088
## lnmr.d 0.106594339 0.004316392 -0.0134648230 0.0020914177
## difp.d -0.002854965 -0.013111002 0.0172870125 0.0009611353
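As a sanity check, the lambda-max statistics in the table above can be reproduced from the reported eigenvalues via \(-(T-p)\log(1-\hat{\lambda}_j)\); here \(T - p = 104\) is my assumption about the effective sample size for the finland data:

```r
lambda <- c(0.31890665, 0.24501279, 0.07213939, 0.02140750)
Tp <- 104  # assumed effective sample size (T - p)

round(-Tp * log(1 - lambda), 2)
# 39.94 29.23 7.79 2.25: matches the test-statistic column, from r = 0 up to r <= 3
```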
SAS routine

ecdet
: deterministic component in the cointegration

K
: at least 2 (for one lag in the level). A historical relic, as in Johansen’s regression with lagged differenced variables