Zhentao Shi
Sep 20, 2021
\[ y_t = y_{t-1} + \epsilon_t \] where \(\epsilon_t \sim \mathrm{iid} (0, \sigma^2)\)
For simplicity, consider a time series \((y_1,y_2,\ldots, y_T)\) with the initial value \(y_0 = 0\).
Long-lasting shocks: \(y_t = \epsilon_1 + \epsilon_2 + \ldots + \epsilon_t\)
Mean: \(E[y_t] = 0\)
Variance: \(var[y_t] = t \sigma^2\)
Covariance: \(cov[y_t, y_s] = \min(t,s) \cdot \sigma^2\)
Best mean prediction: \(E_t[y_{t+h}] = E_t[ y_t + \epsilon_{t+1} + \ldots + \epsilon_{t+h}] = y_t +E_t[ \epsilon_{t+1} + \ldots + \epsilon_{t+h}] =y_t\) for any \(h>0\)
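These moments are easy to verify by simulation. A minimal sketch, assuming \(\sigma^2 = 1\) and an arbitrary seed:
set.seed(1)
Rep <- 5000; n <- 100
Y <- replicate(Rep, cumsum(rnorm(n)))  # each column is one random-walk path
c(mean(Y[n, ]), var(Y[n, ]))           # approximately 0 and n * sigma^2 = 100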
Stationarity requires \(|\beta| < 1\) in \(y_{t} = \beta y_{t-1} + \epsilon_t\).
Repeated substitution
\[ \begin{align} y_{t+h} & = \beta ( \beta y_{t+h-2} + \epsilon_{t+h-1}) + \epsilon_{t+h} \\ & = \cdots \\ & = \beta^h y_{t} + \sum_{q=0}^{h-1} \beta^{q} \epsilon_{t+h-q} \end{align} \]
When \(|\beta| < 1\), the weight \(\beta^h\) on \(y_t\) vanishes as \(h\) grows; when \(\beta = 1\), every past shock enters with full weight.
A weakly dependent time series is called integrated of order 0, or I(0)
Integrated of order one, or I(1), means that the first difference \(\Delta y_t = y_t - y_{t-1}\) is a weakly dependent time series
Integrated of order two, or I(2), means that the difference of the first difference \[ \begin{align} \Delta^2 y_t & = \Delta y_t - \Delta y_{t-1} \\ & = (y_t - y_{t-1} ) - (y_{t-1} - y_{t-2} ) \\ & = y_t - 2y_{t-1} + y_{t-2} \end{align} \] is weakly dependent
The definition can be further extended to higher-order integration.
In real financial and economic applications, we rarely witness time series with an integration order higher than 2.
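As an illustration, the following sketch builds an I(2) series by double cumulation and recovers weak dependence by differencing twice (the seed is arbitrary):
set.seed(2)
y2 <- cumsum(cumsum(rnorm(500)))     # an I(2) series
par(mfrow = c(1, 3))
plot.ts(y2)                          # I(2): smooth, strongly trending
plot.ts(diff(y2))                    # I(1): a random walk
plot.ts(diff(y2, differences = 2))   # I(0): weakly dependent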
The letter “i” in the R function arima means integration. After \(d\)th differencing, the time series becomes a stationary ARMA.
Simulation: arima.sim(n, model = list(order = c(p,d,q), ar = ..., ma = ...))
Estimation: arima( y, order = c(p,d,q) )
n = 100000
y <- arima.sim( n = n, list(order = c(2,1,2), ar = c(0.1, 0.1), ma = c(0.3, 0.1) ) )   # simulate an ARIMA(2,1,2)
plot(y)
arima(y, order = c(2,1,2))   # estimation with the true order; it produces the output below
##
## Call:
## arima(x = y, order = c(2, 1, 2))
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.1173 0.0793 0.2817 0.1159
## s.e. 0.0429 0.0246 0.0428 0.0099
##
## sigma^2 estimated as 0.9966: log likelihood = -141722.9, aic = 283455.7
n = 100000
y <- arima.sim( n = n, list(order = c(2,2,2), ar = c(0.1, 0.1), ma = c(0.3, 0.1) ) )   # simulate an ARIMA(2,2,2)
plot(y)
arima(y, order = c(2,2,2))   # estimation with the true order; it produces the output below
##
## Call:
## arima(x = y, order = c(2, 2, 2))
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.1165 0.0913 0.2908 0.1029
## s.e. 0.0415 0.0235 0.0415 0.0093
##
## sigma^2 estimated as 0.9988: log likelihood = -141832.4, aic = 283674.8
\[ y_t = \beta y_{t-1} + \epsilon_t \] where \(\epsilon_t \sim \mathrm{iid} (0, \sigma^2)\).
\[ H_0: \beta =1, \] which means that the time series \(y_t\) is a unit root process.
\[ H_1: |\beta| < 1, \] which means \(y_t\) is stationary. (Economists don’t really care about \(\beta < -1\).)
From OLS, we have the \(t\)-statistic
\[ t_{\beta} = (\hat{\beta} - 1) / \mathrm{se}(\hat{\beta}) \]
In regressions with cross-sectional data, the \(t\)-statistic usually converges asymptotically to \(N(0,1)\), based on which we conduct hypothesis testing or construct confidence intervals. The same holds when \(y_t\) is a stationary time series.
The key difference here is that the \(t\)-statistic does not converge to a normal distribution when \(y_t\) is nonstationary.
\[ \Delta y_t = \gamma y_{t-1} + \epsilon_t, \] where \(\gamma = \beta - 1\).
\[ H_0: \gamma =0, \] versus the alternative hypothesis
\[ H_1: \gamma < 0 \]
\[ t_{\gamma} = \hat{\gamma} / \mathrm{se}(\hat{\gamma}) \]
Whether we use \(\gamma\) or \(\beta\), the value of the \(t\)-statistic is exactly the same.
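This equivalence is easy to check numerically. A minimal sketch (the variable names are illustrative):
set.seed(3)
y <- cumsum(rnorm(200))                 # a random walk
y.lag <- y[1:199]; dy <- diff(y)        # y_{t-1} and the first difference
reg.beta  <- lm(y[2:200] ~ y.lag - 1)   # y_t = beta * y_{t-1} + e_t
reg.gamma <- lm(dy ~ y.lag - 1)         # Delta y_t = gamma * y_{t-1} + e_t
t.beta  <- (coef(reg.beta) - 1) / coef(summary(reg.beta))[1, 2]
t.gamma <- coef(reg.gamma) / coef(summary(reg.gamma))[1, 2]
c(t.beta, t.gamma)                      # the two t-statistics coincide
The two regressions share the same residuals, so the standard errors are identical and \(\hat{\beta} - 1 = \hat{\gamma}\).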
As a historical convention, most statistical software, such as the urca package in R, adopts the \(\gamma\) representation.
Dickey and Fuller (1979, 1981) study the asymptotic distribution of the \(t\)-statistic.
They find that although the distribution is non-standard, it is well defined and can easily be simulated by computer.
Thanks to its popularity, in the literature the test is often referred to as the DF test, and the asymptotic distribution is called the DF distribution.
library(dynlm)   # provides dynlm() and the lag operator L(); attaches zoo as a dependency
DF.sim = function(ar){
  Rep = 2000   # number of Monte Carlo replications
  n = 100      # sample size
  t.stat = rep(0, Rep)
  for (r in 1:Rep){
    if (ar < 1) {
      # stationary AR(1): center the t-statistic at the true coefficient
      y = arima.sim( model = list(ar = ar), n = n )
      reg.dyn = dynlm( y ~ L(y, 1) - 1 )
      t.stat[r] = (summary(reg.dyn)[[4]][1] - ar) / summary(reg.dyn)[[4]][2]
    } else if (ar == 1){
      # unit root: the DF regression of diff(y) on the lagged level
      y = ts( cumsum( rnorm(n) ) )
      reg.dyn = dynlm( diff(y) ~ L(y, 1) - 1 )
      t.stat[r] = summary(reg.dyn)[[4]][3]   # the t-value of the coefficient
    }
  }
  cat("simulation is done with ar =", ar, "\n")   # report before returning
  return(t.stat)
}
B = DF.sim(1)      # unit root: the DF distribution
plot(density(B), col = "black", xlim = c(-4, 4))
B = DF.sim(0.5)    # stationary AR(1)
lines(density(B), col = "blue")
B = DF.sim(0.9)    # stationary, but close to a unit root
lines(density(B), col = "purple")
xgrid = seq(-4, 4, by = 0.01)
lines(x = xgrid, y = dnorm(xgrid), col = "black", lty = 2)   # N(0,1) benchmark
abline(v = 0, lty = 3)
library(urca, quietly = TRUE)
n <- 100
y <- arima.sim( n = n, list(order = c(0,1,0) ) )   # a pure random walk
DFtest <- ur.df( y, type = "none", lags = 0 )
summary(DFtest)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.22297 -0.62445 0.02881 0.81155 2.30497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 0.01430 0.01219 1.173 0.244
##
## Residual standard error: 1.025 on 99 degrees of freedom
## Multiple R-squared: 0.01371, Adjusted R-squared: 0.00375
## F-statistic: 1.376 on 1 and 99 DF, p-value: 0.2435
##
##
## Value of test-statistic is: 1.1732
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
One-sided test
The t-statistic is usually negative
Pay attention to the critical values
The more negative the t-statistic, the stronger the evidence for rejection
library(urca, quietly = TRUE)
n <- 100
y <- arima.sim( n = n, list(ar = 0.5 ) )   # a stationary AR(1)
DFtest <- ur.df( y, type = "none", lags = 0 )
summary(DFtest)
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.64746 -0.66698 -0.06205 0.41218 2.20997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -0.56473 0.09021 -6.26 1.02e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9927 on 98 degrees of freedom
## Multiple R-squared: 0.2856, Adjusted R-squared: 0.2784
## F-statistic: 39.19 on 1 and 98 DF, p-value: 1.02e-08
##
##
## Value of test-statistic is: -6.26
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.6 -1.95 -1.61
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.127880 -0.004906 0.000415 0.005602 0.109376
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 2.878e-05 2.248e-05 1.28 0.201
##
## Residual standard error: 0.01238 on 5544 degrees of freedom
## Multiple R-squared: 0.0002955, Adjusted R-squared: 0.0001152
## F-statistic: 1.639 on 1 and 5544 DF, p-value: 0.2005
##
##
## Value of test-statistic is: 1.2802
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
Judging stationarity based on tests is subject to testing errors.
Many applied economists are inclined to transform a potentially nonstationary time series into a stationary time series, in order to circumvent the inconvenience brought by nonstationarity. Is it sound practice?
Suppose \(y_t\) is generated from \(y_t = \beta y_{t-1} + \epsilon_t\), where \(|\beta| \leq 1\).
What happens if we regress \(\Delta y_t\) on \(\Delta y_{t-1}\)?
\[ (y_t - y_{t-1}) = \beta (y_{t-1} - y_{t-2}) + (\epsilon_t - \epsilon_{t-1}). \] The error term and the regressor are correlated. OLS \(\hat{\beta}\) is inconsistent for the original equation.
When \(\beta = 1\), \(\Delta y_t = \epsilon_t\), so
\[ \hat{\beta} = \frac{ \sum \Delta y_{t-1} \Delta y_t }{ \sum (\Delta y_{t-1})^2 } = \frac{ T^{-1} \sum \epsilon_{t-1} \epsilon_t }{ T^{-1} \sum \epsilon^2_{t-1}} \stackrel{p}{\to} 0, \]
instead of the true value \(1\).
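A quick simulation confirms the inconsistency. A sketch with \(\beta = 1\) and an arbitrary seed:
set.seed(4)
y <- cumsum(rnorm(100000))   # a random walk: true beta = 1
dy <- diff(y)
n <- length(dy)
coef(lm(dy[2:n] ~ dy[1:(n - 1)] - 1))   # close to 0, far from the true value 1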
AR(1) with AR coefficient \(\beta = 1\) and \(\mu \neq 0\). \[ y_t = \mu + y_{t-1} + \epsilon_t \]
A deterministic drift (mean shift) each period
Again, consider \((y_1,y_2,\ldots, y_T)\) with the initial value \(y_0 = 0\).
\(y_t = t \mu + \epsilon_1 + \epsilon_2 + \ldots + \epsilon_t\)
Linear deterministic trend plus a stochastic trend component
Mean: \(E[y_t] = t\mu\)
Variance: \(var[y_t] = t \sigma^2\)
Best mean prediction: \(E_t[y_{t+h}] = h \mu + y_t\) for \(h>0\)
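A short sketch of a random walk with drift, with \(\mu = 0.2\) as an illustrative value:
set.seed(5)
n <- 200; mu <- 0.2
y <- ts(cumsum(mu + rnorm(n)))   # y_t = t * mu + accumulated shocks
plot(y)
abline(a = 0, b = mu, lty = 2)   # dashed line: the deterministic trend t * mu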
The data generating process is
\[ \Delta y_t = \mu + \gamma y_{t-1} + \epsilon_t \]
The null hypothesis is still \(H_0: \gamma =0,\) versus the alternative hypothesis \(H_1: \gamma < 0\).
The \(t\)-statistic from OLS (with intercept) remains \(t_{\gamma} = \hat{\gamma} / \mathrm{se}(\hat{\gamma})\)
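The code that generated the output below is not shown. A call along the following lines, with a drift of one per period (an assumption consistent with the estimated intercept), produces comparable output:
library(urca)
y <- ts(cumsum(1 + rnorm(100)))   # a random walk with drift mu = 1 (assumed)
DFtest <- ur.df( y, type = "drift", lags = 0 )
summary(DFtest)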
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression drift
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.37681 -0.81493 0.00268 0.79760 2.80549
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9748559 0.2104664 4.632 1.13e-05 ***
## z.lag.1 0.0006186 0.0037239 0.166 0.868
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.067 on 97 degrees of freedom
## Multiple R-squared: 0.0002844, Adjusted R-squared: -0.01002
## F-statistic: 0.0276 on 1 and 97 DF, p-value: 0.8684
##
##
## Value of test-statistic is: 0.1661 43.9129
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau2 -3.51 -2.89 -2.58
## phi1 6.70 4.71 3.86
phi1 refers to the joint null hypothesis \(\mu = \gamma = 0\). This statistic is non-negative: the bigger its value, the stronger the evidence for rejection.
Next, consider an AR(1) with an intercept and a linear time trend: \[ y_t = \mu + \delta t + \beta y_{t-1} + \epsilon_t \]
Under the null \(\beta = 1\), and with \(y_0 = 0\), repeated substitution gives \[ \begin{align} y_t & = \mu t + \delta (1+2+\cdots+t) + \epsilon_1 + \cdots + \epsilon_t \\ & = \mu t + \frac{\delta }{2} t(t+1) + \epsilon_1 + \cdots + \epsilon_t \end{align} \]
The common alternative representation is \[ \Delta y_t = \mu + \delta t + \gamma y_{t-1} + \epsilon_t \]
The null hypothesis is \(H_0: \gamma =0,\) versus the alternative hypothesis \(H_1: \gamma < 0\).
The \(t\)-statistic from OLS (with intercept and linear trend) is still \(t_{\gamma} = \hat{\gamma} / \mathrm{se}(\hat{\gamma})\)
The critical values are different from the previous two cases
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.43668 -0.78537 0.02051 0.69073 2.61615
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.061664 0.340928 0.181 0.85685
## z.lag.1 -0.000671 0.005963 -0.113 0.91064
## tt 0.054053 0.016364 3.303 0.00134 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.073 on 96 degrees of freedom
## Multiple R-squared: 0.6663, Adjusted R-squared: 0.6593
## F-statistic: 95.83 on 2 and 96 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -0.1125 273.3224 95.8253
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -4.04 -3.45 -3.15
## phi2 6.50 4.88 4.16
## phi3 8.73 6.49 5.47
phi2 refers to the joint null hypothesis \(\mu = \delta = \gamma = 0\).
phi3 refers to the joint null hypothesis \(\delta = \gamma = 0\).
Three regressions:
none: \(y_t = \beta y_{t-1} + \epsilon_t\)
drift: \(y_t = \mu + \beta y_{t-1} + \epsilon_t\)
trend: \(y_t = \mu + \delta t + \beta y_{t-1} + \epsilon_t\)
Each specification leads to a different asymptotic distribution, and thus provides different critical values.
The asymptotic distribution of the DF test is based on the assumption that the error term has no serial correlation.
To cope with violations of this assumption, the augmented Dickey-Fuller (ADF) test adds lagged difference terms \(\Delta y_{t-j}\), \(j=1,\ldots,p\).
Three regressions:
none: \(\Delta y_t = \gamma y_{t-1} + \sum_{j=1}^p \phi_j \Delta y_{t-j} + \epsilon_t\)
drift: \(\Delta y_t = \mu + \gamma y_{t-1} + \sum_{j=1}^p \phi_j \Delta y_{t-j} + \epsilon_t\)
trend: \(\Delta y_t = \mu + \delta t + \gamma y_{t-1} + \sum_{j=1}^p \phi_j \Delta y_{t-j} + \epsilon_t\)
The lag terms are supposed to absorb the serial correlation.
Consider the model under the null \(\gamma = 0\). It is an AR(p) for \(\Delta y_t\)
The number of lags can be decided by AIC or BIC.
y <- arima.sim( model = list(order = c(3,1,1), ar = c(0.4, 0.2, 0.2), ma = 0.5), n = 1000 )   # an ARIMA(3,1,1)
df <- ur.df( y, type = "trend", lags = 10, selectlags = "AIC" )   # lag order selected by AIC, up to 10
print( summary( df ) )
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression trend
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.94030 -0.65582 0.00723 0.66567 2.87812
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0313938 0.0661852 -0.474 0.6354
## z.lag.1 -0.0019963 0.0007750 -2.576 0.0101 *
## tt -0.0006811 0.0003068 -2.220 0.0266 *
## z.diff.lag1 0.9134427 0.0315704 28.933 < 2e-16 ***
## z.diff.lag2 -0.2603535 0.0415695 -6.263 5.63e-10 ***
## z.diff.lag3 0.3436490 0.0414980 8.281 3.96e-16 ***
## z.diff.lag4 -0.1292930 0.0316362 -4.087 4.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9887 on 983 degrees of freedom
## Multiple R-squared: 0.7445, Adjusted R-squared: 0.7429
## F-statistic: 477.3 on 6 and 983 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -2.5758 2.894 3.4493
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau3 -3.96 -3.41 -3.12
## phi2 6.09 4.68 4.03
## phi3 8.27 6.25 5.34
Phillips and Perron (1988) handle serial correlation and heteroskedasticity in the error term nonparametrically
No lags of \(\Delta y_t\) are included in the regression
This nonparametric approach involves the long-run variance.
\[ \begin{align} var[\frac{1}{\sqrt{3}} (X_1 + X_2 + X_3)] & = \frac{1}{3} E[ (X_1 + X_2 + X_3)^2] \\ & = \frac{1}{3} E[ X_1^2 + X_2^2 + X_3^2 + 2 X_1 X_2 + 2 X_2 X_3 + 2 X_1 X_3] \\ & = \gamma_0 + 2( \frac{2}{3} \gamma_1 + \frac{1}{3} \gamma_2) \end{align} \]
\[ \begin{align} var[\frac{1}{\sqrt{T}} \sum_{t=1}^T X_t ] & = \frac{1}{T} E[ (\sum_{t=1}^T X_t) ^2] \\ & = \frac{1}{T} E[ \sum_{t=1}^T X_t^2 + 2 \sum_{j=1}^{T-1} \sum_{t=1}^{T-j} X_t X_{t+j} ] \\ & = \gamma_0 + 2 \sum_{j=1}^{T-1} \left(1 - \frac{j}{T} \right) \gamma_j \end{align} \]
\[ \begin{align} var\left[ \lim_{T\to \infty }\frac{1}{\sqrt{T}} \sum_{t=1}^T X_t \right] & = \gamma_0 + 2 \sum_{j=1}^{\infty} \gamma_j \end{align} \]
\(var\left[ \lim_{T\to \infty }\frac{1}{\sqrt{T}} \sum_{t=1}^T X_t \right]\) is called the long-run variance
Compared to the plain variance \(\gamma_0\), it takes the serial correlation into consideration
Long-run variance can be defined for any time series, not necessarily strongly or weakly stationary, as long as \(var\left[ \lim_{T\to \infty }\frac{1}{\sqrt{T}} \sum_{t=1}^T X_t \right] <\infty\) exists.
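For a concrete example, take a stationary AR(1), \(X_t = \beta X_{t-1} + e_t\) with \(|\beta| < 1\) and \(e_t \sim \mathrm{iid}(0, \sigma^2_e)\). Then \(\gamma_j = \beta^j \gamma_0\) and \[ \gamma_0 + 2 \sum_{j=1}^{\infty} \gamma_j = \gamma_0 \left( 1 + \frac{2\beta}{1-\beta} \right) = \gamma_0 \cdot \frac{1+\beta}{1-\beta} = \frac{\sigma^2_e}{(1-\beta)^2}, \] which exceeds the plain variance \(\gamma_0 = \sigma^2_e / (1-\beta^2)\) whenever \(\beta > 0\).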
When the error term \(\epsilon_t\) is serially correlated:
The Gauss-Markov theorem no longer applies.
The asymptotic variance must be modified.
Consider, for simplicity, the simple regression
\[ \sqrt{T}(\hat{\beta} - \beta_0) = \sqrt{T} \times \frac{ \sum (x_t - \bar{x}) \epsilon_t}{ \sum (x_t - \bar{x})^2} = \frac{ T^{-1/2} \sum (x_t - \bar{x}) \epsilon_t}{ T^{-1} \sum (x_t - \bar{x})^2} \]
When \(\epsilon_t\) is serially correlated, in the numerator \(x_t \epsilon_t\) is serially correlated in general
The asymptotic variance of OLS becomes
\[ \frac{lrvar[x_t \epsilon_t]} {(var[x_t])^2}, \]
instead of the familiar form under the Gauss-Markov theorem:
\[ var[x_t \epsilon_t]/ (var[x_t])^2 = var[x_t ] var[\epsilon_t]/ (var[x_t])^2=var[\epsilon_t]/var[x_t ] \]
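In practice, the long-run variance in the numerator is handled by HAC (heteroskedasticity and autocorrelation consistent) standard errors. A minimal sketch using the sandwich and lmtest packages; the DGP below is an illustrative assumption:
library(sandwich); library(lmtest)
set.seed(6)
x <- arima.sim(n = 500, model = list(ar = 0.8))   # serially correlated regressor
e <- arima.sim(n = 500, model = list(ar = 0.8))   # serially correlated error
y <- 1 + 2 * x + e
reg <- lm(y ~ x)
coeftest(reg)                     # conventional iid standard errors
coeftest(reg, vcov = NeweyWest)   # HAC standard errors built on the lrvar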
The expression \(\gamma_0 + 2 \sum_{j=1}^{\infty} \gamma_j\) is valid for weakly stationary time series only. Let us focus on this case in the estimation.
The sample has \(T\) observations, so it is impossible to accurately estimate \(\gamma_j\) when \(j\) is close to \(T\) or larger than \(T\).
Given the convergence property \(\sum_{j=0}^{\infty} |\gamma_j| < \infty\), we approximate the infinite sum by a truncated version
\[ \hat{f}_0 = \hat{\gamma}_0 + 2 \sum_{j=1}^p w_{pj} \hat{\gamma}_j \]
where \(p\) is the number of lags. For consistency, asymptotically \(p \to \infty\) with \(p/T \to 0\).
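A minimal sketch of this truncated estimator, with Bartlett-type weights \(w_{pj} = 1 - j/p\) matching the formula below (the function name f0.hat is illustrative):
f0.hat <- function(x, p) {
  # sample autocovariances gamma_0, ..., gamma_p
  gam <- acf(x, lag.max = p, type = "covariance", plot = FALSE)$acf
  gam[1] + 2 * sum((1 - (1:p) / p) * gam[2:(p + 1)])   # gam[j + 1] is gamma_j
}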
The Phillips-Perron test statistic is a modified version of the \(t\)-statistic (the formula in the textbook, Eq. (5.26), contains typos)
\[ t_{pp} = t_{\gamma} \sqrt{ \frac{\hat{\gamma}_0}{\hat{f}_0} } - \frac{ T\cdot (\hat{f}_0 - \hat{\gamma}_0) \cdot SE(\hat{\gamma}) }{2 \sqrt{ \hat{f}_0 \hat{\gamma}_0 } } \]
where \(t_{\gamma}\) is the OLS \(t\)-statistic for \(\gamma\), \(SE(\hat{\gamma})\) is the OLS standard error of \(\hat{\gamma}\), \(\hat{\gamma}_0\) is a consistent estimator of the variance of the OLS regression residual, and \[ \hat{f}_0 = \hat{\gamma}_0 + 2 \sum_{j=1}^p (1-j/p) \hat{\gamma}_j \] is a consistent estimator of the long-run variance of the OLS regression residual.
The PP test is based on the DF test. The deterministic trend is modeled in a similar manner.
The difference lies in the way to cope with more general forms of the error term. The PP test statistics follow their associated asymptotic distributions, which have been simulated and tabulated.
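The call behind the output below is not shown. A sketch with a simulated random walk of length 1000 (an assumption consistent with the reported degrees of freedom) produces comparable output:
library(urca)
y <- arima.sim( n = 1000, list(order = c(0,1,0)) )
PPtest <- ur.pp( y, type = "Z-tau", model = "trend", lags = "short" )
summary(PPtest)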
##
## ##################################
## # Phillips-Perron Unit Root Test #
## ##################################
##
## Test regression with intercept and trend
##
##
## Call:
## lm(formula = y ~ y.l1 + trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6491 -1.0720 0.1126 1.2590 5.4567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.2882585 0.2563321 -1.125 0.261
## y.l1 0.9999885 0.0015016 665.935 <2e-16 ***
## trend 0.0002709 0.0005921 0.458 0.647
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.946 on 997 degrees of freedom
## Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
## F-statistic: 1.708e+06 on 2 and 997 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic, type: Z-tau is: -1.4629
##
## aux. Z statistics
## Z-tau-mu 0.9429
## Z-tau-beta -1.1722
##
## Critical values for Z statistics:
## 1pct 5pct 10pct
## critical values -3.9722 -3.416657 -3.130326
The lags (or use.lag) option in ur.pp concerns the number of lags in the long-run variance estimation, not the lags of \(\Delta y_t\) in the regression.
The null hypothesis of both the ADF and PP tests is a unit root.
Kwiatkowski, Phillips, Schmidt and Shin (1992) devise a test under the null of stationarity
Regression model
\[ y_t = \mu + \delta t + w_t + \epsilon_t, \]
in which \(w_t = w_{t-1} + v_t\) for some \(v_t \sim \mathrm{iid} (0, \sigma^2_v)\). Under the null of stationarity, \(\sigma^2_v = 0\), so the random-walk component \(w_t\) degenerates into a constant.
The test statistic is \[ KPSS = \frac{1}{T^2 \hat{f}_0} \sum_{t=1}^T \left( \sum_{j=1}^t \hat{\epsilon}_j \right)^2, \] where \(\hat{\epsilon}_j\) is the OLS residual from the regression above and \(\hat{f}_0\) is an estimator of its long-run variance.
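The call behind the output below is not shown. A sketch with a simulated random walk of length 1000 (an assumption; the default lags = "short" then yields 7 lags) produces comparable output:
library(urca)
y <- arima.sim( n = 1000, list(order = c(0,1,0)) )
KPSStest <- ur.kpss( y, type = "tau", lags = "short" )
summary(KPSStest)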
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 7 lags.
##
## Value of test-statistic is: 0.4653
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The KPSS test statistic is a quadratic form.
The bigger the test statistic, the stronger the evidence for rejection.
The lags option in ur.kpss is again for the long-run variance estimation.
type = "mu" means only an intercept in the regression.
type = "tau" means an intercept and a linear trend in the regression.
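A minimal sketch of the KPSS statistic for type = "tau", reusing the truncated long-run variance estimator f0.hat sketched earlier (both function names are illustrative):
kpss.stat <- function(y, p) {
  n <- length(y)
  e <- resid(lm(y ~ seq_len(n)))   # residuals from regressing y on an intercept and a linear trend
  S <- cumsum(e)                   # partial sums of the residuals
  sum(S^2) / (n^2 * f0.hat(e, p))
}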
The alternative hypothesis of conventional unit root tests is the stationary regime
Financial bubbles and crises have been witnessed throughout history. How can we detect bubbles?
Phillips, Wu and Yu (2011), Phillips, Shi and Yu (2015a, b)
Null hypothesis: \(\beta = 1\) (unit root) versus alternative hypothesis: \(\beta > 1\) (explosive)
A bubble is a transient phenomenon, so rolling windows are used to improve power.
The test statistic is based on the ADF test and takes the multiple testing issue into consideration.
Reduced form by nature
Using long historical monthly data, Phillips, Shi and Yu (2015b) identify three big historical bubbles: the 1890s, 1929, and 2001.
The method is in use at central banks.
One needs to specify the beginning and the end of the time series, and the size of the smallest time window.
R package psymonitor
library(psymonitor)
SPX.2020 <- quantmod::getSymbols("^GSPC", auto.assign = FALSE, from = "2020-01-01")$GSPC.Close
lSPX.2020 <- log(SPX.2020)   # work with the log price
psy.stat <- PSY(lSPX.2020)   # PSY test statistics over rolling windows
plot(psy.stat, type = "l")
The above block of code is slow (about 3 minutes). Run it if you want.