class: center, middle, inverse, title-slide

.title[
# ARMA
]
.author[
### Zhentao Shi
]
.date[
### Sep 13, 2021
]

---

## Replicable research (in response to the 1st homework)

* Script based. Menu-clicking is not recommended.
  * Excel 1, Stata 1, Python 1, R 9
* Don't use an absolute path for the data file
* Literate programming is welcome
  * Jupyter notebook
  * R markdown
  * Alternative: text comments
* Why do I use R?
  * Coincidence
  * A big community with vast add-on choices
  * Combines advantages of Stata and Python
  * I am also familiar with MATLAB

---

## ARMA

* Classical modeling approach
* Box and Jenkins (1970): *Time Series Analysis: Forecasting and Control*
* George Box: "All models are wrong, but some are useful."

---

## Autocovariance

Recall: for a general weakly stationary time series

* Autocovariance: `\(\gamma(k) = cov(y_t, y_{t+k})\)` for `\(k\in \mathbb{Z}\)`
* Autocorrelation: `\(\rho(k) = \gamma(k) / \gamma(0)\)`

Use sample analogues to estimate

* Sample autocovariance:
`$$\hat{\gamma}(k) = \frac{1}{T-k} \sum_{1\leq t,\, t+k \leq T} (y_t - \bar{y}) (y_{t+k} - \bar{y} )$$`
* Sample autocorrelation: `\(\hat{\rho}(k) = \hat{\gamma}(k) / \hat{\gamma}(0)\)`

---

## Simulated Example

```r
set.seed(2021-9-13)
y <- rnorm(100) # a white noise
plot(y, type = "l")
```

![](ts_slides4_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

```r
acf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-1-2.png)<!-- -->

---

## Autoregression

* AR(1)
* Generative model (DGP):
`$$y_t = \mu + \beta y_{t-1} + \epsilon_t,$$`
where `\(\epsilon_t\)` is iid with mean 0 and variance `\(\sigma^2\)` ( `\(\epsilon_t \sim \mathrm{iid} (0,\sigma^2)\)` ).
* Repeated substitution

$$
`\begin{align}
y_t & = \mu + \beta y_{t-1} + \epsilon_t \\
& = \mu + \beta (\mu + \beta y_{t-2} + \epsilon_{t-1} ) + \epsilon_t \\
& = (1 + \beta)\mu + \beta^2 y_{t-2} + (\epsilon_t + \beta \epsilon_{t-1} ) \\
& = \cdots \\
& = \mu \sum_{m=0}^{\infty} \beta^m + \beta^{\infty} y_{t-\infty} + \sum_{m=0}^{\infty} \beta^m \epsilon_{t-m}
\end{align}`
$$

---

## Condition for stationarity

* Let `\(L\)` be the lag operator, meaning `\(L y_t = y_{t-1}\)`
* AR(1): `\((1-\beta L ) y_t = \mu + \epsilon_t\)`
* Replace `\(L\)` by `\(z\)`, and solve the polynomial equation
$$ 1 - \beta z = 0 $$
* If `\(|\beta | < 1\)`, then `\(|z| > 1\)` and `\(y_t\)` is stationary
* If `\(|\beta | = 1\)`, then `\(|z| = 1\)` and `\(y_t\)` is a unit root process
* If `\(|\beta | > 1\)`, then `\(|z| < 1\)` and `\(y_t\)` is an explosive process

---

## Properties of stationary AR(1)

* Stationary when `\(|\beta| < 1\)`
* Mean `\(E[y_t] = \mu / (1 - \beta)\)`
* Variance `\(\sigma_y^2 = var[y_t] = \sigma^2 / ( 1 - \beta^2)\)`
* Autocorrelation: `\(\rho(k) = \beta^k\)`
* Autocovariance: `\(cov(y_t, y_{t+k}) = \beta^k \sigma_y^2\)`
* The sample autocorrelation can be computed directly or by regressing `\(y_t\)` on `\(y_{t-k}\)`
  * The latter is correct in view of the formula of `\(\hat{\beta}\)` in a simple regression (see the numerical check on the next slide)
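---

## Numerical check: AR(1) properties

A minimal simulation sketch (not in the original slides) verifying the moment formulas above. Note `arima.sim` generates a zero-mean series, so shifting by `mu / (1 - b)` to impose the mean is my own device; the names `b`, `mu`, `k` are likewise my choices.

```r
set.seed(2021)
b <- 0.5; mu <- 1; n <- 100000
# arima.sim() draws a zero-mean AR(1); shift by the implied mean mu/(1-b)
y <- mu / (1 - b) + arima.sim(model = list(ar = b), n = n)

c(mean(y), mu / (1 - b)) # sample vs. theoretical mean
c(var(y), 1 / (1 - b^2)) # sample vs. theoretical variance (sigma = 1)

k <- 3
c(acf(y, plot = FALSE)$acf[k + 1],          # rho_hat(k) from the sample ACF
  coef(lm(y[(k + 1):n] ~ y[1:(n - k)]))[2], # OLS slope of y_t on y_{t-k}
  b^k)                                      # theoretical rho(k) = beta^k
```

With a long simulated path, the sample ACF and the OLS slope agree closely with `\(\beta^k\)`, as the simple-regression formula suggests.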
---

## AR(p)

* Generative model:
`$$y_t = \mu + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t.$$`
* In the lag operator form:
`$$(1 - \beta_1 L - \cdots - \beta_p L^p ) y_t = \mu + \epsilon_t.$$`
* Condition for stationarity:
  * The characteristic equation of the dynamic system is the following `\(p\)`th order polynomial
`$$1 - \beta_1 z - \ldots - \beta_p z^p = 0.$$`
  * The equation has `\(p\)` roots (solutions) on the complex plane.
  * The AR(p) model is stationary if all roots lie outside of the unit circle, i.e. every root has modulus `\(|z| > 1\)`.

---

## Simulated examples

```r
ar.plot <- function(R, n, arcoef) {
  d <- matrix(NA, n, R)
  for (r in 1:R) {
    d[, r] <- arima.sim(model = list(ar = c(arcoef)), n = n,
                        start.innov = rep(0, 10000)) # "arima.sim" is a built-in function
  }
  return(matplot(d, type = "l", ylab = ""))
}

R <- 5   # number of replications
n <- 100 # length of time series
ar.plot(R, n, 0.5)
```

![](ts_slides4_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

```r
y <- arima.sim(model = list(ar = 0.5), n = n)
acf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

```r
acf(y, type = "partial")
```

![](ts_slides4_files/figure-html/unnamed-chunk-3-2.png)<!-- -->

---

## Higher persistence

```r
ar.plot(R, n, 0.9) # higher AR coefficient
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

```r
acf(arima.sim(model = list(ar = 0.9), n = n, start.innov = rep(0, 10000)))
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-2.png)<!-- -->

```r
ar.plot(R, n, 0.999)
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-3.png)<!-- -->

```r
acf(arima.sim(model = list(ar = 0.999), n = n, start.innov = rep(0, 10000)))
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-4.png)<!-- -->

---

## Estimation

* Method: OLS
* In R, use the function `arima`
  * Syntax: `arima(y, order = c(ar_order, integrated_order, ma_order))`

---

## Real data example

```r
SPX <- quantmod::getSymbols("^GSPC", auto.assign = FALSE,
                            from = "2012-01-01", to = "2020-01-01")$GSPC.Close
ret <- diff(log(SPX))
ret <- ret[-1]
reg <- arima(ret, order = c(1, 0, 0))
print(reg)
```

```
## 
## Call:
## arima(x = ret, order = c(1, 0, 0))
## 
## Coefficients:
##           ar1  intercept
##       -0.0172      5e-04
## s.e.   0.0223      2e-04
## 
## sigma^2 estimated as 6.532e-05:  log likelihood = 6835.71,  aic = -13665.43
```

---

## Partial Autocorrelation Function (PACF)

Run the AR(1), ..., AR(p) regressions one by one. The PACF at lag `\(p\)` is the estimated coefficient of the last regressor:

* OLS `\(y_t = \mu + \beta_1 y_{t-1} + \epsilon_t\)`
* OLS `\(y_t = \mu + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t\)`
* OLS `\(y_t = \mu + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t\)`

PACF can help determine the order of AR specifications. In R, use either `pacf` or `acf(..., type = "partial")`. (A manual computation is sketched on the next slide.)

```r
y <- rnorm(100)
pacf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

```r
pacf(arima.sim(model = list(ar = c(0.4, 0.4)), n = 100, start.innov = rep(0, 10000)))
```

![](ts_slides4_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
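---

## PACF by hand

A minimal sketch of the successive OLS regressions on the previous slide; the `embed`/`lm` construction and the helper name `pacf_by_ols` are mine, not from the original slides. Small discrepancies from the built-in `pacf` are expected, since it works from the sample ACF rather than from OLS.

```r
set.seed(2021)
y <- arima.sim(model = list(ar = c(0.4, 0.4)), n = 500)

pacf_by_ols <- function(y, p.max) {
  sapply(1:p.max, function(p) {
    X <- embed(as.numeric(y), p + 1) # col 1 is y_t; cols 2..(p+1) are lags 1..p
    fit <- lm(X[, 1] ~ X[, -1])      # OLS of y_t on its first p lags
    unname(tail(coef(fit), 1))       # coefficient on the last lag = PACF at lag p
  })
}

cbind(ols = pacf_by_ols(y, 5), builtin = pacf(y, plot = FALSE)$acf[1:5])
```

For an AR(2) series like this one, both columns should show sizable values at lags 1 and 2 and values near zero beyond, which is exactly how the PACF reveals the AR order.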
---

## Example: Nonstationary AR

```r
arima.sim(model = list(ar = 1), n = 100)
```

```
## Error in arima.sim(model = list(ar = 1), n = 100): 'ar' part of model is not stationary
```

```r
arima.sim(model = list(ar = c(0.6, 0.6)), n = 100)
```

```
## Error in arima.sim(model = list(ar = c(0.6, 0.6)), n = 100): 'ar' part of model is not stationary
```

(The error messages above are intentional.)

Solve the characteristic equation:

```r
chara_eqn <- polynom::polynomial(c(1, -0.6, -0.6))
z_hat <- solve(chara_eqn)
print(z_hat); print(abs(z_hat))
```

```
## [1] -1.8844373  0.8844373
```

```
## [1] 1.8844373 0.8844373
```

```r
chara_eqn <- polynom::polynomial(c(1, -0.6, 0.6))
z_hat <- solve(chara_eqn)
print(z_hat); print(abs(z_hat))
```

```
## [1] 0.5-1.190238i 0.5+1.190238i
```

```
## [1] 1.290994 1.290994
```

---

## Moving average

* MA(1)
* Generative model:
`$$y_t = \mu + \epsilon_t + \theta \epsilon_{t-1},$$`
where `\(\epsilon_t \sim \mathrm{iid} (0, \sigma^2)\)`.
* Mean `\(E[y_t] = \mu\)`
* Variance `\(var[y_t ] = (1+ \theta^2) \sigma^2\)`
* Autocovariance: `\(cov[y_t, y_{t-1}] = \theta \sigma^2\)`, and `\(cov[y_t, y_{t-h}] = 0\)` for `\(h\geq 2\)`
* MA(1) is always stationary, regardless of the value of `\(\theta\)`

```r
y <- arima.sim(model = list(ma = 0.9), n = n)
plot(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

```r
acf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-10-2.png)<!-- -->

---

## MA(q)

* Generative model:
`$$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$$`
* Mean `\(E[y_t] = \mu\)`
* Variance `\(var[y_t ] = (1+ \theta_1^2 + \cdots + \theta^2_q) \sigma^2\)`
* Limited memory: `\(cov[y_t, y_{t-h}] = 0\)` for all `\(h > q\)`
* MA(q) is always stationary
* Estimation is based on numerical iteration; no closed-form solution is available in general

---

## ARMA(p, q)

* General model
$$ y_t - \beta_1 y_{t-1} - \cdots - \beta_p y_{t-p} = \mu + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t $$
* The stationarity condition is the same as for AR(p), since the MA(q) part is always stationary
* Estimation is based on numerical iteration as well, due to the MA component

---

## Information criteria

* Determine `\(p\)` and `\(q\)`
* Akaike: `\(AIC = \log( \hat {\sigma}^2) + \frac{2}{T}(p+q)\)`
* Bayesian (Schwarz): `\(BIC = \log( \hat {\sigma}^2) + \frac{\log T}{T}(p+q)\)`
* Hannan-Quinn: `\(HQIC = \log( \hat {\sigma}^2) + \frac{2\log \log T}{T}(p+q)\)`

```r
library(magrittr) # for the pipe "%>%"
y <- arima.sim(model = list(ar = 0.5, ma = 0.5), n = 100)

# AIC
arima(y, order = c(1, 0, 0)) %>% AIC() %>% print()
```

```
## [1] 283.1934
```

```r
arima(y, order = c(1, 0, 1)) %>% AIC() %>% print()
```

```
## [1] 273.5499
```

```r
# BIC
arima(y, order = c(1, 0, 0)) %>% BIC() %>% print()
```

```
## [1] 291.0089
```

```r
arima(y, order = c(1, 0, 1)) %>% BIC() %>% print()
```

```
## [1] 283.9706
```
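---

## Automating the order search

A small sketch of how the criteria above can choose `\(p\)` and `\(q\)` over a grid of candidate orders. The grid bounds and names (`orders`, `ic`) are my own; in practice `arima` can fail to converge for some orders, which a robust script would guard against (e.g. with `try`).

```r
set.seed(2021)
y <- arima.sim(model = list(ar = 0.5, ma = 0.5), n = 200)

orders <- expand.grid(p = 0:2, q = 0:2) # all (p, q) pairs with p, q <= 2
ic <- apply(orders, 1, function(o) {
  fit <- arima(y, order = c(o["p"], 0, o["q"]))
  c(AIC = AIC(fit), BIC = BIC(fit))
})

cbind(orders, t(ic))                 # the full table of criteria
orders[which.min(ic["AIC", ]), ]     # order chosen by AIC
orders[which.min(ic["BIC", ]), ]     # order chosen by BIC
```

Since the BIC penalty `\(\frac{\log T}{T}\)` exceeds the AIC penalty `\(\frac{2}{T}\)` for `\(T > e^2\)`, BIC tends to select the more parsimonious specification.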