class: center, middle, inverse, title-slide

.title[
# ARMA
]
.author[
### Zhentao Shi
]
.date[
### Sep 13, 2021
]

---

## Replicable research (in response to the 1st homework)

* Script based. Menu-clicking is not recommended.
  * Excel 1, Stata 1, Python 1, R 9
* Don't use an absolute path for the data file
* Literate programming is welcome
  * Jupyter notebook
  * R markdown
  * Alternative: text comments
* Why do I use R?
  * Coincidence
  * A big community with vast add-on choices
  * Combines advantages of Stata and Python
  * I am also familiar with MATLAB

---

## ARMA

* Classical modeling approach
* Box and Jenkins (1970): *Time Series Analysis: Forecasting and Control*
* George Box: "All models are wrong, but some are useful."

---

## Autocovariance

Recall: for a general weakly stationary time series

* Autocovariance: `\(\gamma(k) = cov(y_t, y_{t+k})\)` for `\(k\in \mathbb{Z}\)`
* Autocorrelation: `\(\rho(k) = \gamma(k) / \gamma(0)\)`

Use sample analogues to estimate

* Sample autocovariance:
`$$\hat{\gamma}(k) = \frac{1}{T-k} \sum_{1\leq t,\, t+k \leq T} (y_t - \bar{y}) (y_{t+k} - \bar{y} )$$`
* Sample autocorrelation: `\(\hat{\rho}(k) = \hat{\gamma}(k) / \hat{\gamma}(0)\)`

---

## Simulated Example

```r
set.seed(2021-9-13)
y <- rnorm(100) # a white noise
plot(y, type = "l")
```

![](ts_slides4_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

```r
acf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-1-2.png)<!-- -->

---

## Autoregression

* AR(1)
* Generative model (DGP):
`$$y_t = \mu + \beta y_{t-1} + \epsilon_t,$$`
where `\(\epsilon_t\)` is iid with mean 0 and variance `\(\sigma^2\)` ( `\(\epsilon_t \sim \mathrm{iid} (0,\sigma^2)\)` ).
* Repeated substitution

$$
`\begin{align}
y_t & = \mu + \beta y_{t-1} + \epsilon_t \\
& = \mu + \beta (\mu + \beta y_{t-2} + \epsilon_{t-1} ) + \epsilon_t \\
& = (1 + \beta)\mu + \beta^2 y_{t-2} + (\epsilon_t + \beta \epsilon_{t-1} ) \\
& = \cdots \\
& = \mu \sum_{m=0}^{\infty} \beta^m + \beta^{\infty} y_{t-\infty} + \sum_{m=0}^{\infty} \beta^m \epsilon_{t-m}
\end{align}`
$$

---

## Condition for stationarity

* Let `\(L\)` be the lag operator, meaning `\(L y_t = y_{t-1}\)`
* AR(1): `\((1-\beta L ) y_t = \mu + \epsilon_t\)`
* Replace `\(L\)` by `\(z\)`, and solve the polynomial equation
$$ 1 - \beta z = 0 $$
* If `\(|\beta | < 1\)`, then `\(|z| > 1\)` and `\(y_t\)` is stationary
* If `\(|\beta | = 1\)`, then `\(|z| = 1\)` and `\(y_t\)` is a unit root process
* If `\(|\beta | > 1\)`, then `\(|z| < 1\)` and `\(y_t\)` is an explosive process

---

## Properties of stationary AR(1)

* Stationary when `\(|\beta| < 1\)`
* Mean `\(E[y_t] = \mu / (1 - \beta)\)`
* Variance `\(\sigma_y^2 = var[y_t] = \sigma^2 / ( 1 - \beta^2)\)`
* Autocorrelation: `\(\rho(k) = \beta^k\)`
* Autocovariance: `\(cov(y_t, y_{t+k}) = \beta^k \sigma_y^2\)`
* The sample autocorrelation can be computed directly or by regressing `\(y_t\)` on `\(y_{t-k}\)`
  * The latter is correct in view of the formula of `\(\hat{\beta}\)` in a simple regression (see the numerical check on the next slide)
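---

## Numerical check: AR(1) properties

A minimal simulation sketch (not in the original slides) verifying the moment formulas above. Note `arima.sim` generates a zero-mean series, so shifting by `mu / (1 - b)` to impose the mean is my own device; the names `b`, `mu`, `k` are likewise my choices.

```r
set.seed(2021)
b <- 0.5; mu <- 1; n <- 100000
# arima.sim() draws a zero-mean AR(1); shift by the implied mean mu/(1-b)
y <- mu / (1 - b) + arima.sim(model = list(ar = b), n = n)

c(mean(y), mu / (1 - b)) # sample vs. theoretical mean
c(var(y), 1 / (1 - b^2)) # sample vs. theoretical variance (sigma = 1)

k <- 3
c(acf(y, plot = FALSE)$acf[k + 1],          # rho_hat(k) from the sample ACF
  coef(lm(y[(k + 1):n] ~ y[1:(n - k)]))[2], # OLS slope of y_t on y_{t-k}
  b^k)                                      # theoretical rho(k) = beta^k
```

With a long simulated path, the sample ACF and the OLS slope agree closely with `\(\beta^k\)`, as the simple-regression formula suggests.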
---

## AR(p)

* Generative model:
`$$y_t = \mu + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t.$$`
* In the lag operator form:
`$$(1 - \beta_1 L - \cdots - \beta_p L^p ) y_t = \mu + \epsilon_t.$$`
* Condition for stationarity:
  * The characteristic equation of the dynamic system is the following `\(p\)`th order polynomial
`$$1 - \beta_1 z - \ldots - \beta_p z^p = 0.$$`
  * The equation has `\(p\)` roots (solutions) on the complex plane.
  * The AR(p) model is stationary if all roots lie outside of the unit circle, i.e. every root has modulus `\(|z| > 1\)`.

---

## Simulated examples

```r
ar.plot <- function(R, n, arcoef) {
  d <- matrix(NA, n, R)
  for (r in 1:R) {
    d[, r] <- arima.sim(model = list(ar = c(arcoef)), n = n,
                        start.innov = rep(0, 10000)) # "arima.sim" is a built-in function
  }
  return(matplot(d, type = "l", ylab = ""))
}

R <- 5   # number of replications
n <- 100 # length of time series
ar.plot(R, n, 0.5)
```

![](ts_slides4_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

```r
y <- arima.sim(model = list(ar = 0.5), n = n)
acf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

```r
acf(y, type = "partial")
```

![](ts_slides4_files/figure-html/unnamed-chunk-3-2.png)<!-- -->

---

## Higher persistence

```r
ar.plot(R, n, 0.9) # higher AR coefficient
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

```r
acf(arima.sim(model = list(ar = 0.9), n = n, start.innov = rep(0, 10000)))
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-2.png)<!-- -->

```r
ar.plot(R, n, 0.999)
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-3.png)<!-- -->

```r
acf(arima.sim(model = list(ar = 0.999), n = n, start.innov = rep(0, 10000)))
```

![](ts_slides4_files/figure-html/unnamed-chunk-4-4.png)<!-- -->

---

## Estimation

* Method: OLS
* In R, use the function `arima`
  * Syntax: `arima(y, order = c(ar_order, integrated_order, ma_order))`

---

## Real data example

```r
SPX <- quantmod::getSymbols("^GSPC", auto.assign = FALSE,
                            from = "2012-01-01", to = "2020-01-01")$GSPC.Close
ret <- diff(log(SPX))
ret <- ret[-1]
reg <- arima(ret, order = c(1, 0, 0))
print(reg)
```

```
## 
## Call:
## arima(x = ret, order = c(1, 0, 0))
## 
## Coefficients:
##           ar1  intercept
##       -0.0172      5e-04
## s.e.   0.0223      2e-04
## 
## sigma^2 estimated as 6.532e-05:  log likelihood = 6835.71,  aic = -13665.43
```

---

## Partial Autocorrelation Function (PACF)

Run the AR(1), ..., AR(p) regressions one by one. The PACF at lag `\(p\)` is the estimated coefficient of the last regressor:

* OLS `\(y_t = \mu + \beta_1 y_{t-1} + \epsilon_t\)`
* OLS `\(y_t = \mu + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t\)`
* OLS `\(y_t = \mu + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t\)`

PACF can help determine the order of AR specifications. In R, use either `pacf` or `acf(..., type = "partial")`. (A manual computation is sketched on the next slide.)

```r
y <- rnorm(100)
pacf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

```r
pacf(arima.sim(model = list(ar = c(0.4, 0.4)), n = 100, start.innov = rep(0, 10000)))
```

![](ts_slides4_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
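---

## PACF by hand

A minimal sketch of the successive OLS regressions on the previous slide; the `embed`/`lm` construction and the helper name `pacf_by_ols` are mine, not from the original slides. Small discrepancies from the built-in `pacf` are expected, since it works from the sample ACF rather than from OLS.

```r
set.seed(2021)
y <- arima.sim(model = list(ar = c(0.4, 0.4)), n = 500)

pacf_by_ols <- function(y, p.max) {
  sapply(1:p.max, function(p) {
    X <- embed(as.numeric(y), p + 1) # col 1 is y_t; cols 2..(p+1) are lags 1..p
    fit <- lm(X[, 1] ~ X[, -1])      # OLS of y_t on its first p lags
    unname(tail(coef(fit), 1))       # coefficient on the last lag = PACF at lag p
  })
}

cbind(ols = pacf_by_ols(y, 5), builtin = pacf(y, plot = FALSE)$acf[1:5])
```

For an AR(2) series like this one, both columns should show sizable values at lags 1 and 2 and values near zero beyond, which is exactly how the PACF reveals the AR order.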
---

## Example: Nonstationary AR

```r
arima.sim(model = list(ar = 1), n = 100)
```

```
## Error in arima.sim(model = list(ar = 1), n = 100): 'ar' part of model is not stationary
```

```r
arima.sim(model = list(ar = c(0.6, 0.6)), n = 100)
```

```
## Error in arima.sim(model = list(ar = c(0.6, 0.6)), n = 100): 'ar' part of model is not stationary
```

(The error messages above are intentional.)

Solve the characteristic equation:

```r
chara_eqn <- polynom::polynomial(c(1, -0.6, -0.6))
z_hat <- solve(chara_eqn)
print(z_hat); print(abs(z_hat))
```

```
## [1] -1.8844373  0.8844373
```

```
## [1] 1.8844373 0.8844373
```

```r
chara_eqn <- polynom::polynomial(c(1, -0.6, 0.6))
z_hat <- solve(chara_eqn)
print(z_hat); print(abs(z_hat))
```

```
## [1] 0.5-1.190238i 0.5+1.190238i
```

```
## [1] 1.290994 1.290994
```

---

## Moving average

* MA(1)
* Generative model:
`$$y_t = \mu + \epsilon_t + \theta \epsilon_{t-1},$$`
where `\(\epsilon_t \sim \mathrm{iid} (0, \sigma^2)\)`.
* Mean `\(E[y_t] = \mu\)`
* Variance `\(var[y_t ] = (1+ \theta^2) \sigma^2\)`
* Autocovariance: `\(cov[y_t, y_{t-1}] = \theta \sigma^2\)`, and `\(cov[y_t, y_{t-h}] = 0\)` for `\(h\geq 2\)`
* MA(1) is always stationary, regardless of the value of `\(\theta\)`

```r
y <- arima.sim(model = list(ma = 0.9), n = n)
plot(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

```r
acf(y)
```

![](ts_slides4_files/figure-html/unnamed-chunk-10-2.png)<!-- -->

---

## MA(q)

* Generative model:
`$$y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$$`
* Mean `\(E[y_t] = \mu\)`
* Variance `\(var[y_t ] = (1+ \theta_1^2 + \cdots + \theta^2_q) \sigma^2\)`
* Limited memory: `\(cov[y_t, y_{t-h}] = 0\)` for all `\(h > q\)`
* MA(q) is always stationary
* Estimation is based on numerical iteration; no closed-form solution is available in general

---

## ARMA(p, q)

* General model
$$ y_t - \beta_1 y_{t-1} - \cdots - \beta_p y_{t-p} = \mu + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t $$
* The stationarity condition is the same as for AR(p), since the MA(q) part is always stationary
* Estimation is based on numerical iteration as well, due to the MA component

---

## Information criteria

* Determine `\(p\)` and `\(q\)`
* Akaike: `\(AIC = \log( \hat {\sigma}^2) + \frac{2}{T}(p+q)\)`
* Bayesian (Schwarz): `\(BIC = \log( \hat {\sigma}^2) + \frac{\log T}{T}(p+q)\)`
* Hannan-Quinn: `\(HQIC = \log( \hat {\sigma}^2) + \frac{2\log \log T}{T}(p+q)\)`

```r
library(magrittr) # for the pipe "%>%"
y <- arima.sim(model = list(ar = 0.5, ma = 0.5), n = 100)

# AIC
arima(y, order = c(1, 0, 0)) %>% AIC() %>% print()
```

```
## [1] 283.1934
```

```r
arima(y, order = c(1, 0, 1)) %>% AIC() %>% print()
```

```
## [1] 273.5499
```

```r
# BIC
arima(y, order = c(1, 0, 0)) %>% BIC() %>% print()
```

```
## [1] 291.0089
```

```r
arima(y, order = c(1, 0, 1)) %>% BIC() %>% print()
```

```
## [1] 283.9706
```
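---

## Automating the order search

A small sketch of how the criteria above can choose `\(p\)` and `\(q\)` over a grid of candidate orders. The grid bounds and names (`orders`, `ic`) are my own; in practice `arima` can fail to converge for some orders, which a robust script would guard against (e.g. with `try`).

```r
set.seed(2021)
y <- arima.sim(model = list(ar = 0.5, ma = 0.5), n = 200)

orders <- expand.grid(p = 0:2, q = 0:2) # all (p, q) pairs with p, q <= 2
ic <- apply(orders, 1, function(o) {
  fit <- arima(y, order = c(o["p"], 0, o["q"]))
  c(AIC = AIC(fit), BIC = BIC(fit))
})

cbind(orders, t(ic))                 # the full table of criteria
orders[which.min(ic["AIC", ]), ]     # order chosen by AIC
orders[which.min(ic["BIC", ]), ]     # order chosen by BIC
```

Since the BIC penalty `\(\frac{\log T}{T}\)` exceeds the AIC penalty `\(\frac{2}{T}\)` for `\(T > e^2\)`, BIC tends to select the more parsimonious specification.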