class: center, middle, inverse, title-slide

.title[
# GARCH Models
]
.author[
### Zhentao Shi
]
.date[
### Nov 8, 2021
]

---
## Variance and volatility

* To quantify uncertainty, the variance must be evaluated
* Examples: interval forecasts, VaR, ...
* In financial econometrics, applications include
  * Portfolio management
  * Pricing of options
  * ...
* The mean is difficult to predict
* The variance is easier to predict
* Unconditional vs. conditional variance

---
## Real data example

* S&P 500 returns
* Raw data: daily, 1960--2017
* Here we use data after 2015-01-01


```r
d0 <- read.csv("SPX.csv", header = TRUE)
d0$datevec <- as.Date(d0$datevec, format = "%d/%m/%Y")
d0$r <- c(NA, diff(log(d0$SP500)))  # daily log returns
d0 <- d0[-1, ]
d1 <- d0[d0$datevec > "2015-01-01", ]
plot(y = d1$r, x = d1$datevec,
     xlim = as.Date(c("2015-01-01", "2017-01-01")), type = "l")
```

![](ts_slides9_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

---
## Real data example (continued)

* Squared returns
* Dynamic patterns
* Volatility clustering


```r
plot(y = d1$r^2, x = d1$datevec,
     xlim = as.Date(c("2015-01-01", "2017-01-01")), type = "l")
abline(h = mean(d1$r^2), col = "blue", lty = 2)
abline(h = 0, col = "red", lty = 2)
```

![](ts_slides9_files/figure-html/unnamed-chunk-2-1.png)<!-- -->


```r
plot(y = d1$r^2, x = d1$datevec, ylim = c(0, 2 * mean(d1$r^2)),
     xlim = as.Date(c("2016-01-01", "2017-01-01")), type = "l")
abline(h = mean(d1$r^2), col = "blue", lty = 2)
```

![](ts_slides9_files/figure-html/unnamed-chunk-2-2.png)<!-- -->

---
## Real data example (continued)

* Naively fit an AR(1) model for `\(r_t\)`


```r
arima(d1$r, order = c(1, 0, 0))
```

```
## 
## Call:
## arima(x = d1$r, order = c(1, 0, 0))
## 
## Coefficients:
##           ar1  intercept
##       -0.0094      3e-04
## s.e.
##        0.0416      4e-04
## 
## sigma^2 estimated as 7.27e-05:  log likelihood = 1927.09,  aic = -3848.18
```

* Naively fit an AR(1) model for `\(r_t^2\)`


```r
arima(d1$r^2, order = c(1, 0, 0))
```

```
## 
## Call:
## arima(x = d1$r^2, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.3058      1e-04
## s.e.  0.0396      0e+00
## 
## sigma^2 estimated as 2.257e-08:  log likelihood = 4253.34,  aic = -8500.68
```

---
## Definitions

A time series `\((y_t)\)` is called

* **a martingale difference sequence** if `\(E_{t-1} [y_t ] = 0\)`
* **white noise** if
  * `\(E [y_t ] = 0\)`,
  * `\(E[y_t^2] = \sigma_y^2\)`, and
  * `\(E[y_t y_{t-k}] = 0\)` for all `\(k \neq 0\)`

---
## Conditional volatility model

* For simplicity, let
`$$r_t = \sqrt{h_t} \epsilon_t,$$`
where
  * `\(\epsilon_t \sim iid N(0,1)\)`
  * `\(h_t = E_{t-1}[r_t^2]\)` is the *conditional variance*
* Suppose `\(h_t\)` is adapted to the past information up to `\((t-1)\)`. In other words, `\(E_{t-1}[h_t]=h_t\)`
* For example, if for some `\(e_t \sim iid N(0,1)\)` independent of all other variables we have `\(h_t = 0.5+0.5 e_{t-1}^2\)`, then `\(h_t\)` is strictly stationary, and `\(E[h_t] = 1\)`

---

```r
N <- 100
e <- rnorm(N)
h <- 0.5 + 0.5 * c(0, e)^2  # prepend a `0` to `e` to reflect the lag
h <- h[1:N]                 # keep the first N values, so h_t = 0.5 + 0.5 * e_{t-1}^2
plot(h, type = "l", main = "conditional variance")
abline(h = 0.5, col = "red")
```

![](ts_slides9_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

* In this example, `\((r_t)\)` is strictly stationary
* Any function of `\((r_t, r_{t-1}, \ldots, r_{-\infty})\)` is still strictly stationary


```r
r <- sqrt(h) * rnorm(N)  # r_t = sqrt(h_t) * epsilon_t
plot(r, type = "l", main = "the series of returns")
abline(h = 0, col = "red")
```

![](ts_slides9_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---
## Mean and variance

* Conditional mean `\(E_{t-1}[r_t] = E_{t-1}[\sqrt{h_t} \epsilon_t] = \sqrt{h_t} \times 0 = 0\)` (martingale difference sequence)
* Unconditional mean `\(E[r_t] = E[ E_{t-1}[r_t] ] = 0\)`
* Conditional variance
$$
`\begin{align}
E_{t-1}[r_t^2] & = E_{t-1}[ h_t
\epsilon^2_t ] = h_t E_{t-1}[\epsilon^2_t ] = h_t \times 1 = h_t
\end{align}`
$$
* Unconditional variance
$$
`\begin{align}
E[r_t^2] & = E[ E_{t-1}[ r^2_t ] ] = E[ h_t ]
\end{align}`
$$
* Unconditional covariance: for `\(k>0\)`, we have
$$
`\begin{align}
E[r_t r_{t-k}] & = E[\sqrt{h_t h_{t-k}} \epsilon_t \epsilon_{t-k}] \\
& = E[E_{t-1}[\sqrt{h_t h_{t-k}} \epsilon_t \epsilon_{t-k}] ] \\
& = E[ \sqrt{h_t h_{t-k}} \epsilon_{t-k} E_{t-1}[ \epsilon_t ] ] \\
& = E[ \sqrt{h_t h_{t-k}} \epsilon_{t-k} \times 0 ] \\
& = 0
\end{align}`
$$
* If `\(E[h_t]\)` is a constant, `\((r_t)\)` is white noise with time-varying *conditional heteroskedasticity*.

---
## Simple volatility models

* Historical model
$$ h_t = \frac{1}{M} \sum_{i=1}^M r_{t-i}^2 $$
* Common choices for `\(M\)` are 22 (monthly average) or 250 (yearly average)
* Exponentially weighted moving average (EWMA)
$$ h_t = (1-\lambda) \sum_{j=0}^{\infty} \lambda^j r_{t-j-1}^2 $$
* The coefficient `\(\lambda\)` can be either estimated or imposed

---
## ARCH

* Autoregressive conditional heteroskedasticity model (Engle, 1982)
* ARCH(q): models `\(h_t\)` as *autoregressive* on the past `\((r_{t-1}, r_{t-2}, \ldots, r_{t-q})\)`
$$ h_t = \alpha_0 + \sum_{i=1}^q \alpha_i r_{t-i}^2 $$
* ARCH(1): `\(h_t = \alpha_0 + \alpha_1 r_{t-1}^2\)`
* Analogy with AR(1): let `\(y_t = r_t^2\)`; then `\(E_{t-1} [y_t] = \alpha_0 + \alpha_1 y_{t-1}\)`
* ARCH(1) can be interpreted as an AR(1) in terms of `\(r_t^2\)`
* Taking unconditional expectations on both sides, we solve `\(E[h_t] = \alpha_0 / (1-\alpha_1)\)` when `\(\alpha_1 < 1\)`

---
## GARCH

* To make ARCH more flexible, Bollerslev (1986) proposed the *generalized ARCH* (GARCH) model
`$$h_t = \alpha_0 + \sum_{i=1}^q \alpha_i r_{t-i}^2 + \sum_{i=1}^p \beta_i h_{t-i}$$`
* The special case GARCH(1,1)
$$ h_t = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1} $$
is the leading model in practice

---
## ARCH vs GARCH

* To understand how GARCH generalizes ARCH, write ARCH(1) as
$$
`\begin{align}
h_t & = \alpha_0 + \alpha_1 r_{t-1}^2 \\
& =
\alpha_0 + \alpha_1 ( r_{t-1}^2 - E_{t-2}[r^2_{t-1}]) + \alpha_1 E_{t-2}[r^2_{t-1}] \\
& = \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + \alpha_1 h_{t-1}
\end{align}`
$$
* Notice `\(h_{t-1}\)` is the conditional mean of `\(r_{t-1}^2\)`, and `\((r_{t-1}^2 - h_{t-1})\)` is the demeaned `\(r_{t-1}^2\)`
* ARCH forces these two terms to share the same coefficient
* GARCH allows distinct coefficients

---
## IGARCH

* In GARCH(1,1), if `\(\alpha_1 + \beta_1 = 1\)`, then
$$
`\begin{align}
h_t & = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1} \\
& = \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + (\alpha_1 + \beta_1) h_{t-1} \\
& = \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + h_{t-1}
\end{align}`
$$
* Taking the conditional expectation up to time `\((t-2)\)`, we have
$$
`\begin{align}
h_{t|t-2} & = E_{t-2}[ \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + h_{t-1} ] \\
& = \alpha_0 + 0 + h_{t-1} \\
& = \alpha_0 + h_{t-1}
\end{align}`
$$
* `\((h_t)\)` behaves similarly to a unit root process (recall that a unit root process with a drift satisfies `\(E_{t-1}[y_t] = \alpha_0 + y_{t-1}\)`)
* Integrated GARCH (IGARCH)
* Empirically, `\(\hat{\alpha}_1 + \hat{\beta}_1\)` is often close to 1

---
## Additional explanatory regressors

* Regressors can be added to better fit the mean and the variance
* Return: `\(r_{\mu,x,t} = \mu_0 + \sum_{k=1}^K \gamma_k x_{kt} + u_t\)`
* Variance: `\(h_t = \alpha_0 + \sum_{i=1}^q \alpha_i u_{t-i}^2 + \sum_{i=1}^p \beta_i h_{t-i} + \sum_{k=1}^K \psi_k x_{kt}\)`
* Distribution: `\(u_t \sim N(0, h_t)\)`

---
## Estimation

* GARCH is a nonlinear model
* Maximum likelihood estimation
* Specification: `\(r_t | \mbox{past history} \sim N(0, h_t)\)`
* Observed time series `\((r_{\mu, t})_{t=1}^T\)`, where we allow an unknown mean in `\(r_{\mu, t}= \mu_0 + r_t\)`
* Unknown parameters `\(\theta = (\mu, (\alpha_i)_{i=0}^q, (\beta_i)_{i=1}^p )\)`

---
## Likelihood

* Density
$$ f(r_{\mu, t} | r_{\mu, t-1}, r_{\mu, t-2},\ldots;\theta) = \frac{1}{\sqrt{2\pi h_t}} \exp\left(-\frac{(r_{\mu,
t}-\mu)^2}{2h_t}\right) $$
* Log-likelihood
$$
`\begin{align}
\ell_t(\theta) & = \log f(r_{\mu, t} | r_{\mu, t-1}, r_{\mu, t-2},\ldots;\theta) \\
& = -\frac{1}{2}\log 2\pi -\frac{1}{2}\log h_t -\frac{(r_{\mu, t}-\mu)^2}{2h_t}
\end{align}`
$$
where `\(h_t = \alpha_0 + \sum_{i=1}^q \alpha_i r_{t-i}^2 + \sum_{i=1}^p \beta_i h_{t-i}\)`
* The dynamic system requires initial values to generate `\(h_1 = \alpha_0 + \alpha_1 r_0^2 + \beta_1 h_0\)`
* Often, we set `\(r_0=0\)` and `\(h_0\)` as the unconditional sample variance
* Given the full sample, maximum likelihood estimation solves
$$ \max_{\theta} \sum_{t=1}^T \ell_t(\theta) $$
* If heavy tails are a concern, we specify `\(r_t\)` with a `\(t\)` distribution
  * The degrees of freedom `\(\nu \in (2,\infty)\)` of the `\(t\)` distribution is an additional parameter to be estimated
* AIC can be useful to determine the model specification

---
## Real data example

* R package `fGarch`
* Developed by [Rmetrics](https://www.rmetrics.org/)


```r
library(fGarch)
gar <- garchFit(formula = ~ garch(1, 1), data = d1$r, cond.dist = c("norm"))
```

```
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          norm
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          576
##  Recursion Init:            mci
##  Series Scale:              0.008534223
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?
$includes
##  Parameter Matrix:
##                      U           V      params includes
##     mu     -0.29301831   0.2930183  0.02930183     TRUE
##     omega   0.00000100 100.0000000  0.10000000     TRUE
##     alpha1  0.00000001   1.0000000  0.10000000     TRUE
##     gamma1 -0.99999999   1.0000000  0.10000000    FALSE
##     beta1   0.00000001   1.0000000  0.80000000     TRUE
##     delta   0.00000000   2.0000000  2.00000000    FALSE
##     skew    0.10000000  10.0000000  1.00000000    FALSE
##     shape   1.00000000  10.0000000  4.00000000    FALSE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1 
##      1      2      3      5 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     762.16113: 0.0293018 0.100000 0.100000 0.800000
##   1:     759.57018: 0.0293031 0.0788547 0.0992305 0.785324
##   2:     756.70937: 0.0293060 0.0786691 0.124553 0.789998
##   3:     756.63443: 0.0293121 0.0591280 0.137720 0.779615
##   4:     755.45163: 0.0293156 0.0694988 0.144753 0.782574
##   5:     755.11502: 0.0293278 0.0716495 0.148522 0.770459
##   6:     754.84923: 0.0293825 0.0771521 0.169095 0.756102
##   7:     754.65299: 0.0297599 0.0937332 0.181315 0.710690
##   8:     754.46118: 0.0298784 0.0926112 0.187934 0.716471
##   9:     754.44556: 0.0299193 0.0883728 0.189186 0.716695
##  10:     754.42417: 0.0300197 0.0885991 0.191146 0.719122
##  11:     754.41693: 0.0301539 0.0879314 0.191044 0.718804
##  12:     754.36809: 0.0323237 0.0912530 0.189018 0.715188
##  13:     754.06029: 0.0449843 0.0887579 0.194768 0.717037
##  14:     753.90002: 0.0576412 0.0917023 0.200796 0.706803
##  15:     753.88511: 0.0604351 0.0891227 0.195525 0.713504
##  16:     753.87586: 0.0632386 0.0896136 0.198517 0.710881
##  17:     753.87506: 0.0643934 0.0898699 0.199565 0.709794
##  18:     753.87506: 0.0643908 0.0898665 0.199589 0.709790
##  19:     753.87506: 0.0643904 0.0898671 0.199589 0.709789
## 
## Final Estimate of the Negative LLH:
##  LLH:  -1989.999    norm LLH:  -3.45486 
##           mu        omega       alpha1        beta1 
## 5.495216e-04 6.545287e-06 1.995887e-01 7.097889e-01 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                   mu         omega        alpha1         beta1
## mu     -1.307989e+07
-2.443104e+08      5204.893    -11956.368
## omega  -2.443104e+08 -1.702545e+12 -42924455.368 -76179748.158
## alpha1  5.204893e+03 -4.292446e+07     -2516.685     -2876.809
## beta1  -1.195637e+04 -7.617975e+07     -2876.809     -4397.969
## attr(,"time")
## Time difference of 0.01220894 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.04920602 secs
```

```
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
## Consider formula(paste(x, collapse = " ")) instead.
```

---
## Real data example (continued)

* `fGarch` notation:
$$
`\begin{align}
r_{\mu, t} & = \mu + r_t \\
h_t & = \omega + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1}
\end{align}`
$$


```r
print(gar)
```

```
## 
## Title:
##  GARCH Modelling
## 
## Call:
##  garchFit(formula = ~garch(1, 1), data = d1$r, cond.dist = c("norm"))
## 
## Mean and Variance Equation:
##  data ~ garch(1, 1)
## <environment: 0x00000000213c2820>
##  [data = d1$r]
## 
## Conditional Distribution:
##  norm
## 
## Coefficient(s):
##         mu       omega      alpha1       beta1  
## 5.4952e-04  6.5453e-06  1.9959e-01  7.0979e-01  
## 
## Std. Errors:
##  based on Hessian
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     5.495e-04   2.801e-04    1.962 0.049789 *  
## omega  6.545e-06   1.805e-06    3.627 0.000287 ***
## alpha1 1.996e-01   4.478e-02    4.457 8.31e-06 ***
## beta1  7.098e-01   5.374e-02   13.208  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1
## 
## Log Likelihood:
##  1989.999    normalized:  3.45486 
## 
## Description:
##  Wed Jun 08 23:17:27 2022 by user: zhent
```

---
## Asymmetric effects

* GARCH implies that shocks, positive and negative, have symmetric effects
* Theory and empirical evidence support asymmetry
  * Negative shocks have a larger effect
* TARCH(1,1) ("T" for *threshold*)
$$ h_t = \alpha_0 + [\alpha_1 + \lambda \mathbb{I}(r_{t-1}<0)] r_{t-1}^2 + \beta_1 h_{t-1} $$
so that `\(\lambda > 0\)` captures the larger effect of negative shocks

---
## Volatility forecast

* Consider GARCH(1,1):
$$ h_{t+1} = \alpha_0 + \alpha_1 r_t^2 + \beta_1 h_t $$
* Given information up to `\(T\)`, the multi-step-ahead forecasts are
$$
`\begin{align}
h_{T+1|T} & = \alpha_0 + \alpha_1 r_T^2 + \beta_1 h_T \\
h_{T+2|T} & = \alpha_0 + \alpha_1 E_T[r_{T+1}^2] + \beta_1 h_{T+1|T} = \alpha_0 + (\alpha_1 + \beta_1) h_{T+1|T} \\
& \cdots \\
h_{T+k|T} & = \alpha_0 + \alpha_1 E_T[r_{T+k-1}^2] + \beta_1 h_{T+k-1|T} = \alpha_0 + (\alpha_1 + \beta_1) h_{T+k-1|T}
\end{align}`
$$
* In practice, use the estimated coefficients to replace `\(\alpha_0\)`, `\(\alpha_1\)` and `\(\beta_1\)`
* Long-run average variance
$$ \lim_{k\to \infty} h_{T+k|T} = \frac{\alpha_0}{1-\alpha_1 - \beta_1} $$
* GARCH(1,1) is difficult to beat in terms of forecasting (Hansen and Lunde, 2005)

---
## Accumulated variance

* The future `\(k\)`-period (cumulative) log return is
$$ r_{T+1}(k) = r_{T+1} + r_{T+2} + \cdots + r_{T+k} $$
* Taking conditional expectations on both sides:
$$ E_T [r^2_{T+1}(k)] = h_{T+1|T} + h_{T+2|T} + \cdots + h_{T+k|T} $$
(the cross terms vanish because returns are conditionally uncorrelated)

---
## Real data example (continued)


```r
predict(gar, n.ahead = 5, plot = TRUE, mse = "cond", nx = 20)
```

![](ts_slides9_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

```
##   meanForecast   meanError standardDeviation lowerInterval upperInterval
## 1 0.0005495216 0.005722513       0.005722513   -0.01066640    0.01176544
## 2 0.0005495216 0.006027007       0.006027007   -0.01126320    0.01236224
## 3 0.0005495216 0.006291125       0.006291125   -0.01178086    0.01287990
## 4 0.0005495216 0.006522029       0.006522029   -0.01223342    0.01333246
## 5 0.0005495216
0.006725129       0.006725129   -0.01263149    0.01373053
```

* Notice the predicted mean and the standard deviation

---
## Forecast evaluation

* Issue: `\(h_t = E_{t-1}[r_t^2]\)` cannot be observed
* Solution: use `\(r_t^2\)` as a proxy
  * HMPY Ch. 15: realized variance
* Compute forecast metrics such as RMSE, MAE, ...
* In addition, motivated by the likelihood estimation, a popular choice of loss function is the *quasi-likelihood loss*
$$ QLIKE = \log \hat{h}_t + \frac{(r_{\mu, t} - \hat{\mu}) ^2}{\hat{h}_t} $$
* Risk is the average of the loss

---
## Test evaluation

* Test the unbiasedness of a forecast
* Specify a regression
$$ r_t^2 = \delta_0 + \delta_1 \hat {h}_t + e_t $$
and test the null H0: `\(\delta_0 = 0\)` and `\(\delta_1 = 1\)`
* The problem with this approach is its sensitivity to large shocks
* Alternative specifications:
$$
`\begin{align}
| r_t | & = \delta_0 + \delta_1 \sqrt{\hat {h}_t} + e_t \\
\log r_t^2 & = \delta_0 + \delta_1 \log \hat {h}_t + e_t
\end{align}`
$$

---
## Risk-return trade-off

* Augment the CAPM to take into account the compensation for risk
* GARCH-M (GARCH-in-mean) model:
$$
`\begin{align}
r_{it} - r_{ft} & = \mu_0 + \mu_1 h_t^{\omega} + \mu_2 (r_{mt} - r_{ft}) + u_t \\
u_t & \sim N(0,h_t) \\
h_t & = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}
\end{align}`
$$
* The risk-return trade-off is represented by `\(\mu_1 h_t^{\omega}\)`, where the usual choice of `\(\omega\)` is `\(1\)` or `\(0.5\)`
* AIC can be used to choose between these values
* Test: H0: `\(\mu_1 = 0\)`

---
## Multivariate GARCH

* Similar to the extension from AR to VAR
* Consider the returns of `\(N\)` assets `\(r_t = (r_{1t}, r_{2t}, \ldots, r_{Nt})'\)`
$$
`\begin{align}
r_t & = E_{t-1}[r_t] + u_t \\
H_t & = E_{t-1}[u_t u_t^{\prime}]
\end{align}`
$$
* `\(H_t\)` is an `\(N\times N\)` *time-varying conditional heteroskedastic variance-covariance matrix*
  * Conditional variances `\(h_{iit}\)` on the diagonal
  * Conditional covariances `\(h_{ijt}\)`, `\(i\neq j\)`, on the off-diagonal
* Examples:
  * Time-varying beta coefficient in the CAPM
  * Time-varying covariance in portfolio optimization

---
## BEKK model

* Baba, Engle, Kraft and Kroner (1990)
$$ H_t = C C' + A u_{t-1} u_{t-1}' A' + B H_{t-1} B' $$
* GARCH(1,1) is BEKK's special case when `\(N=1\)`
* Estimation method:
  * Model `\(r_t \sim N(0, H_t)\)`
  * Maximum likelihood
* Drawback: when `\(N\)` is large, there are too many parameters to estimate, as `\(C\)`, `\(A\)` and `\(B\)` are `\(N\times N\)` matrices

---
## Summary

* Practical relevance of conditional variance
* GARCH process
* Variants: IGARCH, TARCH, ...
* Estimation
* Forecast
* Multivariate GARCH
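---
## Appendix: a simulation sketch

The pieces above — the data-generating equation `\(r_t = \sqrt{h_t}\epsilon_t\)`, the GARCH(1,1) recursion, and the `\(k\)`-step forecast formula — can be put together in a few lines of base R. This is a minimal sketch with illustrative parameter values (`\(\alpha_0 = 0.1\)`, `\(\alpha_1 = 0.2\)`, `\(\beta_1 = 0.7\)`, chosen so that `\(\alpha_1 + \beta_1 = 0.9 < 1\)`), not estimates from any data set.


```r
# Simulate a GARCH(1,1) path: h_t = a0 + a1 * r_{t-1}^2 + b1 * h_{t-1}
set.seed(1)
a0 <- 0.1; a1 <- 0.2; b1 <- 0.7   # illustrative values, a1 + b1 < 1
TT <- 500
h <- numeric(TT); r <- numeric(TT)
h[1] <- a0 / (1 - a1 - b1)        # initialize at the unconditional variance
r[1] <- sqrt(h[1]) * rnorm(1)     # r_t = sqrt(h_t) * epsilon_t
for (t in 2:TT) {
  h[t] <- a0 + a1 * r[t - 1]^2 + b1 * h[t - 1]
  r[t] <- sqrt(h[t]) * rnorm(1)
}

# k-step forecast recursion: h_{T+k|T} = a0 + (a1 + b1) * h_{T+k-1|T}
K <- 60
h_fc <- numeric(K)
h_fc[1] <- a0 + a1 * r[TT]^2 + b1 * h[TT]
for (k in 2:K) h_fc[k] <- a0 + (a1 + b1) * h_fc[k - 1]

# h_fc approaches the long-run variance a0 / (1 - a1 - b1) = 1 as k grows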