class: center, middle, inverse, title-slide

.title[
# GARCH Models
]
.author[
### Zhentao Shi
]
.date[
### Nov 8, 2021
]

---
## Variance and volatility

* To quantify uncertainty, the variance must be evaluated
* Examples: interval forecasts, VaR, ...
* In financial econometrics, applications include
  * Portfolio management
  * Pricing of options
  * ...
* The mean is difficult to predict
* The variance is easier to predict
* Unconditional vs. conditional variance

---
## Real data example

* S&P 500 returns
* Raw data: daily, 1960--2017
* Here we use data after 2015-01-01


```r
d0 <- read.csv("SPX.csv", header = TRUE)
d0$datevec <- as.Date(d0$datevec, format = "%d/%m/%Y")
d0$r <- c(NA, diff(log(d0$SP500)))  # daily log returns
d0 <- d0[-1, ]
d1 <- d0[d0$datevec > "2015-01-01", ]
plot(y = d1$r, x = d1$datevec,
     xlim = as.Date(c("2015-01-01", "2017-01-01")), type = "l")
```

![](ts_slides9_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

---
## Real data example (continued)

* Squared returns
* Dynamic patterns
* Volatility clustering


```r
plot(y = d1$r^2, x = d1$datevec,
     xlim = as.Date(c("2015-01-01", "2017-01-01")), type = "l")
abline(h = mean(d1$r^2), col = "blue", lty = 2)
abline(h = 0, col = "red", lty = 2)
```

![](ts_slides9_files/figure-html/unnamed-chunk-2-1.png)<!-- -->


```r
plot(y = d1$r^2, x = d1$datevec, ylim = c(0, 2 * mean(d1$r^2)),
     xlim = as.Date(c("2016-01-01", "2017-01-01")), type = "l")
abline(h = mean(d1$r^2), col = "blue", lty = 2)
```

![](ts_slides9_files/figure-html/unnamed-chunk-2-2.png)<!-- -->

---
## Real data example (continued)

* Naively fit an AR(1) model for `\(r_t\)`


```r
arima(d1$r, order = c(1, 0, 0))
```

```
## 
## Call:
## arima(x = d1$r, order = c(1, 0, 0))
## 
## Coefficients:
##           ar1  intercept
##       -0.0094      3e-04
## s.e.
##        0.0416      4e-04
## 
## sigma^2 estimated as 7.27e-05:  log likelihood = 1927.09,  aic = -3848.18
```

* Naively fit an AR(1) model for `\(r_t^2\)`


```r
arima(d1$r^2, order = c(1, 0, 0))
```

```
## 
## Call:
## arima(x = d1$r^2, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.3058      1e-04
## s.e.  0.0396      0e+00
## 
## sigma^2 estimated as 2.257e-08:  log likelihood = 4253.34,  aic = -8500.68
```

---
## Definitions

A time series `\((y_t)\)` is called

* **a martingale difference sequence** if `\(E_{t-1} [y_t ] = 0\)`
* **white noise** if
  * `\(E [y_t ] = 0\)`,
  * `\(E[y_t^2] = \sigma_y^2\)`, and
  * `\(E[y_t y_{t-k}] = 0\)` for all `\(k \neq 0\)`

---
## Conditional volatility model

* For simplicity, let
`$$r_t = \sqrt{h_t} \epsilon_t,$$`
where
  * `\(\epsilon_t \sim iid N(0,1)\)`
  * `\(h_t = E_{t-1}[r_t^2]\)` is the *conditional variance*
* Suppose `\(h_t\)` is adapted to the past information up to `\((t-1)\)`. In other words, `\(E_{t-1}[h_t]=h_t\)`
* For example, if for some `\(e_t \sim iid N(0,1)\)` independent of all other variables we have `\(h_t = 0.5+0.5 e_{t-1}^2\)`, then `\(h_t\)` is strictly stationary, and `\(E[h_t] = 1\)`

---

```r
N <- 100
e <- rnorm(N)
h <- 0.5 + 0.5 * c(0, e)^2  # prepend a `0` to `e` to reflect the lag
h <- h[1:N]                 # keep the first N values, so h_t = 0.5 + 0.5 * e_{t-1}^2
plot(h, type = "l", main = "conditional variance")
abline(h = 0.5, col = "red")
```

![](ts_slides9_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

* In this example, `\((r_t)\)` is strictly stationary
* Any function of `\((r_t, r_{t-1}, \ldots, r_{-\infty})\)` is still strictly stationary


```r
r <- sqrt(h) * rnorm(N)  # r_t = sqrt(h_t) * epsilon_t
plot(r, type = "l", main = "the series of returns")
abline(h = 0, col = "red")
```

![](ts_slides9_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---
## Mean and variance

* Conditional mean `\(E_{t-1}[r_t] = E_{t-1}[\sqrt{h_t} \epsilon_t] = \sqrt{h_t} \times 0 = 0\)` (martingale difference sequence)
* Unconditional mean `\(E[r_t] = E[ E_{t-1}[r_t] ] = 0\)`
* Conditional variance
$$
`\begin{align}
E_{t-1}[r_t^2] & = E_{t-1}[ h_t
\epsilon^2_t ] = h_t E_{t-1}[\epsilon^2_t ] = h_t \times 1 = h_t
\end{align}`
$$
* Unconditional variance
$$
`\begin{align}
E[r_t^2] & = E[ E_{t-1}[ r^2_t ] ] = E[ h_t ]
\end{align}`
$$
* Unconditional covariance: for `\(k>0\)`, we have
$$
`\begin{align}
E[r_t r_{t-k}] & = E[\sqrt{h_t h_{t-k}} \epsilon_t \epsilon_{t-k}] \\
& = E[E_{t-1}[\sqrt{h_t h_{t-k}} \epsilon_t \epsilon_{t-k}] ] \\
& = E[ \sqrt{h_t h_{t-k}} \epsilon_{t-k} E_{t-1}[ \epsilon_t ] ] \\
& = E[ \sqrt{h_t h_{t-k}} \epsilon_{t-k} \times 0 ] \\
& = 0
\end{align}`
$$
* If `\(E[h_t]\)` is a constant, `\((r_t)\)` is white noise with time-varying *conditional heteroskedasticity*.

---
## Simple volatility models

* Historical model
$$ h_t = \frac{1}{M} \sum_{i=1}^M r_{t-i}^2 $$
* Common choices for `\(M\)` are 22 (monthly average) or 250 (yearly average)
* Exponentially weighted moving average (EWMA)
$$ h_t = (1-\lambda) \sum_{j=0}^{\infty} \lambda^j r_{t-j-1}^2 $$
* The coefficient `\(\lambda\)` can be either estimated or imposed

---
## ARCH

* Autoregressive conditional heteroskedasticity model (Engle, 1982)
* ARCH(q): models `\(h_t\)` as *autoregressive* on the past `\((r_{t-1}, r_{t-2}, \ldots, r_{t-q})\)`
$$ h_t = \alpha_0 + \sum_{i=1}^q \alpha_i r_{t-i}^2 $$
* ARCH(1): `\(h_t = \alpha_0 + \alpha_1 r_{t-1}^2\)`
* Analogy with AR(1): let `\(y_t = r_t^2\)`; then `\(E_{t-1} [y_t] = \alpha_0 + \alpha_1 y_{t-1}\)`
* ARCH(1) can be interpreted as an AR(1) in terms of `\(r_t^2\)`
* Taking unconditional expectations on both sides, we solve `\(E[h_t] = \alpha_0 / (1-\alpha_1)\)` when `\(\alpha_1 < 1\)`

---
## GARCH

* To make ARCH more flexible, Bollerslev (1986) proposed the *generalized ARCH* (GARCH) model
`$$h_t = \alpha_0 + \sum_{i=1}^q \alpha_i r_{t-i}^2 + \sum_{i=1}^p \beta_i h_{t-i}$$`
* The special case GARCH(1,1)
$$ h_t = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1} $$
is the leading model in practice

---
## ARCH vs GARCH

* To understand how GARCH generalizes ARCH, write ARCH(1) as
$$
`\begin{align}
h_t & = \alpha_0 + \alpha_1 r_{t-1}^2 \\
& =
\alpha_0 + \alpha_1 ( r_{t-1}^2 - E_{t-2}[r^2_{t-1}]) + \alpha_1 E_{t-2}[r^2_{t-1}] \\
& = \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + \alpha_1 h_{t-1}
\end{align}`
$$
* Notice `\(h_{t-1}\)` is the conditional mean of `\(r_{t-1}^2\)`, and `\((r_{t-1}^2 - h_{t-1})\)` is the demeaned `\(r_{t-1}^2\)`
* ARCH forces these two terms to share the same coefficient
* GARCH allows distinct coefficients

---
## IGARCH

* In GARCH(1,1), if `\(\alpha_1 + \beta_1 = 1\)`, then
$$
`\begin{align}
h_t & = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1} \\
& = \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + (\alpha_1 + \beta_1) h_{t-1} \\
& = \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + h_{t-1}
\end{align}`
$$
* Taking the conditional expectation up to time `\((t-2)\)`, we have
$$
`\begin{align}
h_{t|t-2} & = E_{t-2}[ \alpha_0 + \alpha_1 ( r_{t-1}^2 - h_{t-1}) + h_{t-1} ] \\
& = \alpha_0 + 0 + h_{t-1} \\
& = \alpha_0 + h_{t-1}
\end{align}`
$$
* `\((h_t)\)` behaves similarly to a unit root process (recall that a unit root process with a drift satisfies `\(E_{t-1}[y_t] = \alpha_0 + y_{t-1}\)`)
* Integrated GARCH (IGARCH)
* Empirically, `\(\hat{\alpha}_1 + \hat{\beta}_1\)` is often close to 1

---
## Additional explanatory regressors

* Regressors can be added to better fit the mean and the variance
* Return: `\(r_{\mu,x,t} = \mu_0 + \sum_{k=1}^K \gamma_k x_{kt} + u_t\)`
* Variance: `\(h_t = \alpha_0 + \sum_{i=1}^q \alpha_i u_{t-i}^2 + \sum_{i=1}^p \beta_i h_{t-i} + \sum_{k=1}^K \psi_k x_{kt}\)`
* Distribution: `\(u_t \sim N(0, h_t)\)`

---
## Estimation

* GARCH is a nonlinear model
* Maximum likelihood estimation
* Specification: `\(r_t | \mbox{past history} \sim N(0, h_t)\)`
* Observed time series `\((r_{\mu, t})_{t=1}^T\)`, where we allow an unknown mean in `\(r_{\mu, t}= \mu_0 + r_t\)`
* Unknown parameters `\(\theta = (\mu, (\alpha_i)_{i=0}^q, (\beta_i)_{i=1}^p )\)`

---
## Likelihood

* Density
$$ f(r_{\mu, t} | r_{\mu, t-1}, r_{\mu, t-2},\ldots;\theta) = \frac{1}{\sqrt{2\pi h_t}} \exp\left(-\frac{(r_{\mu,
t}-\mu)^2}{2h_t}\right) $$
* Log-likelihood
$$
`\begin{align}
\ell_t(\theta) & = \log f(r_{\mu, t} | r_{\mu, t-1}, r_{\mu, t-2},\ldots;\theta) \\
& = -\frac{1}{2}\log 2\pi -\frac{1}{2}\log h_t -\frac{(r_{\mu, t}-\mu)^2}{2h_t}
\end{align}`
$$
where `\(h_t = \alpha_0 + \sum_{i=1}^q \alpha_i r_{t-i}^2 + \sum_{i=1}^p \beta_i h_{t-i}\)`
* The dynamic system requires initial values to generate `\(h_1 = \alpha_0 + \alpha_1 r_0^2 + \beta_1 h_0\)`
* Often, we set `\(r_0=0\)` and `\(h_0\)` as the unconditional sample variance
* Given the full sample, maximum likelihood estimation solves
$$ \max_{\theta} \sum_{t=1}^T \ell_t(\theta) $$
* If heavy tails are a concern, we specify `\(r_t\)` with a `\(t\)` distribution
  * The degrees of freedom `\(\nu \in (2,\infty)\)` of the `\(t\)` distribution is an additional parameter to be estimated
* AIC can be useful to determine the model specification

---
## Real data example

* R package `fGarch`
* Developed by [Rmetrics](https://www.rmetrics.org/)


```r
library(fGarch)
gar <- garchFit(formula = ~ garch(1, 1), data = d1$r, cond.dist = c("norm"))
```

```
## 
## Series Initialization:
##  ARMA Model:                arma
##  Formula Mean:              ~ arma(0, 0)
##  GARCH Model:               garch
##  Formula Variance:          ~ garch(1, 1)
##  ARMA Order:                0 0
##  Max ARMA Order:            0
##  GARCH Order:               1 1
##  Max GARCH Order:           1
##  Maximum Order:             1
##  Conditional Dist:          norm
##  h.start:                   2
##  llh.start:                 1
##  Length of Series:          576
##  Recursion Init:            mci
##  Series Scale:              0.008534223
## 
## Parameter Initialization:
##  Initial Parameters:          $params
##  Limits of Transformations:   $U, $V
##  Which Parameters are Fixed?
$includes
##  Parameter Matrix:
##                      U           V      params includes
##     mu     -0.29301831   0.2930183  0.02930183     TRUE
##     omega   0.00000100 100.0000000  0.10000000     TRUE
##     alpha1  0.00000001   1.0000000  0.10000000     TRUE
##     gamma1 -0.99999999   1.0000000  0.10000000    FALSE
##     beta1   0.00000001   1.0000000  0.80000000     TRUE
##     delta   0.00000000   2.0000000  2.00000000    FALSE
##     skew    0.10000000  10.0000000  1.00000000    FALSE
##     shape   1.00000000  10.0000000  4.00000000    FALSE
##  Index List of Parameters to be Optimized:
##     mu  omega alpha1  beta1 
##      1      2      3      5 
##  Persistence:                  0.9 
## 
## 
## --- START OF TRACE ---
## Selected Algorithm: nlminb 
## 
## R coded nlminb Solver: 
## 
##   0:     762.16113: 0.0293018 0.100000 0.100000 0.800000
##   1:     759.57018: 0.0293031 0.0788547 0.0992305 0.785324
##   2:     756.70937: 0.0293060 0.0786691 0.124553 0.789998
##   3:     756.63443: 0.0293121 0.0591280 0.137720 0.779615
##   4:     755.45163: 0.0293156 0.0694988 0.144753 0.782574
##   5:     755.11502: 0.0293278 0.0716495 0.148522 0.770459
##   6:     754.84923: 0.0293825 0.0771521 0.169095 0.756102
##   7:     754.65299: 0.0297599 0.0937332 0.181315 0.710690
##   8:     754.46118: 0.0298784 0.0926112 0.187934 0.716471
##   9:     754.44556: 0.0299193 0.0883728 0.189186 0.716695
##  10:     754.42417: 0.0300197 0.0885991 0.191146 0.719122
##  11:     754.41693: 0.0301539 0.0879314 0.191044 0.718804
##  12:     754.36809: 0.0323237 0.0912530 0.189018 0.715188
##  13:     754.06029: 0.0449843 0.0887579 0.194768 0.717037
##  14:     753.90002: 0.0576412 0.0917023 0.200796 0.706803
##  15:     753.88511: 0.0604351 0.0891227 0.195525 0.713504
##  16:     753.87586: 0.0632386 0.0896136 0.198517 0.710881
##  17:     753.87506: 0.0643934 0.0898699 0.199565 0.709794
##  18:     753.87506: 0.0643908 0.0898665 0.199589 0.709790
##  19:     753.87506: 0.0643904 0.0898671 0.199589 0.709789
## 
## Final Estimate of the Negative LLH:
##  LLH:  -1989.999    norm LLH:  -3.45486 
##           mu        omega       alpha1        beta1 
## 5.495216e-04 6.545287e-06 1.995887e-01 7.097889e-01 
## 
## R-optimhess Difference Approximated Hessian Matrix:
##                   mu         omega        alpha1         beta1
## mu     -1.307989e+07
-2.443104e+08      5204.893    -11956.368
## omega  -2.443104e+08 -1.702545e+12 -42924455.368 -76179748.158
## alpha1  5.204893e+03 -4.292446e+07     -2516.685     -2876.809
## beta1  -1.195637e+04 -7.617975e+07     -2876.809     -4397.969
## attr(,"time")
## Time difference of 0.01220894 secs
## 
## --- END OF TRACE ---
## 
## 
## Time to Estimate Parameters:
##  Time difference of 0.04920602 secs
```

```
## Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
## Consider formula(paste(x, collapse = " ")) instead.
```

---
## Real data example (continued)

* `fGarch` notation:
$$
`\begin{align}
r_{\mu, t} & = \mu + r_t \\
h_t & = \omega + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1}
\end{align}`
$$


```r
print(gar)
```

```
## 
## Title:
##  GARCH Modelling
## 
## Call:
##  garchFit(formula = ~garch(1, 1), data = d1$r, cond.dist = c("norm"))
## 
## Mean and Variance Equation:
##  data ~ garch(1, 1)
## <environment: 0x00000000213c2820>
##  [data = d1$r]
## 
## Conditional Distribution:
##  norm
## 
## Coefficient(s):
##         mu       omega      alpha1       beta1  
## 5.4952e-04  6.5453e-06  1.9959e-01  7.0979e-01  
## 
## Std. Errors:
##  based on Hessian
## 
## Error Analysis:
##         Estimate  Std. Error  t value Pr(>|t|)    
## mu     5.495e-04   2.801e-04    1.962 0.049789 *  
## omega  6.545e-06   1.805e-06    3.627 0.000287 ***
## alpha1 1.996e-01   4.478e-02    4.457 8.31e-06 ***
## beta1  7.098e-01   5.374e-02   13.208  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1
## 
## Log Likelihood:
##  1989.999    normalized:  3.45486 
## 
## Description:
##  Wed Jun 08 23:17:27 2022 by user: zhent
```

---
## Asymmetric effects

* GARCH implies that shocks, positive and negative, have symmetric effects
* Theory and empirical evidence support asymmetry
  * Negative shocks have a larger effect
* TARCH(1,1) ("T" for *threshold*)
$$ h_t = \alpha_0 + [\alpha_1 + \lambda \mathbb{I}(r_{t-1}<0)] r_{t-1}^2 + \beta_1 h_{t-1} $$
so that `\(\lambda > 0\)` captures the larger effect of negative shocks

---
## Volatility forecast

* Consider GARCH(1,1):
$$ h_{t+1} = \alpha_0 + \alpha_1 r_t^2 + \beta_1 h_t $$
* Given information up to `\(T\)`, the multi-step-ahead forecasts are
$$
`\begin{align}
h_{T+1|T} & = \alpha_0 + \alpha_1 r_T^2 + \beta_1 h_T \\
h_{T+2|T} & = \alpha_0 + \alpha_1 E_T[r_{T+1}^2] + \beta_1 h_{T+1|T} = \alpha_0 + (\alpha_1 + \beta_1) h_{T+1|T} \\
& \cdots \\
h_{T+k|T} & = \alpha_0 + \alpha_1 E_T[r_{T+k-1}^2] + \beta_1 h_{T+k-1|T} = \alpha_0 + (\alpha_1 + \beta_1) h_{T+k-1|T}
\end{align}`
$$
* In practice, use the estimated coefficients to replace `\(\alpha_0\)`, `\(\alpha_1\)` and `\(\beta_1\)`
* Long-run average variance
$$ \lim_{k\to \infty} h_{T+k|T} = \frac{\alpha_0}{1-\alpha_1 - \beta_1} $$
* GARCH(1,1) is difficult to beat in terms of forecasting (Hansen and Lunde, 2005)

---
## Accumulated variance

* The future `\(k\)`-period (cumulative) log return is
$$ r_{T+1}(k) = r_{T+1} + r_{T+2} + \cdots + r_{T+k} $$
* Taking conditional expectations on both sides:
$$ E_T [r^2_{T+1}(k)] = h_{T+1|T} + h_{T+2|T} + \cdots + h_{T+k|T} $$
(the cross terms vanish because returns are conditionally uncorrelated)

---
## Real data example (continued)


```r
predict(gar, n.ahead = 5, plot = TRUE, mse = "cond", nx = 20)
```

![](ts_slides9_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

```
##   meanForecast   meanError standardDeviation lowerInterval upperInterval
## 1 0.0005495216 0.005722513       0.005722513   -0.01066640    0.01176544
## 2 0.0005495216 0.006027007       0.006027007   -0.01126320    0.01236224
## 3 0.0005495216 0.006291125       0.006291125   -0.01178086    0.01287990
## 4 0.0005495216 0.006522029       0.006522029   -0.01223342    0.01333246
## 5 0.0005495216
0.006725129       0.006725129   -0.01263149    0.01373053
```

* Notice the predicted mean and the standard deviation

---
## Forecast evaluation

* Issue: `\(h_t = E_{t-1}[r_t^2]\)` cannot be observed
* Solution: use `\(r_t^2\)` as a proxy
  * HMPY Ch. 15: realized variance
* Compute forecast metrics such as RMSE, MAE, ...
* In addition, motivated by the likelihood estimation, a popular choice of loss function is the *quasi-likelihood loss*
$$ QLIKE = \log \hat{h}_t + \frac{(r_{\mu, t} - \hat{\mu}) ^2}{\hat{h}_t} $$
* Risk is the average of the loss

---
## Test evaluation

* Test the unbiasedness of a forecast
* Specify a regression
$$ r_t^2 = \delta_0 + \delta_1 \hat {h}_t + e_t $$
and test the null H0: `\(\delta_0 = 0\)` and `\(\delta_1 = 1\)`
* The problem with this approach is its sensitivity to large shocks
* Alternative specifications:
$$
`\begin{align}
| r_t | & = \delta_0 + \delta_1 \sqrt{\hat {h}_t} + e_t \\
\log r_t^2 & = \delta_0 + \delta_1 \log \hat {h}_t + e_t
\end{align}`
$$

---
## Risk-return trade-off

* Augment the CAPM to take into account the compensation for risk
* GARCH-M (GARCH-in-mean) model:
$$
`\begin{align}
r_{it} - r_{ft} & = \mu_0 + \mu_1 h_t^{\omega} + \mu_2 (r_{mt} - r_{ft}) + u_t \\
u_t & \sim N(0,h_t) \\
h_t & = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}
\end{align}`
$$
* The risk-return trade-off is represented by `\(\mu_1 h_t^{\omega}\)`, where the usual choice of `\(\omega\)` is `\(1\)` or `\(0.5\)`
* AIC can be used to choose between these values
* Test: H0: `\(\mu_1 = 0\)`

---
## Multivariate GARCH

* Similar to the extension from AR to VAR
* Consider the returns of `\(N\)` assets `\(r_t = (r_{1t}, r_{2t}, \ldots, r_{Nt})'\)`
$$
`\begin{align}
r_t & = E_{t-1}[r_t] + u_t \\
H_t & = E_{t-1}[u_t u_t^{\prime}]
\end{align}`
$$
* `\(H_t\)` is an `\(N\times N\)` *time-varying conditional heteroskedastic variance-covariance matrix*
  * Conditional variances `\(h_{iit}\)` on the diagonal
  * Conditional covariances `\(h_{ijt}\)`, `\(i\neq j\)`, on the off-diagonal
* Examples:
  * Time-varying beta coefficient in the CAPM
  * Time-varying covariance in portfolio optimization

---
## BEKK model

* Baba, Engle, Kraft and Kroner (1990)
$$ H_t = C C' + A u_{t-1} u_{t-1}' A' + B H_{t-1} B' $$
* GARCH(1,1) is BEKK's special case when `\(N=1\)`
* Estimation method:
  * Model `\(r_t \sim N(0, H_t)\)`
  * Maximum likelihood
* Drawback: when `\(N\)` is large, there are too many parameters to estimate, as `\(C\)`, `\(A\)` and `\(B\)` are `\(N\times N\)` matrices

---
## Summary

* Practical relevance of conditional variance
* GARCH process
* Variants: IGARCH, TARCH, ...
* Estimation
* Forecast
* Multivariate GARCH
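---
## Appendix: a simulation sketch

The pieces above — the data-generating equation `\(r_t = \sqrt{h_t}\epsilon_t\)`, the GARCH(1,1) recursion, and the `\(k\)`-step forecast formula — can be put together in a few lines of base R. This is a minimal sketch with illustrative parameter values (`\(\alpha_0 = 0.1\)`, `\(\alpha_1 = 0.2\)`, `\(\beta_1 = 0.7\)`, chosen so that `\(\alpha_1 + \beta_1 = 0.9 < 1\)`), not estimates from any data set.


```r
# Simulate a GARCH(1,1) path: h_t = a0 + a1 * r_{t-1}^2 + b1 * h_{t-1}
set.seed(1)
a0 <- 0.1; a1 <- 0.2; b1 <- 0.7   # illustrative values, a1 + b1 < 1
TT <- 500
h <- numeric(TT); r <- numeric(TT)
h[1] <- a0 / (1 - a1 - b1)        # initialize at the unconditional variance
r[1] <- sqrt(h[1]) * rnorm(1)     # r_t = sqrt(h_t) * epsilon_t
for (t in 2:TT) {
  h[t] <- a0 + a1 * r[t - 1]^2 + b1 * h[t - 1]
  r[t] <- sqrt(h[t]) * rnorm(1)
}

# k-step forecast recursion: h_{T+k|T} = a0 + (a1 + b1) * h_{T+k-1|T}
K <- 60
h_fc <- numeric(K)
h_fc[1] <- a0 + a1 * r[TT]^2 + b1 * h[TT]
for (k in 2:K) h_fc[k] <- a0 + (a1 + b1) * h_fc[k - 1]

# h_fc approaches the long-run variance a0 / (1 - a1 - b1) = 1 as k grows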