class: center, middle, inverse, title-slide

.title[
# Review of Regressions
]
.author[
### Zhentao Shi
]
.date[
### Aug 30, 2021
]

---
class: middle left
background-image: url('background/sincerely-media.jpg')
background-size: cover

## Static models

* Factor `\(x_t\)` causes an immediate reaction in `\(y_t\)`
* Phillips curve: inflation <- unemployment
* Okun's law: unemployment <- GDP
* Capital asset pricing model (CAPM):
`$$r_t - r_{ft} = \alpha + \beta MKT_t + e_t$$`
where `\(MKT_t = r_{Mt} - r_{ft}\)`.

---

## Simple regression

* Review Wooldridge's Ch.2.
* Conventionally, cross-sectional observations are indexed by `\(i\)`; time series observations are indexed by `\(t\)`.
* *Simple regression* is OLS with `\(y_t\)`, `\(x_t\)` and an intercept, for `\(t=1,\ldots,T\)`:
`$$\min_{\beta_1,\beta_2} \frac{1}{2} \sum_{t=1}^T (y_t - \beta_1 - \beta_2 x_t)^2$$`
* The slope estimate
`$$\hat{\beta}_2 = \frac{ \sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})^2}$$`
is the ratio between the sample covariance of `\((x_t, y_t)\)` and the sample variance of `\(x_t\)`.
* The intercept estimate is `\(\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}\)`.

---

## Factor models

* Multiple factors (Fama and French, 1993)
`$$r_t - r_{ft} = \alpha + \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + e_t$$`

```r
d0 <- read.csv("fama_french.csv", header = TRUE)
reg <- lm( (r1 - rf) ~ mktrf + smb + hml, data = d0 )
print(summary(reg))
```

```
## 
## Call:
## lm(formula = (r1 - rf) ~ mktrf + smb + hml, data = d0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.142  -2.532  -0.114   2.083  95.562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.74958    0.22153  -3.384 0.000742 ***
## mktrf        1.28935    0.04372  29.489  < 2e-16 ***
## smb          1.38601    0.07204  19.240  < 2e-16 ***
## hml          0.36623    0.06399   5.723 1.36e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.097 on 1046 degrees of freedom
## Multiple R-squared:  0.6579, Adjusted R-squared:  0.657 
## F-statistic: 670.6 on 3 and 1046 DF,  p-value: < 2.2e-16
```

---

* Factor zoo

---

## Seasonality

* Examples: retail sales, electricity usage, etc.
* One easy way to deal with the seasonal effect is adding dummy variables
* Some time series are already seasonally adjusted, for example monthly GDP

---

## Event study

* Effect of a big announcement in the financial market
* Pre-event window, event window and post-event window
`$$r_t - r_{ft} = \beta_1 + \beta_2 MKT_t + \beta_3 PRE_t + \beta_4 EVT_t +\beta_5 POST_t + e_t$$`
* Windows are chosen by the researcher

---

## Multivariate regression

* Extend to the case of any finite number of regressors `\(K\)`. (Wooldridge Ch.3)
* Let `\(x_t = (x_{t1},\ldots,x_{tK})'\)` and `\(\beta = (\beta_1,\ldots,\beta_K)'\)`. The optimization of OLS is
`$$\min_{\beta} \frac{1}{2} \sum_{t=1}^T (y_t - x_t' \beta)^2$$`
* Take the first-order condition. The solution is
`$$\hat{\beta} = \left(\sum_t x_t x_t'\right)^{-1} \sum_t x_t y_t$$`
* Matrix notation: let `\(Y\)` be a `\(T\times 1\)` vector and `\(X\)` a `\(T\times K\)` matrix. Then
`$$\hat{\beta} = (X'X)^{-1} X'Y$$`

```r
y <- d0$r1 - d0$rf
X <- cbind(1, d0$mktrf, d0$smb, d0$hml)
beta_hat <- solve( t(X) %*% X, t(X) %*% y )
print( as.vector( beta_hat ) )
```

```
## [1] -0.7495832  1.2893488  1.3860096  0.3662339
```
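* As a quick sanity check, a minimal sketch (reusing `beta_hat` and the `reg` object fitted on the factor-model slide) confirming that the matrix formula reproduces `lm()`'s coefficients:

```r
# a minimal check: the matrix-formula estimate should match lm()'s fit;
# `reg` and `beta_hat` come from the previous chunks
all.equal( as.vector(beta_hat), unname(coef(reg)) ) # expected: TRUE
```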
---

## Joint distribution

* `\((y_t, x_t)\)` follows a stable joint distribution that is invariant in `\(t\)`.
* Find the coefficient `\(\beta\)` such that the linear combination `\(x_t' \beta\)` minimizes the prediction error
`$$E [ (y_t - x_t' \beta)^2]$$`
* The solution is the projection coefficient in the population model
`$$\beta_0 = ( E [x_t x_t'])^{-1} E [x_t y_t]$$`

---

## Projection error

* Define the remainder after the projection `\(\epsilon_t = y_t - x_t'\beta_0\)`.
* First-order condition: `\(E[x_t (y_t - x_t'\beta_0)] = E[x_t \epsilon_t] = 0\)`.
* When an intercept is in `\(x_t\)`, then `\(E[ \epsilon_t ] = 0\)`.

---

## Discrepancy

* Review and extension of Wooldridge's Ch.5.
* `\(\beta_0\)` is a constant.
* `\(\hat{\beta}\)` is a random variable.
* The discrepancy between `\(\hat{\beta}\)` and `\(\beta_0\)` is
`$$\hat{\beta} - \beta_0 = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{T} \sum_t x_t \epsilon_t$$`

---

## Simulated example

* When `\(n = 20\)`.

```r
library(magrittr) # provides the pipe %>%

set.seed(2021-8-30)
n = 20 # sample size
K = 4  # number of parameters
b0 = as.matrix( c(0.5, 2, -1, 0) ) # the true coefficient
X = cbind(1, matrix( rnorm(n * (K-1)), nrow = n ) ) # the regressor matrix
e = rnorm(n) # the error term
Y = X %*% b0 + e # generate the dependent variable
bhat = solve(t(X) %*% X, t(X) %*% Y ) %>% as.vector() %>% print()
```

```
## [1]  0.4948475  2.3840175 -0.9477184  0.2777489
```

* Now the sample size is increased to `\(n = 2000\)`.

```r
n = 2000 # sample size
X = cbind(1, matrix( rnorm(n * (K-1)), nrow = n ) ) # the regressor matrix
e = rnorm(n) # the error term
Y = X %*% b0 + e # generate the dependent variable
bhat = solve(t(X) %*% X, t(X) %*% Y ) %>% as.vector() %>% print()
```

```
## [1]  0.486747547  2.002654737 -1.013039652 -0.008006543
```

---

## Large Sample Theory (Scalar)

* Consider a scalar random variable `\(z_t\)`.
* Law of large numbers
`$$\frac{1}{T} \sum_t z_t \stackrel{p}{\to } E[z_t]$$`
* Central limit theorem: if `\(z_t\)` is independent over `\(t\)`,
`$$\frac{1}{\sqrt{T}} \sum_t ( z_t - E[z_t]) \stackrel{d}{\to } N(0, \sigma_z^2).$$`

---

## Demonstration of LLN

```r
sample.mean = function( n, distribution ){
  # compute the sample mean for a given distribution
  if (distribution == "normal") { y = rnorm(n) }
  else if (distribution == "t2") { y = rt(n, 2) }
  else if (distribution == "cauchy") { y = rcauchy(n) }
  return( mean(y) )
}

LLN.plot = function(distribution){
  # draw the sample-mean graph
  ybar = matrix(0, length(NN), 3)
  for (rr in 1:3){
    for (ii in 1:length(NN)){
      n = NN[ii]
      ybar[ii, rr] = sample.mean(n, distribution)
    }
  }
  matplot(ybar, type = "l", ylab = "mean", xlab = "",
          lwd = 1, lty = 1, main = distribution)
  abline(h = 0, lty = 2)
  return(ybar)
}

# calculation
NN = 2^(1:20); par(mfrow = c(3,1))
l1 = LLN.plot("normal"); l2 = LLN.plot("t2"); l3 = LLN.plot("cauchy")
```

![](ts_slides2_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## Demonstration of CLT

```r
Z_fun = function(n, distribution){
  if (distribution == "normal"){ z = sqrt(n) * mean(rnorm(n)) }
  else if (distribution == "chisq2") {
    df = 2
    x = rchisq(n, df)
    z = sqrt(n) * ( mean(x) - df ) / sqrt(2*df)
  }
  return(z)
}

CLT_plot = function(n, distribution){
  Rep = 10000
  ZZ = rep(0, Rep)
  for (i in 1:Rep) { ZZ[i] = Z_fun(n, distribution) }
  xbase = seq(-4.0, 4.0, length.out = 100)
  hist( ZZ, breaks = 100, freq = FALSE,
        xlim = c( min(xbase), max(xbase) ),
        main = paste0("hist with sample size ", n) )
  lines(x = xbase, y = dnorm(xbase), col = "red")
  return(ZZ)
}

par(mfrow = c(3,1))
phist = CLT_plot(2, "chisq2")
phist = CLT_plot(10, "chisq2")
phist = CLT_plot(100, "chisq2")
```

![](ts_slides2_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
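* A complementary check of the same convergence, a minimal sketch reusing `Z_fun` from the chunk above: a normal QQ-plot of the simulated statistics.

```r
# a minimal sketch: compare the simulated statistics with N(0,1) quantiles
set.seed(1)
ZZ = replicate(10000, Z_fun(100, "chisq2"))
qqnorm(ZZ); qqline(ZZ, col = "red") # points near the line suggest normality
```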
---

## Large Sample Theory (Vector)

* Consider a `\(K\)`-dimensional vector random variable `\(z_t\)`.
* Law of large numbers
`$$\frac{1}{T} \sum_t z_t \stackrel{p}{\to } E[z_t]$$`
* Define `\(\Sigma_z = E[(z_t - E[z_t])(z_t - E[z_t])']\)`.
* Central limit theorem: if `\((z_t)\)` is independent over `\(t\)`,
`$$\frac{1}{\sqrt{T}} \sum_t ( z_t - E[z_t]) \stackrel{d}{\to } N(0, \Sigma_z).$$`
* Special case: `\(E[z_t] = 0_K\)`, under which `\(\Sigma_z = E[z_t z_t']\)`.

---

## In regressions

* Let `\(z_t = x_t\epsilon_t\)`.
* By LLN:
`$$\begin{aligned} \frac{1}{T} \sum_t x_t\epsilon_t & \stackrel{p}{\to} 0 \\ \frac{1}{T} \sum_t x_t x_t' & \stackrel{p}{\to} E[x_t x_t'] =: Q_x \end{aligned}$$`
* The covariance matrix is `\(E[ x_t\epsilon_t (x_t\epsilon_t)'] = E[x_t x_t' \epsilon_t^2]\)`.
* If `\((x_t\epsilon_t)\)` is uncorrelated over `\(t\)`, then by CLT:
`$$\frac{1}{\sqrt{T}} \sum_t x_t\epsilon_t \stackrel{d}{\to} N(0, E[ x_t x_t' \epsilon_t^2]).$$`

---

## Discrepancy (continued)

* The discrepancy between `\(\hat{\beta}\)` and `\(\beta_0\)` is
`$$\hat{\beta} - \beta_0 = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{T} \sum_t x_t \epsilon_t$$`
* By LLN: `\(\hat{\beta}-\beta_0\stackrel{p}{\to}0\)`. (**Consistency**)
* By CLT:
`$$\begin{aligned} \sqrt{T} (\hat{\beta} - \beta_0 ) & = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{\sqrt{T}} \sum_t x_t \epsilon_t \\ & \stackrel{d}{\to} N(0, Q_x^{-1} E[ x_t x_t' \epsilon_t^2] Q_x^{-1} ) \end{aligned}$$`
(**Asymptotic normality**)

---

## Classical assumptions for time series

* Wooldridge Ch.10.3

**A1.** The linear model is correctly specified.

**A2.** No perfect collinearity.

**A3.** `\(E[\epsilon_t | X] = 0\)`. (Strict exogeneity)

* A3 is key for **unbiasedness**: `\(E[ \hat{\beta} ] = \beta_0\)`.
* Strict exogeneity versus contemporaneous exogeneity

---

## Classical assumptions for time series (continued)

**A4.** `\(var[\epsilon_t | X ] = \sigma^2\)` for all `\(t=1,\ldots,T\)`. (Homoskedasticity)

**A5.** `\(cov[\epsilon_t, \epsilon_s | X ] = 0\)` for all `\(t\neq s\)`. (Zero serial correlation)

Under these assumptions, `\(E[ x_t x_t' \epsilon_t^2] = Q_x \sigma^2\)`. The asymptotic distribution simplifies to
`$$\sqrt{T} (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} N(0, Q_x^{-1} \sigma^2 )$$`

(Special case: in the simple regression, `\(E[x_t ] = \mu_x\)` and `\(var[x_t] = \sigma_x^2\)`. The asymptotic variance of `\(\hat{\beta}\)` can be written explicitly.)

* Gauss-Markov theorem: under A1-A5, OLS is the best linear unbiased estimator (BLUE).

---

## Properties of Normal Distribution

If a `\(K\)`-dimensional random vector `\(x_t \sim N(0, \Sigma)\)`, where `\(\Sigma\)` is a positive definite covariance matrix, then

* For a constant matrix `\(A\)`, we have `\(A x_t \sim N(0, A\Sigma A')\)`.
* Quadratic form: `\(x_t' \Sigma^{-1} x_t \sim \chi^2(K)\)`.
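* The quadratic-form property can be checked by simulation. A minimal sketch, assuming the `MASS` package is available for drawing multivariate normals:

```r
# if x ~ N(0, Sigma), then x' Sigma^{-1} x follows chi^2(K)
library(MASS)
set.seed(1)
K = 3
Sigma = crossprod( matrix(rnorm(K * K), K) ) # a positive definite covariance
x = mvrnorm(10000, mu = rep(0, K), Sigma = Sigma)
q = rowSums( (x %*% solve(Sigma)) * x ) # the quadratic form for each draw
mean( q > qchisq(0.95, df = K) ) # should be close to 0.05
```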
---

## Hypothesis Testing for Slope Coefficients

* Estimate the components of the variance:
  * The error variance `\(\hat{\sigma}^2 = T^{-1} \sum_t \hat{\epsilon}_t^2\)`, where `\(\hat{\epsilon}_t = y_t - x_t' \hat{\beta}\)`.
  * `\(\hat{Q}_x = T^{-1} \sum_t x_t x_t'\)`.

Take, for example:
`$$r_t - r_{ft} = \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 + \epsilon_t$$`

* **Test a single coefficient**: say, `\(\beta_1 = 1\)`. Pre-multiply the asymptotic normality expression by `\((1,0,0,0)\)`, and impose the null hypothesis.
* The asymptotic distribution is
`$$\sqrt{T} (\hat{\beta}_1 - 1 ) \stackrel{d}{\to} N(0, [Q_x^{-1}]_{11} \sigma^2 )$$`
* The feasible `\(t\)`-statistic is
`$$\frac{\hat{\beta}_1 - 1 }{ \sqrt{[\hat{Q}_x^{-1}]_{11} \hat{\sigma}^2 / T} } \stackrel{d}{\to} N(0, 1)$$`
* Software display. Notice that the following `\(t\)`-statistics are for the null hypothesis of 0.

```r
print(summary(reg))
```

```
## 
## Call:
## lm(formula = (r1 - rf) ~ mktrf + smb + hml, data = d0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.142  -2.532  -0.114   2.083  95.562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.74958    0.22153  -3.384 0.000742 ***
## mktrf        1.28935    0.04372  29.489  < 2e-16 ***
## smb          1.38601    0.07204  19.240  < 2e-16 ***
## hml          0.36623    0.06399   5.723 1.36e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.097 on 1046 degrees of freedom
## Multiple R-squared:  0.6579, Adjusted R-squared:  0.657 
## F-statistic: 670.6 on 3 and 1046 DF,  p-value: < 2.2e-16
```

---

## Test a joint hypothesis

Say, `\(\beta_2 = \beta_3 = 0\)`. Pre-multiply the asymptotic normality expression by
`$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$`
and impose the null hypothesis:
`$$\sqrt{T} A (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} N(0, A Q_x^{-1} A' \sigma^2 )$$`

The feasible **Wald** statistic is
`$$T (\hat{\beta} - \beta_0 )' A' (A \hat{Q}_x^{-1} A' \hat{\sigma}^2)^{-1} A (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} \chi^2(2)$$`

* The same construction works for a generic constant matrix `\(A\)`.
* If the restriction is easy to implement, then let `\(RSS_0\)` be the sum of squared residuals of the restricted model and `\(RSS_1\)` be the RSS of the unrestricted model. Under the null,
`$$\frac{RSS_0 - RSS_1}{RSS_1/(T-K)} \stackrel{d}{\to} \chi^2(r),$$`
where `\(r\)` is the number of restrictions.

---

## R-squared

* In-sample predicted value: `\(\hat{y}_t = x_t ' \hat{\beta}\)`
* R-squared: the ratio between the sample variance of `\(\{\hat{y}_t\}_{t=1}^T\)` and the sample variance of `\(\{y_t\}_{t=1}^T\)`.
* R-squared is a measure of goodness of fit.
* In the financial markets, the R-squared of a predictive regression is usually quite low.
* In macroeconomic models, the R-squared is often non-trivial.

---

## Diagnostic analysis of the error term

* Under Assumption A4, the error terms are homoskedastic.
* Specify the regression of squared residuals
`$$\begin{aligned} \hat{\epsilon}_t^2 & = \mbox{intercept} \\ & + \mbox{all level terms of } x_{tk} \\ & + \mbox{all squared terms of } x_{tk} \\ & + \mbox{all cross-term pairs of } (x_{tk},x_{tk'}) \\ & + u_t \end{aligned}$$`
and test whether all coefficients associated with the levels, squares and pairs are jointly 0.
* Under Assumption A5, the error terms have no autocorrelation.
* Specify the regression of residuals
`$$\hat{\epsilon}_t = \gamma' x_t + \theta_1 \hat{\epsilon}_{t-1} + \cdots + \theta_p \hat{\epsilon}_{t-p} + u_t$$`
and test `\(\theta_1 = \cdots = \theta_p = 0\)`. (Eq. (3.24) in HMPY is a mistake.)
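* These diagnostics are implemented in standard packages. A minimal sketch, assuming the `lmtest` package is installed and reusing the factor-model fit `reg` from the earlier slides (`bptest` is the Breusch-Pagan test, a levels-only variant of the White-type regression above):

```r
library(lmtest)
bptest(reg)            # Breusch-Pagan test for heteroskedasticity
bgtest(reg, order = 4) # Breusch-Godfrey test for serial correlation up to lag 4
```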