class: center, middle, inverse, title-slide

.title[
# Review of Regressions
]
.author[
### Zhentao Shi
]
.date[
### Aug 30, 2021
]

---
class: middle left
background-image: url('background/sincerely-media.jpg')
background-size: cover

## Static models

* Factor `\(x_t\)` causes an immediate reaction in `\(y_t\)`
* Phillips curve: inflation <- unemployment
* Okun's law: unemployment <- GDP
* Capital asset pricing model (CAPM):
`$$r_t - r_{ft} = \alpha + \beta MKT_t + e_t$$`
where `\(MKT_t = r_{Mt} - r_{ft}\)`.

---

## Simple regression

* Review Wooldridge's Ch.2.
* Conventionally, cross-sectional observations are indexed by `\(i\)`; time series observations are indexed by `\(t\)`.
* *Simple regression* is OLS with `\(y_t\)`, `\(x_t\)` and an intercept, for `\(t=1,\ldots,T\)`:
`$$\min_{\beta_1,\beta_2} \frac{1}{2} \sum_{t=1}^T (y_t - \beta_1 - \beta_2 x_t)^2$$`
* The slope estimate
`$$\hat{\beta}_2 = \frac{ \sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})^2}$$`
is the ratio between the sample covariance of `\((x_t, y_t)\)` and the sample variance of `\(x_t\)`.
* The intercept estimate is `\(\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}\)`.

---

## Factor models

* Multiple factors (Fama and French, 1993)
`$$r_t - r_{ft} = \alpha + \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + e_t$$`

```r
d0 <- read.csv("fama_french.csv", header = TRUE)
reg <- lm( (r1 - rf) ~ mktrf + smb + hml, data = d0 )
print(summary(reg))
```

```
## 
## Call:
## lm(formula = (r1 - rf) ~ mktrf + smb + hml, data = d0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.142  -2.532  -0.114   2.083  95.562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.74958    0.22153  -3.384 0.000742 ***
## mktrf        1.28935    0.04372  29.489  < 2e-16 ***
## smb          1.38601    0.07204  19.240  < 2e-16 ***
## hml          0.36623    0.06399   5.723 1.36e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.097 on 1046 degrees of freedom
## Multiple R-squared:  0.6579, Adjusted R-squared:  0.657 
## F-statistic: 670.6 on 3 and 1046 DF,  p-value: < 2.2e-16
```

---

* Factor zoo

---

## Seasonality

* Examples: retail sales, electricity usage, etc.
* One easy way to deal with the seasonal effect is adding dummy variables
* Some time series are already seasonally adjusted, for example monthly GDP

---

## Event study

* Effect of a big announcement in the financial market
* Pre-event window, event window and post-event window
`$$r_t - r_{ft} = \beta_1 + \beta_2 MKT_t + \beta_3 PRE_t + \beta_4 EVT_t +\beta_5 POST_t + e_t$$`
* Windows are chosen by the researcher

---

## Multivariate regression

* Extend to the case of any finite number of regressors `\(K\)`. (Wooldridge Ch.3)
* Let `\(x_t = (x_{t1},\ldots,x_{tK})'\)` and `\(\beta = (\beta_1,\ldots,\beta_K)'\)`. The optimization of OLS is
`$$\min_{\beta} \frac{1}{2} \sum_{t=1}^T (y_t - x_t' \beta)^2$$`
* Take the first-order condition. The solution is
`$$\hat{\beta} = \left(\sum_t x_t x_t'\right)^{-1} \sum_t x_t y_t$$`
* Matrix notation: let `\(Y\)` be a `\(T\times 1\)` vector and `\(X\)` a `\(T\times K\)` matrix. Then
`$$\hat{\beta} = (X'X)^{-1} X'Y$$`

```r
y <- d0$r1 - d0$rf
X <- cbind(1, d0$mktrf, d0$smb, d0$hml)
beta_hat <- solve( t(X) %*% X, t(X) %*% y )
print( as.vector( beta_hat ) )
```

```
## [1] -0.7495832  1.2893488  1.3860096  0.3662339
```
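* As a quick sanity check, a minimal sketch (reusing `beta_hat` and the `reg` object fitted on the factor-model slide) confirming that the matrix formula reproduces `lm()`'s coefficients:

```r
# a minimal check: the matrix-formula estimate should match lm()'s fit;
# `reg` and `beta_hat` come from the previous chunks
all.equal( as.vector(beta_hat), unname(coef(reg)) ) # expected: TRUE
```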
---

## Joint distribution

* `\((y_t, x_t)\)` follows a stable joint distribution that is invariant in `\(t\)`.
* Find the coefficient `\(\beta\)` such that the linear combination `\(x_t' \beta\)` minimizes the prediction error
`$$E [ (y_t - x_t' \beta)^2]$$`
* The solution is the projection coefficient in the population model
`$$\beta_0 = ( E [x_t x_t'])^{-1} E [x_t y_t]$$`

---

## Projection error

* Define the remainder after the projection `\(\epsilon_t = y_t - x_t'\beta_0\)`.
* First-order condition: `\(E[x_t (y_t - x_t'\beta_0)] = E[x_t \epsilon_t] = 0\)`.
* When an intercept is in `\(x_t\)`, then `\(E[ \epsilon_t ] = 0\)`.

---

## Discrepancy

* Review and extension of Wooldridge's Ch.5.
* `\(\beta_0\)` is a constant.
* `\(\hat{\beta}\)` is a random variable.
* The discrepancy between `\(\hat{\beta}\)` and `\(\beta_0\)` is
`$$\hat{\beta} - \beta_0 = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{T} \sum_t x_t \epsilon_t$$`

---

## Simulated example

* When `\(n = 20\)`.

```r
library(magrittr) # provides the pipe %>%

set.seed(2021-8-30)
n = 20 # sample size
K = 4  # number of parameters
b0 = as.matrix( c(0.5, 2, -1, 0) ) # the true coefficient
X = cbind(1, matrix( rnorm(n * (K-1)), nrow = n ) ) # the regressor matrix
e = rnorm(n) # the error term
Y = X %*% b0 + e # generate the dependent variable
bhat = solve(t(X) %*% X, t(X) %*% Y ) %>% as.vector() %>% print()
```

```
## [1]  0.4948475  2.3840175 -0.9477184  0.2777489
```

* Now the sample size is increased to `\(n = 2000\)`.

```r
n = 2000 # sample size
X = cbind(1, matrix( rnorm(n * (K-1)), nrow = n ) ) # the regressor matrix
e = rnorm(n) # the error term
Y = X %*% b0 + e # generate the dependent variable
bhat = solve(t(X) %*% X, t(X) %*% Y ) %>% as.vector() %>% print()
```

```
## [1]  0.486747547  2.002654737 -1.013039652 -0.008006543
```

---

## Large Sample Theory (Scalar)

* Consider a scalar random variable `\(z_t\)`.
* Law of large numbers
`$$\frac{1}{T} \sum_t z_t \stackrel{p}{\to } E[z_t]$$`
* Central limit theorem: if `\(z_t\)` is independent over `\(t\)`,
`$$\frac{1}{\sqrt{T}} \sum_t ( z_t - E[z_t]) \stackrel{d}{\to } N(0, \sigma_z^2).$$`

---

## Demonstration of LLN

```r
sample.mean = function( n, distribution ){
  # compute the sample mean for a given distribution
  if (distribution == "normal") { y = rnorm(n) }
  else if (distribution == "t2") { y = rt(n, 2) }
  else if (distribution == "cauchy") { y = rcauchy(n) }
  return( mean(y) )
}

LLN.plot = function(distribution){
  # draw the sample-mean graph
  ybar = matrix(0, length(NN), 3)
  for (rr in 1:3){
    for (ii in 1:length(NN)){
      n = NN[ii]
      ybar[ii, rr] = sample.mean(n, distribution)
    }
  }
  matplot(ybar, type = "l", ylab = "mean", xlab = "",
          lwd = 1, lty = 1, main = distribution)
  abline(h = 0, lty = 2)
  return(ybar)
}

# calculation
NN = 2^(1:20); par(mfrow = c(3,1))
l1 = LLN.plot("normal"); l2 = LLN.plot("t2"); l3 = LLN.plot("cauchy")
```

![](ts_slides2_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## Demonstration of CLT

```r
Z_fun = function(n, distribution){
  if (distribution == "normal"){ z = sqrt(n) * mean(rnorm(n)) }
  else if (distribution == "chisq2") {
    df = 2
    x = rchisq(n, df)
    z = sqrt(n) * ( mean(x) - df ) / sqrt(2*df)
  }
  return(z)
}

CLT_plot = function(n, distribution){
  Rep = 10000
  ZZ = rep(0, Rep)
  for (i in 1:Rep) { ZZ[i] = Z_fun(n, distribution) }
  xbase = seq(-4.0, 4.0, length.out = 100)
  hist( ZZ, breaks = 100, freq = FALSE,
        xlim = c( min(xbase), max(xbase) ),
        main = paste0("hist with sample size ", n) )
  lines(x = xbase, y = dnorm(xbase), col = "red")
  return(ZZ)
}

par(mfrow = c(3,1))
phist = CLT_plot(2, "chisq2")
phist = CLT_plot(10, "chisq2")
phist = CLT_plot(100, "chisq2")
```

![](ts_slides2_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
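* A complementary check of the same convergence, a minimal sketch reusing `Z_fun` from the chunk above: a normal QQ-plot of the simulated statistics.

```r
# a minimal sketch: compare the simulated statistics with N(0,1) quantiles
set.seed(1)
ZZ = replicate(10000, Z_fun(100, "chisq2"))
qqnorm(ZZ); qqline(ZZ, col = "red") # points near the line suggest normality
```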
---

## Large Sample Theory (Vector)

* Consider a `\(K\)`-dimensional vector random variable `\(z_t\)`.
* Law of large numbers
`$$\frac{1}{T} \sum_t z_t \stackrel{p}{\to } E[z_t]$$`
* Define `\(\Sigma_z = E[(z_t - E[z_t])(z_t - E[z_t])']\)`.
* Central limit theorem: if `\((z_t)\)` is independent over `\(t\)`,
`$$\frac{1}{\sqrt{T}} \sum_t ( z_t - E[z_t]) \stackrel{d}{\to } N(0, \Sigma_z).$$`
* Special case: `\(E[z_t] = 0_K\)`, under which `\(\Sigma_z = E[z_t z_t']\)`.

---

## In regressions

* Let `\(z_t = x_t\epsilon_t\)`.
* By LLN:
`$$\begin{aligned} \frac{1}{T} \sum_t x_t\epsilon_t & \stackrel{p}{\to} 0 \\ \frac{1}{T} \sum_t x_t x_t' & \stackrel{p}{\to} E[x_t x_t'] =: Q_x \end{aligned}$$`
* The covariance matrix is `\(E[ x_t\epsilon_t (x_t\epsilon_t)'] = E[x_t x_t' \epsilon_t^2]\)`.
* If `\((x_t\epsilon_t)\)` is uncorrelated over `\(t\)`, then by CLT:
`$$\frac{1}{\sqrt{T}} \sum_t x_t\epsilon_t \stackrel{d}{\to} N(0, E[ x_t x_t' \epsilon_t^2]).$$`

---

## Discrepancy (continued)

* The discrepancy between `\(\hat{\beta}\)` and `\(\beta_0\)` is
`$$\hat{\beta} - \beta_0 = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{T} \sum_t x_t \epsilon_t$$`
* By LLN: `\(\hat{\beta}-\beta_0\stackrel{p}{\to}0\)`. (**Consistency**)
* By CLT:
`$$\begin{aligned} \sqrt{T} (\hat{\beta} - \beta_0 ) & = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{\sqrt{T}} \sum_t x_t \epsilon_t \\ & \stackrel{d}{\to} N(0, Q_x^{-1} E[ x_t x_t' \epsilon_t^2] Q_x^{-1} ) \end{aligned}$$`
(**Asymptotic normality**)

---

## Classical assumptions for time series

* Wooldridge Ch.10.3

**A1.** The linear model is correctly specified.

**A2.** No perfect collinearity.

**A3.** `\(E[\epsilon_t | X] = 0\)`. (Strict exogeneity)

* A3 is key for **unbiasedness**: `\(E[ \hat{\beta} ] = \beta_0\)`.
* Strict exogeneity versus contemporaneous exogeneity

---

## Classical assumptions for time series (continued)

**A4.** `\(var[\epsilon_t | X ] = \sigma^2\)` for all `\(t=1,\ldots,T\)`. (Homoskedasticity)

**A5.** `\(cov[\epsilon_t, \epsilon_s | X ] = 0\)` for all `\(t\neq s\)`. (Zero serial correlation)

Under these assumptions, `\(E[ x_t x_t' \epsilon_t^2] = Q_x \sigma^2\)`. The asymptotic distribution simplifies to
`$$\sqrt{T} (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} N(0, Q_x^{-1} \sigma^2 )$$`

(Special case: in the simple regression, `\(E[x_t ] = \mu_x\)` and `\(var[x_t] = \sigma_x^2\)`. The asymptotic variance of `\(\hat{\beta}\)` can be written explicitly.)

* Gauss-Markov theorem: under A1-A5, OLS is the best linear unbiased estimator (BLUE).

---

## Properties of Normal Distribution

If a `\(K\)`-dimensional random vector `\(x_t \sim N(0, \Sigma)\)`, where `\(\Sigma\)` is a positive definite covariance matrix, then

* For a constant matrix `\(A\)`, we have `\(A x_t \sim N(0, A\Sigma A')\)`.
* Quadratic form: `\(x_t' \Sigma^{-1} x_t \sim \chi^2(K)\)`.
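* The quadratic-form property can be checked by simulation. A minimal sketch, assuming the `MASS` package is available for drawing multivariate normals:

```r
# if x ~ N(0, Sigma), then x' Sigma^{-1} x follows chi^2(K)
library(MASS)
set.seed(1)
K = 3
Sigma = crossprod( matrix(rnorm(K * K), K) ) # a positive definite covariance
x = mvrnorm(10000, mu = rep(0, K), Sigma = Sigma)
q = rowSums( (x %*% solve(Sigma)) * x ) # the quadratic form for each draw
mean( q > qchisq(0.95, df = K) ) # should be close to 0.05
```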
---

## Hypothesis Testing for Slope Coefficients

* Estimate the components of the variance:
  * The error variance `\(\hat{\sigma}^2 = T^{-1} \sum_t \hat{\epsilon}_t^2\)`, where `\(\hat{\epsilon}_t = y_t - x_t' \hat{\beta}\)`.
  * `\(\hat{Q}_x = T^{-1} \sum_t x_t x_t'\)`.

Take, for example:
`$$r_t - r_{ft} = \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 + \epsilon_t$$`

* **Test a single coefficient**: say, `\(\beta_1 = 1\)`. Pre-multiply the asymptotic normality expression by `\((1,0,0,0)\)`, and impose the null hypothesis.
* The asymptotic distribution is
`$$\sqrt{T} (\hat{\beta}_1 - 1 ) \stackrel{d}{\to} N(0, [Q_x^{-1}]_{11} \sigma^2 )$$`
* The feasible `\(t\)`-statistic is
`$$\frac{\hat{\beta}_1 - 1 }{ \sqrt{[\hat{Q}_x^{-1}]_{11} \hat{\sigma}^2 / T} } \stackrel{d}{\to} N(0, 1)$$`
* Software display. Notice that the following `\(t\)`-statistics are for the null hypothesis of 0.

```r
print(summary(reg))
```

```
## 
## Call:
## lm(formula = (r1 - rf) ~ mktrf + smb + hml, data = d0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.142  -2.532  -0.114   2.083  95.562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.74958    0.22153  -3.384 0.000742 ***
## mktrf        1.28935    0.04372  29.489  < 2e-16 ***
## smb          1.38601    0.07204  19.240  < 2e-16 ***
## hml          0.36623    0.06399   5.723 1.36e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.097 on 1046 degrees of freedom
## Multiple R-squared:  0.6579, Adjusted R-squared:  0.657 
## F-statistic: 670.6 on 3 and 1046 DF,  p-value: < 2.2e-16
```

---

## Test a joint hypothesis

Say, `\(\beta_2 = \beta_3 = 0\)`. Pre-multiply the asymptotic normality expression by
`$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$`
and impose the null hypothesis:
`$$\sqrt{T} A (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} N(0, A Q_x^{-1} A' \sigma^2 )$$`

The feasible **Wald** statistic is
`$$T (\hat{\beta} - \beta_0 )' A' (A \hat{Q}_x^{-1} A' \hat{\sigma}^2)^{-1} A (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} \chi^2(2)$$`

* The same construction works for a generic constant matrix `\(A\)`.
* If the restriction is easy to implement, then let `\(RSS_0\)` be the sum of squared residuals of the restricted model and `\(RSS_1\)` be the RSS of the unrestricted model. Under the null,
`$$\frac{RSS_0 - RSS_1}{RSS_1/(T-K)} \stackrel{d}{\to} \chi^2(r),$$`
where `\(r\)` is the number of restrictions.

---

## R-squared

* In-sample predicted value: `\(\hat{y}_t = x_t ' \hat{\beta}\)`
* R-squared: the ratio between the sample variance of `\(\{\hat{y}_t\}_{t=1}^T\)` and the sample variance of `\(\{y_t\}_{t=1}^T\)`.
* R-squared is a measure of goodness of fit.
* In the financial markets, the R-squared of a predictive regression is usually quite low.
* In macroeconomic models, the R-squared is often non-trivial.

---

## Diagnostic analysis of the error term

* Under Assumption A4, the error terms are homoskedastic.
* Specify the regression of squared residuals
`$$\begin{aligned} \hat{\epsilon}_t^2 & = \mbox{intercept} \\ & + \mbox{all level terms of } x_{tk} \\ & + \mbox{all squared terms of } x_{tk} \\ & + \mbox{all cross-term pairs of } (x_{tk},x_{tk'}) \\ & + u_t \end{aligned}$$`
and test whether all coefficients associated with the levels, squares and pairs are jointly 0.
* Under Assumption A5, the error terms have no autocorrelation.
* Specify the regression of residuals
`$$\hat{\epsilon}_t = \gamma' x_t + \theta_1 \hat{\epsilon}_{t-1} + \cdots + \theta_p \hat{\epsilon}_{t-p} + u_t$$`
and test `\(\theta_1 = \cdots = \theta_p = 0\)`. (Eq. (3.24) in HMPY is a mistake.)
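* These diagnostics are implemented in standard packages. A minimal sketch, assuming the `lmtest` package is installed and reusing the factor-model fit `reg` from the earlier slides (`bptest` is the Breusch-Pagan test, a levels-only variant of the White-type regression above):

```r
library(lmtest)
bptest(reg)            # Breusch-Pagan test for heteroskedasticity
bgtest(reg, order = 4) # Breusch-Godfrey test for serial correlation up to lag 4
```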