Review of Regressions

Zhentao Shi

Aug 30, 2021

1 / 25

Static models

  • Factor x_t causes an immediate reaction from y_t
  • Phillips curve: inflation <- unemployment
  • Okun's law: unemployment <- GDP
  • Capital asset pricing model (CAPM):

r_t - r_{ft} = \alpha + \beta MKT_t + e_t

where MKT_t = r_{Mt} - r_{ft}.

2 / 25

Simple regression

  • Review Wooldridge's Ch.2.
  • Conventionally, cross-sectional observations are indexed by i, while time series observations are indexed by t.
  • Simple regression is OLS with y_t, x_t and an intercept, for t=1,\ldots,T:

\min_{\beta_1, \beta_2} \sum_{t=1}^T (y_t - \beta_1 - \beta_2 x_t)^2

  • The slope estimate

\hat{\beta}_2 = \frac{ \sum_t (x_t - \bar{x})(y_t - \bar{y})} {\sum_t (x_t - \bar{x})^2}

is the ratio between the sample covariance of (x_t, y_t) and the sample variance of x_t.

  • The intercept estimate is \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}.
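
A minimal numerical check of these two formulas; a sketch with simulated data (xx and yy are hypothetical):

xx <- rnorm(100)
yy <- 1 + 2 * xx + rnorm(100)  # hypothetical data generating process
b2 <- cov(xx, yy) / var(xx)    # slope: sample covariance over sample variance
b1 <- mean(yy) - b2 * mean(xx) # intercept
c(b1, b2)                      # should match coef( lm(yy ~ xx) )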
3 / 25

Factor models

  • Multiple factors (Fama and French, 1993)

r_t - r_{ft} = \alpha + \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + e_t

d0 <- read.csv("fama_french.csv", header = TRUE)
reg <- lm( (r1 - rf) ~ mktrf + smb + hml, data = d0 )
print(summary(reg))
##
## Call:
## lm(formula = (r1 - rf) ~ mktrf + smb + hml, data = d0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54.142 -2.532 -0.114 2.083 95.562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.74958 0.22153 -3.384 0.000742 ***
## mktrf 1.28935 0.04372 29.489 < 2e-16 ***
## smb 1.38601 0.07204 19.240 < 2e-16 ***
## hml 0.36623 0.06399 5.723 1.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.097 on 1046 degrees of freedom
## Multiple R-squared: 0.6579, Adjusted R-squared: 0.657
## F-statistic: 670.6 on 3 and 1046 DF, p-value: < 2.2e-16
4 / 25
  • Factor zoo
5 / 25

Seasonality

  • Examples: retail sales, electricity usage, etc.
  • One easy way to deal with the seasonal effect is to add dummy variables (see the sketch after this list)

  • Some time series are already seasonally adjusted, for example monthly GDP
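
A minimal sketch of the dummy-variable approach, assuming a hypothetical data frame d with a numeric outcome sales and a month column:

d <- data.frame(sales = rnorm(120), month = rep(1:12, 10)) # hypothetical data
reg_season <- lm(sales ~ factor(month), data = d) # factor() expands into monthly dummies
# the omitted month is the base category; the other coefficients are seasonal shifts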

6 / 25

Event study

  • Effect of big announcement in the financial market
  • Pre-event window, event window and post-event window

r_t - r_{ft} = \beta_1 + \beta_2 MKT_t + \beta_3 PRE_t + \beta_4 EVT_t +\beta_5 POST_t + e_t

  • Windows chosen by the researcher
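
A sketch of how the window dummies might be built, with hypothetical cutoffs t1 < t2 < t3:

TT <- 500                                     # hypothetical sample length
t1 <- 200; t2 <- 250; t3 <- 300               # hypothetical window boundaries
PRE  <- as.numeric( 1:TT >= t1 & 1:TT <  t2 ) # pre-event window dummy
EVT  <- as.numeric( 1:TT >= t2 & 1:TT <= t3 ) # event window dummy
POST <- as.numeric( 1:TT >  t3 )              # post-event window dummy
# then run: lm( (r - rf) ~ mkt + PRE + EVT + POST )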
7 / 25

Multivariate regression

  • Extend to the case of any finite number of regressors K. (Wooldridge Ch.3)
  • Let x_t = (x_{t1},\ldots,x_{tK})' and \beta = (\beta_1,\ldots,\beta_K)'. The optimization of OLS is

\min_{\beta} \frac{1}{2} \sum_{t=1}^T (y_t - x_t' \beta)^2

  • Take the first-order condition. The solution is

\hat{\beta} = (\sum_t x_t x_t')^{-1} \sum_t x_t y_t

  • Matrix notation: let Y be the T\times 1 vector of outcomes and X the T\times K regressor matrix.

\hat{\beta} = (X'X)^{-1} X'Y

y <- d0$r1 - d0$rf
X <- cbind(1, d0$mktrf, d0$smb, d0$hml)
beta_hat <- solve( t(X) %*% X, t(X) %*% y )
print( as.vector( beta_hat) )
## [1] -0.7495832 1.2893488 1.3860096 0.3662339
8 / 25

Joint distribution

  • (y_t, x_t) follows a stable joint distribution that is invariant over t.
  • Find the coefficient \beta such that the linear combination x_t' \beta minimizes the expected squared prediction error

E [ (y_t - x_t' \beta)^2]

  • The solution is the projection coefficient in the abstract model

\beta_0 = ( E [x_t x_t'])^{-1} E [x_t y_t]

9 / 25

Projection error

  • Define the remainder after the projection \epsilon_t = y_t - x_t'\beta_0.
  • First-order condition: E[x_t (y_t - x_t'\beta_0)] = E[x_t \epsilon_t] = 0.
  • When an intercept is in x_t, then E[ \epsilon_t ] = 0.
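
The sample analog of the first-order condition holds exactly for the OLS residuals; a quick check, reusing X, y and beta_hat from the earlier slide:

as.vector( t(X) %*% (y - X %*% beta_hat) ) # numerically zero in every coordinate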
10 / 25

Discrepancy

  • Review and extension of Wooldridge's Ch.5.
  • \beta_0 is a constant.
  • \hat{\beta} is a random variable.
  • The discrepancy between \hat{\beta} and \beta_0 is

\hat{\beta} - \beta_0 = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{T} \sum_t x_t \epsilon_t
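
This follows by substituting y_t = x_t'\beta_0 + \epsilon_t into the OLS formula:

\hat{\beta} = \left( \sum_t x_t x_t' \right)^{-1} \sum_t x_t (x_t'\beta_0 + \epsilon_t) = \beta_0 + \left( \sum_t x_t x_t' \right)^{-1} \sum_t x_t \epsilon_t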

11 / 25

Simulated example

  • When n = 20.
library(magrittr) # for the pipe %>%
set.seed(2021-8-30) # note: this arithmetic expression evaluates to the integer 1983
n = 20 # sample size
K = 4 # number of parameters
b0 = as.matrix( c(0.5, 2, -1, 0) ) # the true coefficient
X = cbind(1, matrix( rnorm(n * (K-1)), nrow = n ) ) # the regressor matrix
e = rnorm(n) # the error term
Y = X %*% b0 + e # generate the dependent variable
bhat = solve(t(X) %*% X, t(X) %*% Y ) %>% as.vector() %>% print()
## [1] 0.4948475 2.3840175 -0.9477184 0.2777489
  • Now the sample size is increased to n = 2000.
n = 2000 # sample size
X = cbind(1, matrix( rnorm(n * (K-1)), nrow = n ) ) # the regressor matrix
e = rnorm(n) # the error term
Y = X %*% b0 + e # generate the dependent variable
bhat = solve(t(X) %*% X, t(X) %*% Y ) %>% as.vector() %>% print()
## [1] 0.486747547 2.002654737 -1.013039652 -0.008006543
12 / 25

Large Sample Theory (Scalar)

  • Consider a scalar random variable z_t.

  • Law of large numbers

\frac{1}{T} \sum_t z_t \stackrel{p}{\to } E[z_t]

  • Central limit theorem: If (z_t) is independent over t

\frac{1}{\sqrt{T}} \sum_t ( z_t - E[z_t]) \stackrel{d}{\to } N(0, \sigma_z^2).

13 / 25

Demonstration of LLN

sample.mean = function( n, distribution ){
  # get the sample mean for a given distribution
  if (distribution == "normal"){ y = rnorm( n ) }
  else if (distribution == "t2") { y = rt(n, 2) }
  else if (distribution == "cauchy") { y = rcauchy(n) }
  return( mean(y) )
}
LLN.plot = function(distribution){
  # draw the sample mean graph
  ybar = matrix(0, length(NN), 3 )
  for (rr in 1:3){
    for ( ii in 1:length(NN)){
      n = NN[ii]; ybar[ii, rr] = sample.mean(n, distribution)
    }
  }
  matplot(ybar, type = "l", ylab = "mean", xlab = "",
          lwd = 1, lty = 1, main = distribution)
  abline(h = 0, lty = 2)
  return(ybar)
}
# calculation
NN = 2^(1:20); par(mfrow = c(3,1))
l1 = LLN.plot("normal"); l2 = LLN.plot("t2"); l3 = LLN.plot("cauchy")

14 / 25

Demonstration of CLT

Z_fun = function(n, distribution){
  if (distribution == "normal"){
    z = sqrt(n) * mean(rnorm(n))
  } else if (distribution == "chisq2") {
    df = 2
    x = rchisq(n, 2)
    z = sqrt(n) * ( mean(x) - df ) / sqrt(2*df)
  }
  return (z)
}
CLT_plot = function(n, distribution){
  Rep = 10000
  ZZ = rep(0, Rep)
  for (i in 1:Rep) { ZZ[i] = Z_fun(n, distribution) }
  xbase = seq(-4.0, 4.0, length.out = 100)
  hist( ZZ, breaks = 100, freq = FALSE,
        xlim = c( min(xbase), max(xbase) ),
        main = paste0("hist with sample size ", n) )
  lines(x = xbase, y = dnorm(xbase), col = "red")
  return (ZZ)
}
par(mfrow = c(3,1))
phist = CLT_plot(2, "chisq2")
phist = CLT_plot(10, "chisq2")
phist = CLT_plot(100, "chisq2")

15 / 25

Large Sample Theory (Vector)

  • Consider a K-dimensional vector random variable z_t.

  • Law of large numbers

\frac{1}{T} \sum_t z_t \stackrel{p}{\to } E[z_t]

  • Define \Sigma_z = E[(z_t - E[z_t])(z_t - E[z_t])'].

  • Central limit theorem: If (z_t) is independent over t

\frac{1}{\sqrt{T}} \sum_t ( z_t - E[z_t]) \stackrel{d}{\to } N(0, \Sigma_z).

  • Special case: E[z_t] = 0_K, under which \Sigma_z = E[z_t z_t'].
16 / 25

In regressions

  • Let z_t = x_t\epsilon_t.
  • By LLN:

\begin{aligned} \frac{1}{T} \sum_t x_t\epsilon_t & \stackrel{p}{\to} 0 \\ \frac{1}{T} \sum_t x_t x_t' & \stackrel{p}{\to} E[x_t x_t'] =:Q_x \end{aligned}

  • The covariance matrix E[ x_t\epsilon_t (x_t\epsilon_t)'] = E[x_t x_t' \epsilon_t^2].
  • If (x_t\epsilon_t) are uncorrelated over t, then by CLT:

\frac{1}{\sqrt{T}} \sum_t x_t\epsilon_t \stackrel{d}{\to} N(0, E[ x_t x_t' \epsilon_t^2]).

17 / 25

Discrepancy (continued)

  • The discrepancy between \hat{\beta} and \beta_0 is

\hat{\beta} - \beta_0 = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{T} \sum_t x_t \epsilon_t

  • By LLN: \hat{\beta}-\beta_0\stackrel{p}{\to}0. (Consistency)

  • By CLT

\begin{aligned} \sqrt{T} (\hat{\beta} - \beta_0 ) & = \left( \frac{1}{T} \sum_t x_t x_t' \right)^{-1} \frac{1}{\sqrt{T}} \sum_t x_t \epsilon_t \\ & \stackrel{d}{\to} N(0, Q_x^{-1} E[ x_t x_t' \epsilon_t^2] Q_x^{-1} ) \end{aligned}

(Asymptotic normality)
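
A sketch of the plug-in ("sandwich") variance estimator implied by this result, rebuilding the design matrix from d0; the resulting standard errors are the heteroskedasticity-robust HC0 type:

XX <- cbind(1, d0$mktrf, d0$smb, d0$hml)  # rebuild the regressor matrix from d0
yy <- d0$r1 - d0$rf
bh <- solve( t(XX) %*% XX, t(XX) %*% yy )
eh <- as.vector( yy - XX %*% bh )         # OLS residuals
TT <- nrow(XX)
Qx_hat <- t(XX) %*% XX / TT               # sample analog of Q_x
meat   <- t(XX * eh^2) %*% XX / TT        # sample analog of E[ x_t x_t' e_t^2 ]
avar   <- solve(Qx_hat) %*% meat %*% solve(Qx_hat)
sqrt( diag(avar) / TT )                   # robust (HC0) standard errors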

18 / 25

Classical assumptions for time series

  • Wooldridge Ch.10.3

A1. The linear model is correctly specified.

A2. No perfect collinearity.

A3. E[\epsilon_t | X] = 0. (Strict exogeneity)

  • A3 is key for unbiasedness E[ \hat{\beta} ] = \beta_0.

  • Strict exogeneity versus contemporaneous exogeneity

19 / 25

Classical assumptions for time series (continued)

A4. var[\epsilon_t | X ] = \sigma^2 for all t=1,\ldots,T. (Homoskedasticity)

A5. cov[\epsilon_t, \epsilon_s | X ] = 0 for all t\neq s. (Zero serial correlation)

Under these assumptions, E[ x_t x_t' \epsilon_t^2] = Q_x \sigma^2. The asymptotic distribution can be simplified as

\sqrt{T} (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} N(0, Q_x^{-1} \sigma^2 )

(Special case: in the simple regression, E[x_t ] = \mu_x and var[x_t] = \sigma_x^2. The asymptotic variance of \hat{\beta} can be written explicitly.)

  • Gauss-Markov theorem: Under A1-A5, OLS is the best linear unbiased estimator (BLUE).
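
Under A4-A5 the same plug-in logic with \hat{\sigma}^2 recovers the classical standard errors; a sketch using lm's T-K-1 degrees-of-freedom convention so that it matches the earlier output exactly:

XX <- cbind(1, d0$mktrf, d0$smb, d0$hml)
s2 <- sum( resid(reg)^2 ) / df.residual(reg) # classical estimate of sigma^2
sqrt( diag( s2 * solve( t(XX) %*% XX ) ) )   # matches the Std. Error column of summary(reg)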
20 / 25

Properties of Normal Distribution

If a K-dimensional random vector x_t \sim N(0, \Sigma) where \Sigma is a positive definite covariance matrix, then

  • For a constant matrix A, we have A x_t \sim N(0, A\Sigma A').
  • Quadratic form: x_t' \Sigma^{-1} x_t \sim \chi^2(K).
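
A quick Monte Carlo check of the quadratic form property; a sketch assuming the MASS package is available for mvrnorm():

library(MASS)                              # assumed available; provides mvrnorm()
Sigma <- matrix( c(2, 0.5, 0.5, 1), 2, 2 ) # a positive definite covariance matrix
z <- mvrnorm( 10000, mu = c(0, 0), Sigma = Sigma )
q <- rowSums( (z %*% solve(Sigma)) * z )   # z_i' Sigma^{-1} z_i for each draw
mean( q > qchisq(0.95, df = 2) )           # rejection rate should be close to 0.05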
21 / 25

Hypothesis Testing for Slope Coefficients

  • Estimate components in the variance
    • The error variance \hat{\sigma}^2 = T^{-1} \sum_t \hat{\epsilon}_t^2, where \hat{\epsilon}_t = y_t - x_t' \hat{\beta}.
    • \hat{Q}_x = T^{-1} \sum_t x_t x_t' .

Take, for example:

r_t - r_{ft} = \beta_1 MKT_t + \beta_2 SMB_t + \beta_3 HML_t + \beta_4 + \epsilon_t

  • Test a single coefficient: say, \beta_1 = 1. Pre-multiply the asymptotic normality expression by (1,0,0,0), and impose the null hypothesis.

    • The asymptotic distribution is

\sqrt{T} (\hat{\beta}_1 - 1 ) \stackrel{d}{\to} N(0, [Q_x^{-1}]_{11} \sigma^2 )

  • The feasible T-statistic is

\frac{\hat{\beta}_1 - 1 }{ \sqrt{[\hat{Q}_x^{-1}]_{11} \hat{\sigma}^2 / T} } \stackrel{d}{\to} N(0, 1)

  • Software display. Notice that the following t-statistics are for the null of 0.
print(summary(reg))
##
## Call:
## lm(formula = (r1 - rf) ~ mktrf + smb + hml, data = d0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54.142 -2.532 -0.114 2.083 95.562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.74958 0.22153 -3.384 0.000742 ***
## mktrf 1.28935 0.04372 29.489 < 2e-16 ***
## smb 1.38601 0.07204 19.240 < 2e-16 ***
## hml 0.36623 0.06399 5.723 1.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.097 on 1046 degrees of freedom
## Multiple R-squared: 0.6579, Adjusted R-squared: 0.657
## F-statistic: 670.6 on 3 and 1046 DF, p-value: < 2.2e-16
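
The t-statistic for the null \beta_{mktrf} = 1 is not in the table, but it is easy to compute from the displayed estimates:

est <- summary(reg)$coefficients # Estimate and Std. Error columns
t_stat <- ( est["mktrf", "Estimate"] - 1 ) / est["mktrf", "Std. Error"]
t_stat                           # (1.28935 - 1) / 0.04372, about 6.6
2 * pnorm( -abs(t_stat) )        # asymptotic two-sided p-value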
22 / 25

Test a joint hypothesis

Say, \beta_2 = \beta_3 = 0.

Pre-multiply the asymptotic normality expression by

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}

and impose the null hypothesis.

\sqrt{T} A (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} N(0, A Q_x^{-1} A' \sigma^2 )

The feasible Wald statistic is

T (\hat{\beta} - \beta_0 )' A' (A \hat{Q}_x^{-1} A' \hat{\sigma}^2)^{-1} A (\hat{\beta} - \beta_0 ) \stackrel{d}{\to} \chi^2(2)

  • More general case for a generic constant matrix A.

  • If the restriction is easy to implement, then let RSS_0 be the sum-of-squared-residuals of the restricted model and RSS_1 be the RSS for the unrestricted model. Under the null,

\frac{RSS_0 - RSS_1}{RSS_1/(T-K-1)} \stackrel{d}{\to} \chi^2(r),

where r is the number of restrictions.
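
A sketch of this RSS-based statistic for the null \beta_2 = \beta_3 = 0 (smb and hml are jointly irrelevant), reusing d0 and reg from the earlier slides:

reg0 <- lm( (r1 - rf) ~ mktrf, data = d0 )       # restricted model under the null
RSS0 <- sum( resid(reg0)^2 )
RSS1 <- sum( resid(reg)^2 )                      # unrestricted model from earlier
W <- (RSS0 - RSS1) / ( RSS1 / df.residual(reg) ) # df.residual(reg) is T - K - 1
1 - pchisq( W, df = 2 )                          # r = 2 restrictions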

23 / 25

R-squared

  • In-sample predicted value: \hat{y}_t = x_t ' \hat{\beta}
  • R-squared: the ratio between the sample variance of \{\hat{y}_t\}_{t=1}^T and the sample variance of \{y_t\}_{t=1}^T.
  • R-squared is a measure of goodness of fit.
    • In the financial markets, the R-squared for a predictive regression is usually quite low.
    • In macroeconomic models, R-squared is often non-trivial.
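
A direct check of the variance-ratio definition against the value reported by lm(), reusing reg and y from the earlier slides:

yhat <- fitted(reg)    # in-sample predicted values
var(yhat) / var(y)     # ratio of the two sample variances
summary(reg)$r.squared # should agree: about 0.6579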
24 / 25

Diagnostic analysis of the error term

  • Under Assumption A4, the error terms are homoskedastic

    • Specify the regression of residuals

\begin{align} \hat{\epsilon}_t^2 & = \mbox{intercept} \\ & + \mbox{all level terms of } x_{tk} \\ & + \mbox{all squared terms of } x_{tk} \\ & + \mbox{all cross-term pairs of } (x_{tk},x_{tk'}) \\ & + u_t \end{align}

Test that all coefficients associated with the levels, squares, and pairs are jointly 0.
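
A sketch of this auxiliary regression for the factor model; the LM form of the test uses T times the auxiliary R-squared:

ehat2 <- resid(reg)^2                        # squared OLS residuals
aux <- lm( ehat2 ~ (mktrf + smb + hml)^2     # levels and all cross pairs
           + I(mktrf^2) + I(smb^2) + I(hml^2), data = d0 )
LM <- length(ehat2) * summary(aux)$r.squared # LM statistic: T * R^2
1 - pchisq( LM, df = 9 )                     # 3 levels + 3 squares + 3 pairs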

  • Under Assumption A5, the error terms have no autocorrelation
    • Specify the regression of residuals

\hat{\epsilon}_t = \gamma' x_t + \theta_1 \hat{\epsilon}_{t-1} + \cdots + \theta_p \hat{\epsilon}_{t-p} + u_t

  • Test \theta_1 = \cdots = \theta_p = 0. (HMPY's Eq.(3.24) is a mistake)
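
A sketch of this Breusch-Godfrey-type check with p = 1: regress the residual on the regressors and its own lag, then test the lag coefficient:

eh <- resid(reg)
TT <- length(eh)
aux <- lm( eh[-1] ~ d0$mktrf[-1] + d0$smb[-1] + d0$hml[-1] + eh[-TT] )
summary(aux)$coefficients["eh[-TT]", ] # t-test of theta_1 = 0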
25 / 25
