Zhentao Shi
Nov 15, 2021
y_t = \beta_0 + \beta_1 x_t + u_t
A necessary condition for consistency is E[x_t u_t] = 0 (orthogonality)
If \beta_1 is the linear projection coefficient, orthogonality holds automatically by definition
If \beta_1 is a causal coefficient, orthogonality may be violated
If the regressor x_t is correlated with u_t, we say x_t is endogenous
unemployment and expected inflation. However, expected inflation is unobservable
\hat{\beta}_{1,OLS} - \beta_1 = \frac{\hat{cov}[x_t, u_t]}{\hat{var}[x_t]} \stackrel{p}{\nrightarrow} 0
\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta} = \left(\frac{\mathbf{X}'\mathbf{X}}{T}\right)^{-1} \frac{\mathbf{X}'\mathbf{u}}{T} \stackrel{p}{\nrightarrow} \mathbf{0}_p
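The inconsistency can be seen in a small simulation. The sketch below is not from the slides: the data-generating process, the coefficient values, and the sample size are all made-up assumptions, chosen so that x_t and u_t share the shock v_t.

```r
# Simulated illustration of endogeneity bias (all numbers are assumptions)
set.seed(1)
n <- 10000                  # sample size
beta0 <- 1
beta1 <- 2                  # hypothetical true slope
z <- rnorm(n)               # exogenous source of variation in x
v <- rnorm(n)               # shock to x
u <- 0.8 * v + rnorm(n)     # structural error, correlated with v
x <- z + v                  # endogenous regressor: cov(x, u) = 0.8
y <- beta0 + beta1 * x + u

b_ols <- unname(coef(lm(y ~ x))["x"])

# In the sample, the OLS error equals cov(x, u) / var(x)
bias_formula <- cov(x, u) / var(x)
print(c(ols = b_ols, error = b_ols - beta1, formula = bias_formula))
```

In this design cov[x_t, u_t] = 0.8 and var[x_t] = 2, so the OLS slope settles near \beta_1 + 0.4 rather than \beta_1, however large n is.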
\begin{align} 0 & = cov[u_t, z_t] \\ & = cov[y_t - \beta_0 - \beta_1 x_t, z_t] \\ & = cov[y_t - \beta_1 x_t, z_t] \\ & = cov[y_t, z_t] - \beta_1 cov[ x_t, z_t] \end{align}
\beta_1 = \frac{cov[y_t, z_t]} { cov[ x_t, z_t]}
if the denominator cov[ x_t, z_t]\neq 0 (relevance)
In microeconometrics, credible instruments are rare
Exclusion and relevance are at odds
Knowledge of the mechanisms
Economic structures
Time series lags
Consistent estimation can be achieved by replacing the population covariance with the sample covariance
Method of moments
\hat{\beta}_1 = \frac{\hat{cov}[y_t, z_t]} {\hat{cov}[ x_t, z_t]}
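As a minimal sketch of this method-of-moments estimator, the ratio of sample covariances can be computed directly. The simulated design below (true slope 2, an instrument z that shifts x but is independent of u) is an assumption for illustration only.

```r
# IV estimator as a ratio of sample covariances (simulated data, assumed design)
set.seed(2)
n <- 10000
z <- rnorm(n)               # instrument: relevant and excluded by construction
v <- rnorm(n)
u <- 0.8 * v + rnorm(n)     # error correlated with x through v
x <- 0.5 + z + v
y <- 1 + 2 * x + u          # true beta1 = 2

b_iv  <- cov(y, z) / cov(x, z)   # sample analogue of cov[y, z] / cov[x, z]
b_ols <- cov(y, x) / var(x)      # OLS slope, for comparison
print(c(iv = b_iv, ols = b_ols))
```

With a sample this large, the IV ratio should land close to 2 while the OLS slope stays visibly above it.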
y_t = \beta_0 + \beta_1 x_{t} + u_t
x_t = \gamma_0 + \gamma_1 z_{t} + v_t
One of the most popular estimators in econometrics
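In order to consistently estimate \beta_1, conduct the following two steps
Step 1: regress x_t on z_t by OLS and obtain the fitted values \hat{x}_t = \hat{\gamma}_0 + \hat{\gamma}_1 z_{t}
Step 2: regress y_t on \hat{x}_t by OLS; the slope coefficient is \hat{\beta}_1^{2SLS}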
\begin{align} \hat{\beta}_1^{2SLS} & = \frac{\hat{cov}[ y_t, \hat{x}_t]} {\hat{var}[ \hat{x}_t]} \\ & = \frac{\hat{cov}[ y_t, \hat{\gamma}_0 + \hat{\gamma}_1 z_{t}]} {\hat{var}[\hat{\gamma}_0 + \hat{\gamma}_1 z_{t}]}\\ & = \frac{ \hat{\gamma}_1\hat{cov}[ y_t, z_{t}]} {\hat{cov}[ \hat{\gamma}_0+ \hat{\gamma}_1 z_{t} + \hat{v}_t, \hat{\gamma}_0+\hat{\gamma}_1 z_{t}]} \\ & = \frac{ \hat{\gamma}_1\hat{cov}[ y_t, z_{t}]} { \hat{\gamma}_1 \hat{cov}[ x_t, z_{t}]} \\ & = \hat {\beta}_1 \end{align}
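The algebra above can be checked numerically. In the sketch below the data are simulated under an assumed design; the point is only that two successive lm() fits reproduce the covariance ratio up to floating-point error.

```r
# Numerical check: two-stage OLS equals the IV covariance ratio
set.seed(3)
n <- 500
z <- rnorm(n)
v <- rnorm(n)
u <- 0.8 * v + rnorm(n)
x <- 0.5 + z + v
y <- 1 + 2 * x + u

stage1 <- lm(x ~ z)                 # first stage: x on z
x_hat  <- fitted(stage1)            # gamma0_hat + gamma1_hat * z
b_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])   # second stage slope

b_iv <- cov(y, z) / cov(x, z)       # method-of-moments IV estimator
print(all.equal(b_2sls, b_iv))
```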
AER::ivreg
ivreg::ivreg (more printed diagnostic outcomes)
familyfirms.csv
d0 <- read.csv("familyfirms.csv", header = TRUE)
d99 <- d0[d0$year=="1999", ] # keep all the 1999 data. 294 firms
head(d99)
## year company agefirm meanagef assets bs_volatility founderCEO Q digit2_in
## 8 1999 1045 65 95 24374 0 0 1 45
## 16 1999 1078 99 95 14471 0 0 4 28
## 23 1999 1164 31 51 91072 0 0 2 48
## 31 1999 1209 59 95 8236 0 0 1 28
## 38 1999 1213 31 95 1643 0 0 1 45
## 45 1999 1240 41 87 15701 0 0 1 54
## OLS
ols <- lm(
log(Q) ~ founderCEO + log(assets) + log(agefirm) + bs_volatility,
data = d99 )
print(ols)
##
## Call:
## lm(formula = log(Q) ~ founderCEO + log(assets) + log(agefirm) +
## bs_volatility, data = d99)
##
## Coefficients:
## (Intercept) founderCEO log(assets) log(agefirm) bs_volatility
## -0.13446 0.27179 0.08972 -0.02961 -0.05828
## 2sls
tsls <- ivreg::ivreg(
formula = log(Q) ~ founderCEO + log(assets) + log(agefirm) + bs_volatility,
instruments = ~ meanagef + log(assets) + log(agefirm) + bs_volatility,
data = d99 )
print(tsls)
##
## Call:
## ivreg::ivreg(formula = log(Q) ~ founderCEO + log(assets) + log(agefirm) + bs_volatility | meanagef + log(assets) + log(agefirm) + bs_volatility, data = d99)
##
## Coefficients:
## (Intercept) founderCEO log(assets) log(agefirm) bs_volatility
## -0.72410 1.07827 0.10187 0.07683 -0.18784
# 1st stage
stage1 <- lm(
founderCEO ~ meanagef + log(assets) + log(agefirm) + bs_volatility,
data = d99 )
print(stage1)
##
## Call:
## lm(formula = founderCEO ~ meanagef + log(assets) + log(agefirm) +
## bs_volatility, data = d99)
##
## Coefficients:
## (Intercept) meanagef log(assets) log(agefirm) bs_volatility
## 1.182186 -0.010731 -0.008662 -0.022756 -0.018084
# 2nd stage
CEO_hat = predict(stage1) # predict the endogenous variable
stage2 <- lm(
log(Q) ~ CEO_hat + log(assets) + log(agefirm) + bs_volatility,
data = d99 )
print(summary(stage2))
##
## Call:
## lm(formula = log(Q) ~ CEO_hat + log(assets) + log(agefirm) +
## bs_volatility, data = d99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0233 -0.4972 -0.1004 0.3321 1.9619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.72410 0.38467 -1.882 0.06079 .
## CEO_hat 1.07827 0.25586 4.214 3.35e-05 ***
## log(assets) 0.10187 0.03493 2.917 0.00381 **
## log(agefirm) 0.07683 0.05410 1.420 0.15665
## bs_volatility -0.18784 0.13657 -1.375 0.17007
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6086 on 289 degrees of freedom
## Multiple R-squared: 0.08088, Adjusted R-squared: 0.06816
## F-statistic: 6.358 on 4 and 289 DF, p-value: 6.468e-05
ivreg
##
## Call:
## ivreg::ivreg(formula = log(Q) ~ founderCEO + log(assets) + log(agefirm) +
## bs_volatility | meanagef + log(assets) + log(agefirm) + bs_volatility,
## data = d99)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5630 -0.5003 -0.1148 0.4378 2.3621
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.72410 0.41803 -1.732 0.08431 .
## founderCEO 1.07827 0.27805 3.878 0.00013 ***
## log(assets) 0.10187 0.03796 2.684 0.00769 **
## log(agefirm) 0.07683 0.05879 1.307 0.19233
## bs_volatility -0.18784 0.14841 -1.266 0.20666
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 289 98.89 < 2e-16 ***
## Wu-Hausman 1 288 13.29 0.000317 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6614 on 289 degrees of freedom
## Multiple R-Squared: -0.08547, Adjusted R-squared: -0.1005
## Wald test: 5.383 on 4 and 289 DF, p-value: 0.0003406
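The coefficients from the manual second stage match ivreg, but the standard errors do not (0.25586 versus 0.27805 for the founderCEO coefficient). The reason is that lm() builds its residuals with the fitted \hat{x}_t in place of x_t. In the printed outputs, rescaling by the ratio of the two residual standard errors, 0.25586 × 0.6614 / 0.6086 ≈ 0.278, recovers the ivreg figure. A minimal sketch of this correction on simulated data (the design is an assumption for illustration, not the familyfirms data):

```r
# Correcting the naive second-stage standard error (simulated data)
set.seed(4)
n <- 2000
z <- rnorm(n)
v <- rnorm(n)
u <- 0.8 * v + rnorm(n)
x <- 0.5 + z + v
y <- 1 + 2 * x + u

x_hat  <- fitted(lm(x ~ z))
stage2 <- lm(y ~ x_hat)
b <- unname(coef(stage2))

res_naive <- y - b[1] - b[2] * x_hat   # residuals that lm() uses
res_2sls  <- y - b[1] - b[2] * x       # residuals with the original regressor

sigma_naive <- sqrt(sum(res_naive^2) / (n - 2))
sigma_2sls  <- sqrt(sum(res_2sls^2) / (n - 2))

se_naive <- summary(stage2)$coefficients["x_hat", "Std. Error"]
se_2sls  <- se_naive * sigma_2sls / sigma_naive   # rescaled standard error
print(c(naive = se_naive, corrected = se_2sls))
```

The direction of the discrepancy depends on the design, so the naive standard error can be either too small or too large.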
\sqrt{T} (\hat{\boldsymbol{\beta}}_{2SLS} - \boldsymbol{\beta}) \stackrel{d}{\rightarrow} N(\mathbf{0}_p, \Omega)
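For the single-regressor, single-instrument case the variance can be written out. The following is a sketch under an added assumption of conditional homoskedasticity, E[u_t^2 | z_t] = \sigma_u^2, which the slides do not state:

```latex
\begin{align}
\sqrt{T} (\hat{\beta}_1 - \beta_1)
  & = \frac{\sqrt{T} \, \hat{cov}[u_t, z_t]}{\hat{cov}[x_t, z_t]} \\
  & \stackrel{d}{\rightarrow} N\left(0, \frac{\sigma_u^2 \, var[z_t]}{(cov[x_t, z_t])^2}\right)
\end{align}
```

The variance grows without bound as cov[x_t, z_t] approaches zero, so relevance matters for inference as well as for identification.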
Endogeneity: E[x_t u_t ] \neq 0
Exogeneity: E[x_t u_t ] = 0
OLS is consistent under exogeneity, and inconsistent under endogeneity
Given a valid IV, 2SLS is consistent whether x_t is endogenous or not.
Under exogeneity, OLS is preferred as it’s “BLUE” under classical assumptions
Under endogeneity, 2SLS is preferred thanks to consistency
Null hypothesis: E[x_t u_t ] = 0 (exogeneity)
Alternative hypothesis: E[x_t u_t ] \neq 0 (endogeneity)
In the two-equation system
\begin{align} y_t & = \beta_0 + \beta_1 x_{t} + u_t \\ x_t & = \gamma_0 + \gamma_1 z_{t} + v_t \end{align}
x_t is endogenous if and only if E[u_t v_t] \neq 0
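The equivalence follows from substituting the second equation into E[x_t u_t]: with E[u_t] = 0 and E[z_t u_t] = 0, E[x_t u_t] = \gamma_1 E[z_t u_t] + E[v_t u_t] = E[u_t v_t]. This suggests a regression-based check in the spirit of the Wu-Hausman test: add the first-stage residual \hat{v}_t to the structural regression and test its coefficient. The sketch below uses simulated data; the design and all numbers are assumptions for illustration.

```r
# Regression-based endogeneity check via the first-stage residual
set.seed(6)
n <- 2000
z <- rnorm(n)
v <- rnorm(n)
u <- 0.8 * v + rnorm(n)     # E[u v] = 0.8, so x is endogenous by design
x <- 0.5 + z + v
y <- 1 + 2 * x + u

v_hat <- resid(lm(x ~ z))            # first-stage residual
aux   <- lm(y ~ x + v_hat)           # augmented structural regression

# A significant coefficient on v_hat is evidence against exogeneity
p_endog <- summary(aux)$coefficients["v_hat", "Pr(>|t|)"]

# The coefficient on x in this regression coincides with the IV estimate
b_x  <- unname(coef(aux)["x"])
b_iv <- cov(y, z) / cov(x, z)
print(c(p_value = p_endog, b_x = b_x, b_iv = b_iv))
```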
ivreg::ivreg
Since \hat{\beta}_1 = \frac{\hat{cov}[y_t, z_t]} {\hat{cov}[ x_t, z_t]}, the validity of 2SLS relies on cov[ x_t, z_t] \neq 0
Weak IV, meaning cov[ x_t, z_t] \approx 0, is not uncommon in practice
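A small Monte Carlo experiment makes the problem visible. This is a sketch under assumed values: the first-stage slopes 1 and 0.02, the sample size, and the number of replications are arbitrary choices, not taken from the slides.

```r
# Monte Carlo: IV estimator with a strong versus a weak instrument
set.seed(7)
n     <- 200
reps  <- 500
beta1 <- 2

simulate_iv <- function(gamma1) {
  replicate(reps, {
    z <- rnorm(n)
    v <- rnorm(n)
    u <- 0.8 * v + rnorm(n)
    x <- gamma1 * z + v          # gamma1 controls instrument strength
    y <- 1 + beta1 * x + u
    cov(y, z) / cov(x, z)        # IV estimate for this replication
  })
}

est_strong <- simulate_iv(1)
est_weak   <- simulate_iv(0.02)

# Median absolute error is used because the weak-IV estimates have heavy tails
med_err_strong <- median(abs(est_strong - beta1))
med_err_weak   <- median(abs(est_weak - beta1))
print(c(strong = med_err_strong, weak = med_err_weak))
```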
Solution: Anderson-Rubin test (ivmodel::AR.test)
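The idea behind the Anderson-Rubin test can be sketched by hand: under the null \beta_1 = b_0 and the exclusion restriction, y_t - b_0 x_t is uncorrelated with z_t, so regressing it on z_t and testing the slope remains valid however weak the instrument is. The code below is a hand-rolled illustration on simulated data and does not call ivmodel::AR.test; the design, the helper name ar_pvalue, and the grid are assumptions.

```r
# Hand-rolled Anderson-Rubin style test on simulated data
set.seed(8)
n <- 500
z <- rnorm(n)
v <- rnorm(n)
u <- 0.8 * v + rnorm(n)
x <- z + v
y <- 1 + 2 * x + u          # true beta1 = 2

# p-value for H0: beta1 = b0, from the slope on z in the auxiliary regression
ar_pvalue <- function(b0) {
  fit <- lm(I(y - b0 * x) ~ z)
  summary(fit)$coefficients["z", "Pr(>|t|)"]
}

# Invert the test over a grid to obtain a 95% confidence set
grid     <- seq(0, 4, by = 0.05)
pvals    <- sapply(grid, ar_pvalue)
conf_set <- grid[pvals > 0.05]
print(range(conf_set))
```

Collecting the values of b_0 that are not rejected gives a confidence set that does not rely on the first stage being strong.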