9.1 An Application

Consider an application using simulated data in which the two independent variables have different degrees of correlation. The data were generated from the following model:

\[Y_i = 1 + 1 \;X_{1i} + 1\; X_{2i} + \varepsilon_i\]

In other words, if there are no problems with the estimation, the regression should recover the coefficients above (all equal to 1). The exercise shows how collinearity can become a problem.

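The code that builds the `MDAT` data frame is not shown. As a rough sketch of how such data could be simulated, here is one way to draw two predictors with a chosen correlation and fit the model; this is illustrative Python (hypothetical variable names and noise scale), not the book's actual R simulation code:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Draw two predictors with a target correlation of about 0.33,
# matching the first regression below (illustrative construction).
rho = 0.33
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

# True model: Y = 1 + 1*X1 + 1*X2 + noise (noise scale is assumed)
y = 1 + x1 + x2 + rng.normal(scale=0.5, size=n)

# OLS on the design matrix [1, X1, X2]
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # all three estimates should land close to 1
```

With only mild correlation between the predictors, the least-squares estimates sit close to the true coefficients, just as in the first regression below.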
# Load the lmtest package, which provides coeftest()
library(lmtest)

# 1) Regression: correlation = 0.3289
cor(MDAT$X31,MDAT$X32)
## [1] 0.3289358
CREG <- lm(Y3~X31+X32,data=MDAT)
coeftest(CREG)
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 1.0015735  0.0214445  46.705 < 2.2e-16 ***
## X31         1.0152385  0.0401048  25.315 < 2.2e-16 ***
## X32         0.9905016  0.0099267  99.781 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 2) Regression: correlation = 0.938
cor(MDAT$X21,MDAT$X22)
## [1] 0.9380521
CREG <- lm(Y2~X21+X22,data=MDAT)
coeftest(CREG)
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 1.001574   0.021445  46.705 < 2.2e-16 ***
## X21         1.100724   0.109304  10.070 < 2.2e-16 ***
## X22         0.905016   0.099267   9.117 1.082e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 3) Regression: correlation = 0.999
cor(MDAT$X11,MDAT$X12)
## [1] 0.9992777
CREG <- lm(Y1~X11+X12,data=MDAT)
coeftest(CREG)
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.001574   0.021445 46.7053  < 2e-16 ***
## X11         1.955579   0.996657  1.9621  0.05261 .  
## X12         0.050161   0.992672  0.0505  0.95980    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
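Before turning to the perfectly collinear case, note how the standard errors on the slopes grow as the correlation rises. A standard diagnostic (not computed in the book's output above) is the variance inflation factor; with two predictors it reduces to \(VIF = 1/(1-r^2)\), where \(r\) is the correlation between them. Plugging in the three correlations from the regressions above:

```python
# Variance inflation factor for a two-predictor regression:
# VIF = 1 / (1 - r^2). Correlations taken from the output above.
for r in (0.3289, 0.9381, 0.9993):
    vif = 1.0 / (1.0 - r**2)
    print(f"r = {r:.4f}  ->  VIF = {vif:8.1f}")
# VIF grows from about 1.1, to about 8.3, to roughly 715
```

The VIF diverges to infinity as \(r\) approaches 1, which is why the slope estimates become so unstable in the third regression.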
# 4) Regression: correlation = 1
cor(MDAT$X41,MDAT$X42)
## [1] 1
CREG <- lm(Y4~X41+X42,data=MDAT)
coeftest(CREG)
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 1.003483   0.021342  47.019 < 2.2e-16 ***
## X41         2.002616   0.037857  52.900 < 2.2e-16 ***
## X42               NA         NA      NA        NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The application above considers four data sets whose only difference is the degree of collinearity between the two independent variables. The first regression has a correlation between \(X_{1i}\) and \(X_{2i}\) of 0.33, and the regression does a fairly good job of recovering the true coefficients. The second regression has a correlation of 0.94, and the estimation is beginning to suffer: both slope estimates are now off by about 10 percent. The third regression has a correlation of 0.999, just shy of perfect, and the slope estimates are now far from their true values, with standard errors so inflated that the second slope is statistically indistinguishable from zero. Finally, the fourth regression has perfect collinearity between \(X_{1i}\) and \(X_{2i}\), and the estimation chokes outright: R reports NA (not available) for the second coefficient because the redundant variable is dropped. Mathematically, perfect collinearity makes the cross-product matrix \(X'X\) singular, so the OLS formula asks the computer to invert a matrix that has no inverse, the matrix analogue of dividing by zero (which computers do not like to do).
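The singularity is easy to see directly. In the sketch below (illustrative Python with made-up data, not `MDAT`), the second predictor is an exact multiple of the first, so the design matrix has only two linearly independent columns and \(X'X\) has no inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 2.0 * x1          # perfectly collinear: x2 is an exact multiple of x1

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

# Only 2 of the 3 columns are linearly independent, so X'X is
# singular and the OLS formula (X'X)^{-1} X'y breaks down.
print(np.linalg.matrix_rank(X))   # 2, not 3
print(np.linalg.det(XtX))         # zero, up to floating-point error
```

This is exactly the situation R detects in the fourth regression: rather than attempt the impossible inversion, it drops the redundant column and reports NA for its coefficient.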