Chapter 9 Collinearity
We showed in the previous chapter that including correlated variables in the same regression allows us to control for particular sample characteristics. For example, the size of a house and the number of bedrooms are correlated, so including both in a regression that explains house price was a good thing.
The downside of this argument arises in regression analyses when two or more independent variables are too highly correlated with each other. This is known as collinearity (or multicollinearity). Correlation means that two or more variables systematically move together. In regression analysis, movement is information that we use to explain differences or changes in the dependent variable. If independent variables move in nearly the same way because they are highly correlated, then they carry overlapping (i.e., redundant) information.
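To make this concrete, here is a minimal simulation sketch of the house-price example (in Python with numpy, both assumed; the variable names sqft and bedrooms and all numeric values are hypothetical). The number of bedrooms is generated to track house size closely, so the two regressors end up nearly redundant:

```python
# Simulate the house-price example: bedrooms is built to move
# systematically with square footage, so the two regressors
# carry largely overlapping (redundant) information.
import numpy as np

rng = np.random.default_rng(0)
n = 500

sqft = rng.normal(2000, 400, n)                     # house size in square feet
bedrooms = 1 + sqft / 600 + rng.normal(0, 0.15, n)  # bedrooms track size closely
price = 50_000 + 120 * sqft + 8_000 * bedrooms + rng.normal(0, 30_000, n)

# Check how strongly the two regressors move together:
print(np.corrcoef(sqft, bedrooms)[0, 1])            # correlation near 0.98
```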
Another issue with collinearity is that when two or more variables systematically move together, this undermines the very interpretation of our estimates: holding all else constant. If collinear variables are never held constant in the data (i.e., they are always moving systematically with each other), then our estimates cannot separate the impact of each variable along its own dimension. Because the information in these independent variables is shared and redundant, the separate dimensions of the collinear variables become blurred.
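Continuing the sketch above (still assuming Python, with statsmodels added, and reusing the simulated sqft, bedrooms, and price from the previous example), we can see this blurring directly: the standard errors on the collinear coefficients are inflated, and variance inflation factors (VIFs) quantify the redundancy:

```python
# Continuation of the previous sketch: sqft, bedrooms, and price
# are the simulated arrays generated above.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(np.column_stack([sqft, bedrooms]))
fit = sm.OLS(price, X).fit()
print(fit.bse)   # standard errors on sqft and bedrooms are inflated

# VIF measures how much a coefficient's variance is blown up by
# the regressor's correlation with the other regressors:
for i, name in [(1, "sqft"), (2, "bedrooms")]:
    print(name, variance_inflation_factor(X, i))
```

A common rule of thumb flags VIFs above roughly 10 as a warning sign; with the simulated data here the VIFs land well above that threshold because the two regressors are almost interchangeable.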