1- There is a linear relationship between the dependent variable and the independent variables
2- The independent variables x1, x2, ..., xk are not random
Further, no exact linear relation exists between two or more of the independent variables
3- The expected value of the error term is zero
4- The variance of the error term is the same for all observations
5- The error term is uncorrelated across observations
6- The error term is normally distributed
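The assumptions above can be illustrated with a small simulation. This is a hypothetical sketch (the variable values and coefficients are invented): we generate non-random regressors with no exact linear relation between them, add i.i.d. zero-mean normal errors with constant variance, and fit OLS via the normal equations. The fitted residuals then have mean zero by construction.

```python
import numpy as np

# Hypothetical illustration of data satisfying the classical assumptions:
# fixed (non-random) regressors, no exact linear relation between them,
# and i.i.d. zero-mean normal errors with constant variance.
rng = np.random.default_rng(42)
n = 500
x1 = np.linspace(0.0, 1.0, n)            # non-random regressor
x2 = (2.0 * x1 - 1.0) ** 2               # nonlinear in x1: no exact linear relation
X = np.column_stack([np.ones(n), x1, x2])
eps = rng.normal(0.0, 0.5, n)            # homoskedastic, serially uncorrelated errors
y = 1.0 + 2.0 * x1 - 3.0 * x2 + eps      # assumed "true" model

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_hat
print(b_hat)         # estimates close to the assumed (1, 2, -3)
print(resid.mean())  # essentially zero, as OLS with an intercept guarantees
```

When these assumptions hold, OLS recovers the assumed coefficients up to sampling error, and the residual mean is zero exactly (up to floating-point precision) because an intercept is included.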
The accounting identity that total assets equal total liabilities plus shareholders' equity is an exact example of perfect collinearity if we include all three measures as independent variables in a regression to explain business worth.
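This can be seen numerically. In the sketch below (the balance-sheet figures are invented for illustration), total assets are constructed as exactly liabilities plus equity, so the design matrix is rank-deficient and OLS cannot identify all the coefficients.

```python
import numpy as np

# Hypothetical balance-sheet figures: by the accounting identity,
# assets = liabilities + shareholders' equity exactly, so the three
# columns are perfectly collinear.
liabilities = np.array([60.0, 80.0, 45.0, 120.0, 90.0])
equity      = np.array([40.0, 30.0, 55.0,  60.0, 70.0])
assets      = liabilities + equity           # exact linear relation

X = np.column_stack([np.ones(5), assets, liabilities, equity])
print(np.linalg.matrix_rank(X))              # 3, not 4: X'X is singular,
                                             # so OLS has no unique solution
```

Because one column is an exact linear combination of the others, X'X cannot be inverted and the regression coefficients are not identified.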
The situation in which two or more independent variables are not perfectly collinear but are still highly correlated with each other is termed 'multicollinearity'. With multicollinearity, we can still estimate the regression, but the estimates have low reliability. Furthermore, it becomes practically impossible to determine which independent variable is actually producing the effect on the dependent variable.
A high degree of multicollinearity can inflate the OLS standard errors of the regression coefficients. With inflated standard errors, t-tests on the coefficients have little power to reject the null hypothesis.
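The inflation of standard errors can be demonstrated with simulated data (all values here are hypothetical). The same model is estimated twice, once with nearly uncorrelated regressors and once with highly correlated ones, and the OLS standard error of a slope, se_j = sqrt(s^2 * [(X'X)^-1]_jj), is computed directly.

```python
import numpy as np

# Hypothetical simulation: identical model, but the second pair of
# regressors is highly correlated, inflating the slope's standard error.
rng = np.random.default_rng(0)
n = 200

def slope_se(x1, x2):
    """OLS standard error of the coefficient on x1."""
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(0.0, 1.0, n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)                       # error variance estimate
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # se of slope on x1

x1 = rng.normal(0.0, 1.0, n)
se_low  = slope_se(x1, rng.normal(0.0, 1.0, n))          # corr(x1, x2) near 0
se_high = slope_se(x1, x1 + rng.normal(0.0, 0.05, n))    # corr(x1, x2) near 1
print(se_low, se_high)   # se_high is many times larger than se_low
```

The larger standard error in the collinear case shrinks the t-statistic on each slope, which is exactly why the t-tests lose power to reject the null hypothesis.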
Unlike the cases of heteroskedasticity and serial correlation, there is no conclusive test for whether multicollinearity is a problem. In practice, multicollinearity is a matter of degree.
The classic symptom of multicollinearity is a significant F-statistic (the overall regression is significant) combined with a high R-squared, even though the t-statistics on the estimated slope coefficients are not individually significant.
To determine the source of multicollinearity, we may need to experiment with including or excluding different independent variables in the regression; there is no other direct way to detect and correct for multicollinearity.
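The classic symptom can be reproduced in a simulation (all data below are hypothetical): with two nearly collinear regressors, R-squared is high and the overall F-statistic is large, yet the standard error on each slope is inflated so the individual t-statistics are unimpressive.

```python
import numpy as np

# Hypothetical demonstration of the classic symptom: joint significance
# (high R^2, large F) alongside weak individual t-statistics.
rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + rng.normal(0.0, 0.02, n)         # corr(x1, x2) is nearly 1
y = 2.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
ssr = resid @ resid
sst = ((y - y.mean()) ** 2).sum()
r2 = 1.0 - ssr / sst                                   # coefficient of determination
f_stat = (r2 / 2) / ((1.0 - r2) / (n - 3))             # overall F-test, 2 and n-3 df
se = np.sqrt(ssr / (n - 3) * np.diag(np.linalg.inv(X.T @ X)))
t_stats = b / se
print(r2, f_stat)      # high R^2, clearly significant F
print(t_stats[1:])     # inflated standard errors keep the slope t-stats small
```

Experimenting with the variables, as the text suggests, resolves the problem here: dropping either x1 or x2 leaves a regression whose remaining slope is estimated precisely.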
In our multiple regression example, we regressed the returns to the Fidelity Select Technology Fund (FSPTX) on the returns to the S&P 500 Growth Index (SGX) and the S&P 500 Value Index (SVX) to determine whether the FSPTX behaves more like a large-cap growth fund or a large-cap value fund. We used monthly data from January 2012 through December 2016 for that returns-based style analysis.
The result of that regression suggested that the returns to the FSPTX were linked to the returns to the S&P 500 Growth Index (because only the coefficient of SGX was statistically significant) and not closely associated with the returns to the S&P 500 Value Index.
Now suppose we add the returns to the S&P 500 (SPX) itself as a third independent variable alongside the two style indices and estimate the equation:
Yt = b0 + b1 x1t + b2 x2t + b3 x3t + εt

where:
Yt = the monthly return to the FSPTX
x1t = the monthly return to the S&P 500 Growth Index
x2t = the monthly return to the S&P 500 Value Index
x3t = the monthly return to the S&P 500
The S&P 500 includes the component stocks of both style indices, so adding it introduces a severe multicollinearity problem.
Create a regression model to determine how much of the variation in the FSPTX is explained by the S&P 500, Growth, and Value indices between 2012 and 2016.
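The structure of this exercise can be sketched with synthetic monthly returns (these are NOT the actual FSPTX, SGX, SVX, or SPX data; the means, volatilities, and loadings below are invented). Because the S&P 500 is modeled as roughly an equal blend of its growth and value halves, the third regressor is almost an exact linear combination of the first two, and the design matrix is severely ill-conditioned.

```python
import numpy as np

# Synthetic stand-ins for 60 monthly returns, Jan 2012 - Dec 2016
# (hypothetical values, not the real index or fund data).
rng = np.random.default_rng(1)
n = 60
sgx = rng.normal(0.012, 0.03, n)                      # "growth index" returns
svx = rng.normal(0.010, 0.03, n)                      # "value index" returns
spx = 0.5 * sgx + 0.5 * svx + rng.normal(0.0, 1e-4, n)  # near-exact blend
fsptx = 0.001 + 1.1 * sgx + 0.1 * svx + rng.normal(0.0, 0.01, n)

X = np.column_stack([np.ones(n), sgx, svx, spx])
print(np.linalg.cond(X))   # very large condition number: severe multicollinearity
b = np.linalg.lstsq(X, fsptx, rcond=None)[0]
```

The enormous condition number is the numerical footprint of the multicollinearity described above: the individual coefficient estimates become extremely sensitive to small changes in the data, even though the regression's overall fit remains good.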
Because the range is the difference between the maximum and minimum returns, it can reflect extremely large or small outcomes that may not be representative of the distribution.