Linear regression, also known as linear least squares, models the straight-line relationship between the dependent variable and the independent variable.

In simple linear regression, we use a single independent variable to make predictions about the dependent variable. The fitted regression line passes through the point corresponding to the means of the independent and dependent variables. We may also test hypotheses about the relationship between these two variables, in addition to quantifying its strength.
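The pass-through-the-means property can be checked numerically. The sketch below uses NumPy and a small made-up data set (the numbers are purely illustrative, not from the text):

```python
import numpy as np

# Invented data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a first-degree polynomial (a straight line) by least squares
b1, b0 = np.polyfit(x, y, 1)

# The fitted line passes through the point (mean(x), mean(y))
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # True
```

Whatever data we fit, evaluating the fitted line at the mean of X recovers the mean of Y.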

We may run regression using two primary types of data: time series and cross-sectional.

Cross-sectional data involve many observations (relating to different asset classes, companies, people, countries, or other entities) on X and Y for the same time period.

Time-series data, by contrast, involve many observations from different time periods for the same asset class, company, person, country, or other entity. A mix of time-series and cross-sectional data is known as panel data.

Linear regression chooses values for the coefficients b_{0} and b_{1} such that the sum of the squared vertical distances between the observations and the regression line is minimized.

**Equation:** Y_{i} = b_{0} + b_{1}X_{i} + ε_{i}

Where,

**Y** : Dependent variable (the variable that you are seeking to explain)

**b**_{0} : Intercept

**b**_{1} : Slope coefficient

**X** : Independent variable (the variable you are using to explain changes in the dependent variable)

**ε** : Error term
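The coefficient estimates can be computed directly from the sample moments: b_{1} = Cov(X, Y) / Var(X) and b_{0} = mean(Y) − b_{1} · mean(X). A minimal NumPy sketch, using invented return figures purely for illustration:

```python
import numpy as np

# Hypothetical return data, invented for illustration
x = np.array([0.02, -0.01, 0.03, 0.01, -0.02, 0.04])        # independent variable
y = np.array([0.025, -0.005, 0.034, 0.012, -0.024, 0.046])  # dependent variable

# Closed-form OLS estimates:
#   b_1 = Cov(X, Y) / Var(X)
#   b_0 = mean(Y) - b_1 * mean(X)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Residuals estimate the error terms ε_i; OLS minimizes sum(residuals**2)
residuals = y - (b0 + b1 * x)
print("intercept:", b0, "slope:", b1, "SSR:", np.sum(residuals**2))
```

The same estimates can be cross-checked against a library fit such as `np.polyfit(x, y, 1)`.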

**Assumptions**

1- Linear relationship between the variables. **Explanation:** The parameters b_{0} and b_{1} are raised to the first power only, and neither b_{0} nor b_{1} is multiplied or divided by another parameter. Linear regression is possible as long as the regression is linear in the parameters.

2- The independent variable X is not random. **Explanation:** This assumption is often not literally true. For example, we frequently use the return on a benchmark stock index as the independent variable to explain changes in a particular stock, and it is unrealistic to assume that such returns are not random. Even if the independent variable is random, we can still rely on the regression estimates, provided the error term is uncorrelated with the independent variable.

3- The expected value of the error term is zero.

4- The variance of the error term is the same for all observations (homoskedasticity).

5- The error term is uncorrelated across observations (no serial correlation).

6- The error term is normally distributed. **Explanation:** For large samples, we may be able to drop the normality assumption by appeal to the central limit theorem, which states that the sum (as well as the mean) of a large number of independent random variables is approximately normally distributed. However, we may also apply a formal normality test such as the Anderson-Darling test.
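As a sketch of assumption 6, the Anderson-Darling test from SciPy (`scipy.stats.anderson`) can be applied to the regression residuals. The data below are simulated with normally distributed errors, so the test should usually not reject:

```python
import numpy as np
from scipy import stats

# Simulated data with normally distributed errors (for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

# OLS fit and residuals
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

result = stats.anderson(residuals, dist="norm")
# Reject normality at a given level if the statistic exceeds that critical value
for sl, cv in zip(result.significance_level, result.critical_values):
    print(f"{sl}%: reject = {result.statistic > cv}")
```

SciPy reports critical values at the 15%, 10%, 5%, 2.5%, and 1% significance levels rather than a p-value.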

If these assumptions are violated, the estimated regression coefficients (b̂_{0}, b̂_{1}) may be biased and inconsistent.
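Assumptions 4 and 5 can be probed with simple residual diagnostics. The sketch below, on simulated data, computes a Breusch-Pagan-style statistic for constant error variance and the Durbin-Watson statistic for first-order serial correlation; neither test is named in the text above, so treat this as one common approach rather than the only one:

```python
import numpy as np

# Simulated data with homoskedastic, uncorrelated errors (for illustration)
rng = np.random.default_rng(42)
n = 500
x = rng.normal(loc=3.0, scale=1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# OLS fit and residuals
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Breusch-Pagan style check (assumption 4): regress squared residuals on x;
# LM = n * R^2 is roughly chi-squared with 1 df (5% critical value ~ 3.84)
u2 = resid**2
g1 = np.cov(x, u2, ddof=1)[0, 1] / np.var(x, ddof=1)
g0 = u2.mean() - g1 * x.mean()
r2 = 1 - np.sum((u2 - (g0 + g1 * x)) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm_stat = n * r2

# Durbin-Watson (assumption 5): values near 2 suggest no autocorrelation
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid**2)

print("Breusch-Pagan LM:", lm_stat, "Durbin-Watson:", dw)
```

With well-behaved errors, as here, the LM statistic should be small and the Durbin-Watson statistic should sit near 2; heteroskedastic or serially correlated errors would push them away from those values.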