Regression Analysis – Linear Regression, SSE, Assumptions of Linear Regression, the Error Term and the Best Fit Line


Regression Analysis: In statistics, regression analysis is a statistical process for estimating the relationships among variables. In this post we discuss only linear regression.

Regression analysis is used to:

  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain (also called the endogenous variable)

Independent variable: the variable used to explain (also called the exogenous variable)

  • The relationship between X and Y is described by a linear function
  • Changes in Y are assumed to be caused by changes in X

Linear regression population model equation:

Y = β0 + β1X + ε

Now let's have a look at the above equation:

Y = Dependent variable

X = independent variable

β0 and β1 = the population model coefficients (the intercept and the slope)

ε = a random error term. Let's understand the error term in some depth, because it matters a lot when we learn the best fit line or calculate SSE –

Here the error is basically the distance between a predicted point (on the true regression line) and the observed point. It is also called the disturbance or, put more simply, it is the vertical distance (downward or upward) of a data point from the 'best fit line'. That is, the point lies a certain distance above or below the line, and if you were to draw a vertical segment from it to the best fit line, that distance would be the 'error'.
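To make this concrete, here is a minimal Python sketch with made-up data and an assumed fitted line ŷ = 2 + 3x; it simply prints the vertical gap (the error) between each observed point and the line:

# Illustration only: toy data and a hypothetical fitted line y_hat = 2 + 3x.
# The "error" of each point is the vertical gap between the observed y
# and the value the line predicts at the same x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.5, 7.5, 11.5, 13.5]   # observed values (made up)

b0, b1 = 2.0, 3.0            # assumed intercept and slope of the fitted line

for xi, yi in zip(x, y):
    y_hat = b0 + b1 * xi     # point on the fitted line
    error = yi - y_hat       # positive -> point above the line, negative -> below
    print(f"x={xi}, observed y={yi}, fitted y={y_hat}, error={error:+.2f}")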

The above equation is similar to the equation of a straight line: an intercept and a slope term, plus an error term. The figure below gives a better view of it:

[Figure: regression line showing the intercept and the error term for a data point]

The above equation represents the population regression model, while the simple linear regression model provides estimates of the population regression model and can be written like this:

Ŷi = b0 + b1Xi

Where
Ŷi = estimated value of Y for the ith observation
Xi = value of X for the ith observation
b0 and b1 = the estimates of the regression intercept and the regression slope

Assumptions for linear regression:

  • X is non-random, i.e. it has no variance
  • The error term is random, which also makes Y random. Hence, for the same value of X, two different observations may have different values of Y because of the error term (see the simulation sketch after this list).
  • The error has mean value 0 and a standard deviation that is independent of X
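The following small simulation sketch (assumed coefficients, Python's random module) illustrates the second assumption: with X held fixed, repeated observations of Y still differ purely because of the random error term ε:

# Illustration only: simulate the population model Y = beta0 + beta1*X + eps
# with made-up coefficients, to show that two observations with the SAME X
# can have different values of Y because of the random error term.
import random

random.seed(42)
beta0, beta1 = 4.0, 1.5        # assumed population coefficients
sigma = 2.0                    # standard deviation of the error, independent of X

x = 10.0                       # X is fixed (non-random)
for i in range(2):
    eps = random.gauss(0.0, sigma)   # error term with mean 0
    y = beta0 + beta1 * x + eps
    print(f"observation {i + 1}: X={x}, error={eps:+.2f}, Y={y:.2f}")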

Fitting the regression equation (least squares estimates), or finding the best fitted line:


Getting the estimates of β0 and β1 means finding the best straight line that can be drawn through the scatter plot of X vs Y.

In simple words, the least squares method finds the estimates of β0 and β1 that minimize SSE. SSE is the sum of the squares of the errors, SSE = Σ(Yi − Ŷi)², and the minimum value of SSE gives the best fitted line. To put it even more simply: take all the error points, square them, sum them and finally minimize that sum (squaring, summing and minimizing being the mathematical operations); the result is the equation of the line of best fit.
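As a rough sketch of how this works in practice (made-up data, plain Python, using the standard closed-form least squares formulas), the estimates b0 and b1 and the resulting SSE can be computed like this:

# Illustration only: least squares estimates b0 and b1 from the usual
# closed-form formulas, and the SSE of the resulting best fit line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 5.9, 8.2, 9.8]    # made-up observations

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar)^2)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar          # intercept estimate

# SSE = sum of squared vertical distances from each point to the fitted line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.3f}")

# Any other line gives a larger SSE, e.g. nudging the slope a little:
sse_other = sum((yi - (b0 + (b1 + 0.5) * xi)) ** 2 for xi, yi in zip(x, y))
print(f"SSE with a different slope: {sse_other:.3f} (larger, so a worse fit)")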

Hence, in layman's language, regression analysis helps in future modelling and predictive analysis: what would happen to the value of y if x were to go up or down by some value.
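For example, with hypothetical estimates b0 and b1 already in hand (values assumed below), prediction is just plugging a new x into the fitted equation, and the slope b1 tells us how much y is expected to move when x changes by one unit:

# Illustration only: b0 and b1 are assumed here, as if estimated earlier.
b0, b1 = 0.27, 1.93

def predict(x):
    return b0 + b1 * x               # fitted line: y_hat = b0 + b1 * x

print(predict(6.0))                  # predicted y at x = 6
print(predict(7.0) - predict(6.0))   # expected change in y when x rises by 1 (= b1)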

