Equation of a straight line
All linear regression tutorials begin with the equation of a straight lineline equation
which is sometimes written asline equation variation 1
orline equation variation 2
No matter how you write it the variables are the sameline equation parameters
Linear regression
Linear regression is a concept where a line can be fit through a cloud of points. These point clouds are sometimes referred to as a scatterplot. When a line is passed through a scatterplot the points might not fall exactly on to the line, so we introduce the concept of ‘error’. The residual error is the vertical distance between each point and the line that passes through the scatterplot. Points that are close to the line have an error near zero, points above the line have an error that it positive and points below the line have an error that is negative.residual error scatterplot
This concept of error can be introduced to the line equation so that given any x value the location of any point can be found in respect to the line. This gives us the linear regression equation.linear regression equation
With linear regression the intercept and slope are wearing hats and are numbered but this is not any different than the traditional line equation.linear regression parameters
The hat just means it is an estimated parameter.
The best lines fit through a scatterplot in a way that it has the smallest error possible. This means the line should be as close as possible to the majority of the points. This can be solved by making the error as small as possible. The total error can be calculated by adding of the vertical distances of all the points to the line. Squaring the individual distances ensures the negative distances don’t cancel out the positive distances. This is called the Sum of Square Error (SSE).sum of square error definition
Error minimisation
By rearranging the linear regression equation so that the error is on its own, we can minimise the squared error to find a line that fits.error minimisation
Rules of differentiation
There are some calculus rules you need to know before finding the derivatives. The chain, power, sum and difference rules. It is assumed you know these before moving on.calculus rules
Derivation of ordinary least squares
In order the minimise the sum of the squared error, you must first calculate it’s partial derivitive with respect to its intercept and slope. To do this we use the power rule and the chain rule.partial derivitive
Setting the derivitives to zero
Univariate optimization involves taking the derivative and setting equal to 0. This minimization problem is solved by setting the partial derivatives equal to 0. That is, take the derivative of the first equation with respect to the intercept and set it equal to 0. We then do the same thing for the slope.zeroed partial derivitives
Solving for the intercept
The value for the intercept can be found by following a few rules. To do this you need to know the sum & difference rule: the derivative of a sum is the same as the sum of the derivatives.intercept solution
Solving for the slope
The solution for the intercept can be subsitituted in place. This takes the intercept out of the equation.slope solution part 1
We now have a rudimentary slope, but it doesn’t match the equations in most textbooks.
Slope numerator refinement
We can easily derive the numerator of the equation to match what you would find in a textbook.slope numerator
Slope denominator refinement
We can easily derive the denominator of the equation to match what you would find in a textbook.slope denominator
Ordinary least squares equations
The slope and y-intercept are now derived. The line of best fit can now easily be calculated.ordinary least square regression equations