Hi,

I've been learning about Polynomial regression from Internet resources. I came across this video: Polynomial Regression Model: Derivation: Part 1 of 2 - YouTube

which derives the matrix everyone uses. I think I understand most of it, but I have a couple of questions.

Overview of derivation

(A link to a shorter alternate derivation is at the bottom of this overview.)

If you can find $\displaystyle a_0, a_1$ and $\displaystyle a_2$ such that:

$\displaystyle S_y=\sum^{n}_{i=1}(y_i-(a_0+a_1x_i+a_2x^2_i))^2$

is minimised, then you will have found the polynomial equation of the form:

$\displaystyle y=a_0+a_1x+a_2x^2$

with the least squared error.
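For concreteness, here is a minimal sketch of evaluating $\displaystyle S_y$ (the data and coefficients are made up for illustration):

```python
# Hypothetical sample data and candidate coefficients (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 5.0, 10.0]
a0, a1, a2 = 1.0, 0.0, 1.0  # candidate model: y = 1 + x^2

# S_y: the sum of squared residuals for the quadratic model.
S_y = sum((y - (a0 + a1 * x + a2 * x ** 2)) ** 2 for x, y in zip(xs, ys))
print(S_y)  # 0.0 here, since ys were generated exactly from y = 1 + x^2
```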

To do this the partial derivative is taken for $\displaystyle a_0, a_1$ and $\displaystyle a_2$ which gives:

$\displaystyle \frac{\partial S_y}{\partial a_0} =\sum^{n}_{i=1}2(y_i-a_0-a_1x_i-a_2x^2_i)(-1)$

$\displaystyle \frac{\partial S_y}{\partial a_1} =\sum^{n}_{i=1}2(y_i-a_0-a_1x_i-a_2x^2_i)(-x_i)$

$\displaystyle \frac{\partial S_y}{\partial a_2} =\sum^{n}_{i=1}2(y_i-a_0-a_1x_i-a_2x^2_i)(-x^2_i)$

Setting the derivatives to 0 minimises $\displaystyle S_y$.

$\displaystyle \frac{\partial S_y}{\partial a_0} = 0;\frac{\partial S_y}{\partial a_1} = 0;\frac{\partial S_y}{\partial a_2} = 0;$

After expanding and simplifying you're left with:

$\displaystyle na_0 + a_1\sum^{n}_{i=1}x_i + a_2\sum^{n}_{i=1}x^2_i = \sum^{n}_{i=1}y_i$

$\displaystyle a_0\sum^{n}_{i=1}x_i + a_1\sum^{n}_{i=1}x^2_i + a_2\sum^{n}_{i=1}x^3_i = \sum^{n}_{i=1}x_iy_i$

$\displaystyle a_0\sum^{n}_{i=1}x^2_i + a_1\sum^{n}_{i=1}x^3_i + a_2\sum^{n}_{i=1}x^4_i = \sum^{n}_{i=1}x^2_iy_i$

This leaves a system of linear simultaneous equations which can be solved.
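The three equations above can be sketched in code as a 3×3 linear system (made-up data, assuming NumPy is available), with `np.polyfit` as an independent cross-check:

```python
import numpy as np

# Made-up sample data for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 5.3, 9.8, 17.1])
n = len(x)

# The 3x3 system from the three simplified equations above;
# rows correspond to the a_0, a_1 and a_2 equations.
A = np.array([
    [n,            x.sum(),       (x**2).sum()],
    [x.sum(),      (x**2).sum(),  (x**3).sum()],
    [(x**2).sum(), (x**3).sum(),  (x**4).sum()],
])
b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a0, a1, a2 = np.linalg.solve(A, b)

# Independent cross-check: np.polyfit computes the same least-squares fit
# (note it returns the highest-degree coefficient first).
coeffs = np.polyfit(x, y, 2)
print(a0, a1, a2, coeffs)
```

The solved coefficients should match `coeffs` reversed, up to floating-point error.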

Shorter alternate explanation:

Least-Squares Parabola

Questions

Q1

When the partial derivative is taken, the exponent of 2 used to define the least-squares error becomes a factor of 2, which is then cancelled out later. As far as I can tell you would get the same answer trying to find the least cubed error, the least square-root error, or the error raised to any (non-zero) power. Does this mean that the resulting function would be the same for any (non-zero) powered error?

If that is the case, I can't quite wrap my head around how it would work. I would have thought that the higher the power, the more outliers would pull the function towards themselves.
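One way to probe this numerically (a toy sketch written for this post, not something from the video): fit only a constant $\displaystyle a_0$ to data containing an outlier, minimising $\displaystyle \sum^{n}_{i=1}|y_i-a_0|^p$ by brute-force grid search for several powers $\displaystyle p$. If the power really didn't matter, the minimisers would all coincide.

```python
# Toy data with one outlier (values made up for illustration).
ys = [0.0, 0.0, 10.0]

def argmin_loss(p, lo=-1.0, hi=11.0, steps=120001):
    """Grid-search for the constant a0 minimising sum(|y - a0| ** p)."""
    best_a, best_loss = lo, float("inf")
    for k in range(steps):
        a = lo + (hi - lo) * k / (steps - 1)
        loss = sum(abs(y - a) ** p for y in ys)
        if loss < best_loss:
            best_a, best_loss = a, loss
    return best_a

# If the power didn't matter, these would all coincide.
for p in (1, 2, 4):
    print(p, round(argmin_loss(p), 3))
```

In this toy run the three minimisers come out different, with higher $\displaystyle p$ pulled further towards the outlier, which is consistent with the intuition in the question.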

Q2

It's never explained in the video, but why does setting the partial derivatives to 0 minimise $\displaystyle S_y$?

I think I've got this one.

$\displaystyle \frac{\partial^2 S_y}{\partial a_0^2} = 2n$

$\displaystyle \frac{\partial^2 S_y}{\partial a_1^2} = 2\sum^{n}_{i=1}x^2_i$

$\displaystyle \frac{\partial^2 S_y}{\partial a_2^2} = 2\sum^{n}_{i=1}x^4_i$

All three must always be greater than or equal to 0, and for any of them to equal 0 the entire data set would need to be populated by 0s or contain no data at all; in either case polynomial regression probably wouldn't be used. Thus the stationary point is always a minimum. (Strictly, the mixed partial derivatives matter too: the full Hessian matrix should be positive definite, not just its diagonal entries.)
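This can be checked numerically (a sketch with made-up $\displaystyle x$ values, assuming NumPy is available): the full Hessian of $\displaystyle S_y$ is twice the moment matrix that appears in the normal equations, so its eigenvalues can be inspected directly.

```python
import numpy as np

# Made-up x values; the Hessian of S_y depends only on the x_i.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
n = len(x)

# Full Hessian of S_y with respect to (a_0, a_1, a_2): twice the
# moment matrix from the normal equations (mixed partials included).
H = 2 * np.array([
    [n,            x.sum(),       (x**2).sum()],
    [x.sum(),      (x**2).sum(),  (x**3).sum()],
    [(x**2).sum(), (x**3).sum(),  (x**4).sum()],
])

# All eigenvalues positive -> positive definite -> the stationary
# point of S_y is a minimum (needs at least 3 distinct x values).
eigs = np.linalg.eigvalsh(H)
print(eigs)
```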

Thanks in advance,

Matthew