Linear Regression - Least Squared Distances
Why is it that when doing linear regression on a set of data points, you find the equation of the line y = mx + c that minimises the squared differences? To me it makes sense that you would want to minimise the difference, not the squared difference.
What advantage does using squared differences have over just the difference, and why is it done this way?
Re: Linear Regression - Least Squared Distances
Consider the points:
(0,1)
(1,1)
(2,1)
(3,1)
(4,1)
(5,41)
(6,1)
(7,1)
(8,1)
(9,1)
And the linear approximation y = 5.
There is nothing wrong with using differences or absolute differences rather than squared differences. The measure needs to fit your application, whatever it is. On the other hand, you are studying "least squares". It's a valuable tool. Learn it well. Also, it has convenient mathematical properties that make it simpler to formulate and implement.
Re: Linear Regression - Least Squared Distances
I know how to formulate the line of best fit, I know the properties and usefulness of it and I am not currently studying it. I just don't understand why minimising the squared differences gives a better line of best fit than minimising the absolute differences.
Re: Linear Regression - Least Squared Distances
Quote:
Originally Posted by
Corpsecreate
I know how to formulate the line of best fit, I know the properties and usefulness of it and I am not currently studying it. I just don't understand why minimising the squared differences gives a better line of best fit than minimising the absolute differences.
Because the sum of squared differences is differentiable, while the sum of absolute differences is not.
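To make the differentiability point concrete, here is a sketch of the standard derivation. The symbols are my own choice rather than anything from the posts above: S is the sum of squared differences over n points (x_i, y_i), and x-bar and y-bar are the sample means.

```latex
S(m, c) = \sum_{i=1}^{n} (y_i - m x_i - c)^2

\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{n} x_i \, (y_i - m x_i - c) = 0,
\qquad
\frac{\partial S}{\partial c} = -2 \sum_{i=1}^{n} (y_i - m x_i - c) = 0

m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m \bar{x}
```

Setting both partial derivatives to zero gives two linear equations in m and c, hence the closed form. The sum of absolute differences has a kink wherever a residual is zero, so the same step is not available there.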
Re: Linear Regression - Least Squared Distances
Quote:
Originally Posted by
Corpsecreate
I know how to formulate the line of best fit, I know the properties and usefulness of it and I am not currently studying it. I just don't understand why minimising the squared differences gives a better line of best fit than minimising the absolute differences.
I do not believe you gathered my point. From the collection I provided, notice the following:
1) The least squares line is y = 0.2424x+3.9091 and is a very poor representation of the data. The sum of squared differences is very large, even though we have minimized it. It's still bad.
2) On to your measure of what constitutes "better". For the line y = 5, the sum of the (signed) differences is zero. This doesn't make it a better representation of the data.
3) Possibly the best representation of the data is the line y = 1, considering (5,41) an outlier.
My point, then, is that there are many measures of "good". There is nothing that can replace good judgment in selecting an appropriate measure. As we already said about least squares, considerations of convenience may make some measures feel better. While we're at it, though, we may wish to point out that the advent of high-speed computing devices may make traditionally less convenient measures more reasonable.
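The three lines discussed in this post can be compared numerically. The following is a minimal sketch in plain Python using the ten points given earlier in the thread; the helper names `ols_line` and `summarise` are made up for illustration.

```python
# Data from the example: nine points on y = 1 plus the outlier (5, 41).
points = [(x, 41.0 if x == 5 else 1.0) for x in range(10)]

def ols_line(pts):
    """Closed-form least-squares slope and intercept for y = m*x + c."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def summarise(pts, m, c):
    """Return the sums of signed, absolute and squared differences."""
    res = [y - (m * x + c) for x, y in pts]
    return sum(res), sum(abs(r) for r in res), sum(r * r for r in res)

m_ols, c_ols = ols_line(points)
candidates = {
    "y = 5": (0.0, 5.0),
    "least squares": (m_ols, c_ols),
    "y = 1": (0.0, 1.0),
}
for name, (m, c) in candidates.items():
    signed, absolute, squared = summarise(points, m, c)
    print(f"{name:14s} m={m:.4f} c={c:.4f} "
          f"signed={signed:.2f} abs={absolute:.2f} squared={squared:.2f}")
```

Each candidate comes out ahead on a different measure: the least-squares line has the smallest sum of squares, y = 1 has the smallest sum of absolute differences, and y = 5 has a signed sum of zero while still missing every point. Which of these counts as "good" is the judgment call described above.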
Re: Linear Regression - Least Squared Distances
Quote:
Originally Posted by
Prove It
Because the sum of squared differences is differentiable, while the sum of absolute differences is not.
Is that the only reason? If it is then does that mean that the sum of absolute differences would theoretically be a better estimate?
Re: Linear Regression - Least Squared Distances
Quote:
Originally Posted by
Corpsecreate
Is that the only reason? If it is then does that mean that the sum of absolute differences would theoretically be a better estimate?
Impossible to say. Minimisation of any errors would require differentiation, so it makes sense computationally to choose a function which is differentiable. Another argument could be that when you have a decent fit, most of the errors should be small (i.e. less than 1 in magnitude), so squaring them makes them even smaller.
Re: Linear Regression - Least Squared Distances
Quote:
Originally Posted by
Corpsecreate
Is that the only reason? If it is then does that mean that the sum of absolute differences would theoretically be a better estimate?
You are somewhat failing to pay attention. There are MANY minimization techniques. You cannot define "better" in any general sense. Good judgment and how your process relates to your application are far more important than some vague concept of "better".
I do not believe there is any Holy Writ that declares the perfect measure for minimization. We may be stuck with what works, what we can deal with, and perhaps what tradition has suggested.
Since you are studying Least Squares, why not pay attention to it and see if it has more advantages than you are imagining? If you have a great idea that will replace Least Squares in all future text books, please, let's see it. There is almost always a reason why things hang around for a century or two. You may wish to consider its merits due solely to this longevity.
Re: Linear Regression - Least Squared Distances
I am paying attention, you just didn't answer my question.
I know there are many minimisation/maximisation techniques and of course 'better' is subject to the specific problem you are dealing with. The only type of technique I'm asking about here is linear regression; I am not interested in non-linear/polynomial regression. I am also not saying that 'my way' is better than the conventional way of fitting a line to a set of points. What I am ASKING though is WHY it is convention to use a least squares solution rather than a line that minimises the absolute differences. I am also asking WHY a least squares solution is a more accurate line of best fit than a least absolute differences solution.
"Prove It" said that a least squares solution is differentiable and it is why it is used, which was the reason I thought in the first place. However, hypothetically speaking, if a method were to exist (it actually does exist) that would instead fit a line to minimise the absolute differences, would it not be a better solution? And by better I mean that the line can more accurately be used for predictions and is a more accurate fit to the data points in question.
And as I have already mentioned, I am not studying least squares. I am simply asking a simple question - why is a least squares solution more accurate as a line of best fit than a minimum absolute difference solution?
Re: Linear Regression - Least Squared Distances
Quote:
Originally Posted by
Corpsecreate
"Prove It" said that a least squares solution is differentiable and it is why it is used, which was the reason I thought in the first place. However, hypothetically speaking, if a method were to exist (it actually does exist) that would instead fit a line to minimise the absolute differences, would it not be a better solution? And by better I mean that the line can more accurately be used for predictions and is a more accurate fit to the data points in question.
Does such a method exist?
Re: Linear Regression - Least Squared Distances
I can do it with Excel. I don't know how Excel calculates it, but it can be done.
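For what it's worth, a least absolute differences line can also be found without a spreadsheet. The sketch below is not a claim about how Excel does it. It is a brute-force approach that relies on a known property of this problem: some optimal line passes through at least two of the data points, so it is enough to try every pair. The function name `lad_line` is hypothetical, and the data are the ten points posted earlier in the thread.

```python
from itertools import combinations

def lad_line(pts):
    """Brute-force least-absolute-differences fit of y = m*x + c.

    Assumes at least two points with distinct x values. Tries the line
    through every pair of points and keeps the one with the smallest
    sum of absolute differences.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2:
            continue  # vertical line, cannot be written as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        total = sum(abs(y - (m * x + c)) for x, y in pts)
        if best is None or total < best[2]:
            best = (m, c, total)
    return best

points = [(x, 41.0 if x == 5 else 1.0) for x in range(10)]
m, c, total = lad_line(points)
print(f"LAD line: y = {m:.4f}x + {c:.4f}, sum of absolute differences = {total:.2f}")
```

On this data the search settles on y = 1, ignoring the point (5, 41). Trying all pairs costs roughly n^3 operations, so it only suits small data sets; practical implementations typically use linear programming or iteratively reweighted least squares instead.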
Re: Linear Regression - Least Squared Distances
You are somewhat failing to pay attention, even to what you are saying.
"I am also asking WHY a least squares solution is a more accurate line of best fit than a least absolute differences."
You are trying to define "better" and you are not succeeding.
"would it not be a better solution"
You are trying to define "better" and you are not succeeding.
"WHY a least squares solution is a more accurate"
You are trying to define "better" and you are not succeeding.
"more accurately be used for predictions "
Forecasting is a whole different ballpark than fitting.
Let's see if I can answer your question most directly.
1) Why does Least Squares provide a better fit than the Sum of Absolute Differences? It doesn't, unless your application suggests that this is so. What, then, is your question? If we can clear up this misunderstanding, you've no question at all.
2) Perhaps your confusion is based on the term "Best Fit". It is normal to resist this superlative terminology. "Best Fit" doesn't actually exist by itself. The Least Squares line is the "Best Fit" ONLY if your application and judgment suggest that Least Squares is the right way to go. The term "Best Fit" must also specify the fitting technique. There is nothing inherently connecting "Least Squares" with the term "Best Fit", although it often appears this way in textbooks.
Re: Linear Regression - Least Squared Distances
NB: The following are my thoughts on your question and not something I know from a textbook or other reliable source.
Quote:
Is that the only reason? If it is then does that mean that the sum of absolute differences would theoretically be a better estimate?
You need to be careful in what you think "better" means in this context. The two approaches place different emphasis on different sizes of residual. There is no "correct" answer, the best choice depends on which kinds of residual you are most bothered by.
Relative to Min Absolute Deviation (MAD), OLS places greater emphasis on large residuals (because e^2 grows rapidly whereas |e| has a constant (absolute) slope). This means that you would expect an OLS-fitted line to be more impacted by any outliers in the data. However, this does not mean that OLS is worse; it just depends on whether you prefer "many small" residuals or "some large" ones.
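A quick way to see the difference in emphasis is to ask how much of the total penalty comes from the single largest residual under each criterion. This is a minimal sketch; the residual values are invented for illustration and `share_of_largest` is a hypothetical helper.

```python
# Hypothetical residuals: four small misses and one large one.
residuals = [1.0, -1.0, 1.0, -1.0, 10.0]

def share_of_largest(res, penalty):
    """Fraction of the total penalty contributed by the costliest residual."""
    costs = [penalty(r) for r in res]
    return max(costs) / sum(costs)

abs_share = share_of_largest(residuals, abs)
sq_share = share_of_largest(residuals, lambda r: r * r)
print(f"absolute: {abs_share:.1%}, squared: {sq_share:.1%}")
```

Here the large residual accounts for 10/14 of the absolute penalty but 100/104 of the squared penalty, so a squared-error fit has far more incentive to move towards that one point.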
An example is below. The data was deliberately chosen to show the effect and is not a real sample.
http://i52.tinypic.com/20p7969.jpg
As predicted the OLS line is closer to the outliers than the MAD line. The result is that the OLS has fewer extreme residuals, at the expense of performance elsewhere.
As far as I can tell this will be true in general where you have an outlier in your data, although I'm still trying to convince myself that it remains true when residuals are small (as per plato's post).
Also, it's standard practice to exclude outliers before fitting your data anyway!
Re: Linear Regression - Least Squared Distances
Isn't "Best Fit" used to describe the line that will come "closest" to intersecting the data points?
Re: Linear Regression - Least Squared Distances
We're just coming back to the definition of "Best" again.
"Best fit" can be whatever you choose to call it. If I cared more about extreme residuals I would think that minimising squared residuals was a better fit than minimising absolute residuals.
If you can find a line which is closer to every point than the OLS line then I think we could all agree it was better. However, I'm pretty sure that no such line exists (it should be trivial to prove, although I haven't actually done it).
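A sketch of that proof, as I read it, with notation that is my own: r_i are the OLS residuals and r'_i are the residuals of some other candidate line. Suppose the candidate is at least as close at every point and strictly closer at one point k.

```latex
|r'_i| \le |r_i| \quad \text{for all } i,
\qquad
|r'_k| < |r_k| \quad \text{for some } k

\Longrightarrow \quad
\sum_{i=1}^{n} (r'_i)^2 < \sum_{i=1}^{n} r_i^2
```

That contradicts the defining property of the OLS line, which has the smallest possible sum of squared residuals, so no such line can exist. The same argument with absolute values in place of squares shows that no line is closer to every point than the least absolute differences line either.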
This means that it comes down to personal preference over which residuals you want to focus on, as per post #13.