Why do we use:
(variance)=(Σ(the difference of each value from the mean)^2)/(number of values)

instead of:
(Σ(the absolute value of the difference of each value from the mean))/(number of values)?

I guess the reason has to do with why the least squares method is the best? Why is the least squares method the best?

I guess the answer to both questions has to do with the fact that the normal distribution has a standard deviation (and thus also a variance) equal to 1. So?

Please answer as lucidly as possible (consider I have a low IQ). And explain every mathematical symbol you use. I'd prefer it if you explained it with words alone instead of math symbols.

2. First, least-squares methods depend on variances and covariances, and are "the best" because least squares ensures that the covariance is the smallest.

Most people only say they square each deviation to make all the deviations positive, to avoid any information being lost due to cancelling. The other reason, which is why we don't use absolute values, is because when deviations are small (and they should be), squared deviations become even smaller.

3. Fisher wanted the absolute values.
Eventually Fisher lost out on the |.|

4. Yeah, as Matheagle said, I think it mainly has to do with the fact that $x^2$ is differentiable, whereas $|x|$ is not (at $x = 0$). Work has been done using $|\cdot|$, though; in the case of regression, see the Wikipedia article on least absolute deviations.
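
To make the differentiability point concrete, here is a minimal sketch with made-up data and hypothetical function names. With squares, setting the derivative of the total squared deviation to zero gives the mean directly. With absolute values there is no such derivative formula at the kinks, and the best center turns out to be the median. The code only checks both facts by brute force over a grid of candidate centers; it is an illustration, not a proof.

```python
from statistics import mean, median

def sum_squared_dev(values, c):
    """Total squared deviation of the values from a candidate center c."""
    return sum((v - c) ** 2 for v in values)

def sum_abs_dev(values, c):
    """Total absolute deviation of the values from a candidate center c."""
    return sum(abs(v - c) for v in values)

def best_center(values, loss, candidates):
    """Return the candidate center with the smallest loss (brute force)."""
    return min(candidates, key=lambda c: loss(values, c))

# Made-up data, chosen so that the mean and the median differ.
data = [1.0, 2.0, 3.0, 4.0, 10.0]

# Candidate centers 0.00, 0.01, ..., 12.00 (an arbitrary grid).
grid = [i / 100 for i in range(0, 1201)]

best_sq = best_center(data, sum_squared_dev, grid)
best_abs = best_center(data, sum_abs_dev, grid)

print("squared loss is smallest at", best_sq, "| mean is", mean(data))
print("absolute loss is smallest at", best_abs, "| median is", median(data))
```

So the two ways of measuring deviation are tied to two different notions of "center", which is part of why they lead to different fitting methods.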

5. I'd bet that it has to do with the fact that the normal distribution has a variance and a standard deviation equal to 1.
Is least squares better than absolute values for locating the most probably correct curve, YES OR NO? And if it is better, why?

Am I the only one who sees it all as connected? Someone in here must know.

6. Originally Posted by ThodorisK
I'd bet that it has to do with the fact that the normal distribution has a variance and a standard deviation equal to 1.
Is least squares better than absolute values for locating the most probably correct curve, YES OR NO? And if it is better, why?

Am I the only one who sees it all as connected? Someone in here must know.
Matheagle and I have already stated why you square the differences instead of taking the absolute values: the function $f(x) = x^2$ is nicer analytically than $g(x) = |x|$. In particular, $f$ is differentiable everywhere, while $g$ is not differentiable at $0$. If you don't see why this would be advantageous, then you would probably need a better background in mathematical statistics (or just mathematics in general) to understand a satisfactory answer to this question to begin with. The question is inherently mathematical. If you aren't sharp at math, there is no intuitive answer. It has nothing to do with the normal distribution in particular.

Least Squares Estimators are optimal in the following way: under the usual assumptions (independent, identically distributed normal errors) the estimators of the regression coefficients are UMVUEs (uniformly minimum-variance unbiased estimators). If you drop the normality assumption, they are BLUEs (best linear unbiased estimators). The criterion we use to evaluate the "goodness" of unbiased estimators is the variance though, which is the issue you are asking about in the first place.

7. Is it impossible to construct the equation of a curve (as the least squares construct it), using the average absolute deviation (or many absolute deviations)? Yes or no?

If it is possible, is it then a worse approximation of the real curve than the line or curve constructed by least squares? Yes or no?

8. 1) Yes, it is possible. In fact, I linked to the Wiki Page on this topic in my first post of this thread.

2) It depends on your definition of "worse".....
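
Here is a rough sketch of both fits on made-up data, to show why the definition of "worse" matters. The function names are hypothetical. The least-squares line uses the usual closed-form formulas. The least-absolute-deviations line is found by brute force, relying on the fact that some optimal least-absolute-deviations line passes through at least two of the data points; this is fine for a handful of points but is not how you would do it on real data.

```python
from itertools import combinations

def ols_line(xs, ys):
    """Least-squares line: returns (intercept, slope) from the closed form."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def total_abs_error(xs, ys, intercept, slope):
    """Sum of absolute vertical distances from the points to the line."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

def total_sq_error(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def lad_line(xs, ys):
    """Least-absolute-deviations line by trying every line through two points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        err = total_abs_error(xs, ys, intercept, slope)
        if best is None or err < best[0]:
            best = (err, intercept, slope)
    return best[1], best[2]

# Made-up data: four points exactly on y = x, plus one outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 14.0]

ols = ols_line(xs, ys)
lad = lad_line(xs, ys)

print("least squares      (intercept, slope):", ols)
print("least abs. deviat. (intercept, slope):", lad)
print("absolute error: OLS", total_abs_error(xs, ys, *ols),
      "vs LAD", total_abs_error(xs, ys, *lad))
print("squared error:  OLS", total_sq_error(xs, ys, *ols),
      "vs LAD", total_sq_error(xs, ys, *lad))
```

On this data each line wins on its own criterion and loses on the other one, so neither is "worse" until you say which measure of error you care about.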

9. Originally Posted by ThodorisK
Is it impossible to construct the equation of a curve (as the least squares construct it), using the average absolute deviation (or many absolute deviations)? Yes or no?

If it is possible, is it then a worse approximation of the real curve than the line or curve constructed by least squares? Yes or no?
The Gauss-Markov Theorem proves that the Least Squares Approximator is the best approximator, because the covariance matrix of a Least Squares approximation is always the smallest possible covariance matrix. So yes, most likely, your approximation will be worse.

I would suggest you read "MULTIPLE VARIABLE REGRESSION", an article I wrote for Issue 1 of the Math Help Forum e-zine.

http://www.mathhelpforum.com/math-he...issue-1-a.html

10. Originally Posted by theodds
It depends on your definition of "worse".....
By "worse" I mean: the constructed curve that is the worse approximation is the one with the greater average absolute deviation from the real curve.

11. The Gauss-Markov Theorem says that least squares is the best approximation because the variance is minimised? But the best approximation is when the average absolute deviation of the constructed curve from the real curve is minimised, and not when the variance of the constructed curve from the real curve is minimised. What am I getting wrong?

12. Like I said, read the article. The "covariance" is a measure of how much each of the sets of data deviate from each other. The Least-squares approximation is the best because there is not any other matrix of covariances that is less than the Least Squares covariance matrix.

13. I think there is a tacit assumption you are making that having a small covariance matrix in the sense of the Gauss-Markov Theorem is the kind of optimality that is desired. Gauss-Markov only guarantees that the estimator you get is Best Linear Unbiased (equivalent to minimizing the covariance matrix), not a UMVUE, which seems strictly better. It also sidesteps a question that is in the spirit of what the OP was asking by using a criterion of goodness that is inherently tied up with the variance: why do we use a squared error loss function as opposed to an absolute deviation loss function to evaluate the goodness of fit?

14. Let me rephrase the question:

The best approximation curve (= the constructed curve which is closest to the real but unknown curve) is the constructed curve B where the data have the least variance from curve B (least squares method), and not the constructed curve A where the data have the least average absolute deviation from curve A?

If THAT is proven, the proof must have something to do with the normal distribution. And the reason we use the given definition of variance is related to that proof and the normal distribution.
If THAT is false, then why do we use the least squares method and not the other one? Only because of the "possibly multiple solutions" (http://en.wikipedia.org/wiki/Least_absolute_deviations)? Is that then the only reason least squares is chosen, or does the previous "THAT" also apply?

And the above has nothing to do with the reason we use the given definition of variance?

15. Look, the definition of the variance has nothing to do with least squares. The variance is just a measure of spread. The absolute deviation is a different measure of spread. The reason more attention is paid to the variance (and squared deviations in general) is because it is easier to work with analytically. Is the extra attention deserved for other reasons? EDIT: Things like the Central-Limit Theorem need suitable conditions on the variance, not on the absolute deviation, which is another thing driving the additional focus on the variance. Least-Squares produces estimators that are optimal in some senses (with or without distributional assumptions, independence, etc). They may or may not be optimal in other senses.
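
For reference, a small sketch of the two measures of spread side by side, on made-up numbers and with hypothetical function names. Both follow the word formulas in the original question: average the squared differences from the mean, or average the absolute differences from the mean.

```python
from statistics import mean, pvariance

def variance(values):
    """Average of the squared differences from the mean (divide by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def mean_abs_deviation(values):
    """Average of the absolute differences from the mean (divide by n)."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

# Made-up data with mean 5.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var = variance(data)
mad = mean_abs_deviation(data)

print("variance:               ", var)
print("standard deviation:     ", var ** 0.5)
print("mean absolute deviation:", mad)
# The standard library's population variance should agree with ours.
print("statistics.pvariance:   ", pvariance(data))
```

Both numbers describe how spread out the data are; they just weight large differences differently, since squaring makes big differences count for more.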

Your criterion for goodness of fit seems misguided. If you fit lines in the plane (simple linear regression situation), and you try to figure out how much on average your fitted line misses the true line, the measure you get will blow up because, at infinity, the two lines will differ by $\infty$ with probability 1.
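
To spell out the blow-up, here is the one-line calculation, with symbols that are assumptions for illustration only: $a_1 + b_1 x$ is the true line and $a_2 + b_2 x$ is the fitted line. Unless the two slopes agree exactly, the vertical gap between the lines grows without bound as $x$ moves away from the origin, so its average over the whole real line cannot be finite.

```latex
\left| (a_1 + b_1 x) - (a_2 + b_2 x) \right|
  = \left| (a_1 - a_2) + (b_1 - b_2)\, x \right|
  \longrightarrow \infty
  \quad \text{as } |x| \to \infty, \text{ whenever } b_1 \neq b_2 .
```

The fitted slope equals the true slope exactly only with probability 0 when the errors are continuous, which is where the "with probability 1" comes from.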
