
Why was this definition of variance chosen?

  1. #1
    Newbie (Joined Apr 2008, 20 posts)

    Why was this definition of variance chosen?

    Why do we use

    \text{variance} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2

    (the average of the squared differences of each value from the mean), and not instead

    \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|

    (the average of the absolute differences of each value from the mean)?

    I guess the reason has to do with why the least squares method is the best. Why is the least squares method the best?

    I guess the answer to both questions has to do with the fact that the standard normal distribution has a standard deviation (and thus also a variance) equal to 1. So?

    Please answer as lucidly as possible (assume I have a low IQ), and explain every mathematical symbol you use. I'd prefer it if you explained it in words alone rather than with math symbols.
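
    For concreteness, here is a minimal sketch of both computations in Python (the data values are made up purely to illustrate the two formulas):

    [CODE]
    # Compare the two measures of spread on a small, made-up data set.
    values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    n = len(values)
    mean = sum(values) / n

    # Variance: average of the squared differences from the mean.
    variance = sum((x - mean) ** 2 for x in values) / n

    # Mean absolute deviation: average of the absolute differences from the mean.
    mad = sum(abs(x - mean) for x in values) / n

    print(mean)      # 5.0
    print(variance)  # 4.0
    print(mad)       # 1.5
    [/CODE]

    Both numbers measure spread; the question is why the squared version is the standard one.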

  2. #2
    MHF Contributor Prove It (Joined Aug 2008, 11,492 posts, 1,391 thanks)
    First, least-squares methods depend on variances and covariances, and are "the best" in the sense that, among linear unbiased estimators, least squares yields the smallest possible covariance.

    Most people only say that you square each deviation to make all the deviations positive, so that no information is lost to cancellation. The other reason, and the reason we don't just use absolute values, is that when deviations are small (and they should be), squaring makes them smaller still: a deviation of 0.1 becomes 0.01.

  3. #3
    MHF Contributor matheagle (Joined Feb 2009, 2,763 posts, 5 thanks)
    Historically, Eddington argued for the absolute values, while Fisher defended the standard deviation.
    The nicer thing about the square is that it's differentiable.
    Eventually the |.| lost out, in part because Fisher showed the standard deviation is the more efficient estimator for normally distributed data.
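
    To make the differentiability point concrete, here is the standard calculus sketch (not spelled out in the thread, but it is the usual argument). To find the single number c that minimizes the sum of squared deviations, differentiate and set the derivative to zero:

    \frac{d}{dc}\sum_{i=1}^{n}(x_i - c)^2 = -2\sum_{i=1}^{n}(x_i - c) = 0 \quad\Longrightarrow\quad c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}

    By contrast, \sum_{i=1}^{n}|x_i - c| is not differentiable at any point c = x_i, so there is no analogous one-line calculus solution; its minimizer turns out to be the median of the data.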

  4. #4
    Senior Member (Joined Oct 2009, 340 posts)
    Yeah, as matheagle said, I think it mainly has to do with the fact that x^2 is differentiable, whereas |x| is not. Work has been done using |\cdot|, though; for the regression case, see Least absolute deviations - Wikipedia, the free encyclopedia.
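
    As an illustration of the difference, here is a minimal sketch of fitting a line both ways. The data are made up, and using SciPy's general-purpose optimizer for the absolute-deviation fit is just one convenient choice, not the canonical algorithm:

    [CODE]
    import numpy as np
    from scipy.optimize import minimize

    # Made-up data with one outlier at the end.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.1, 1.1, 1.9, 3.2, 3.9, 12.0])

    # Ordinary least squares: minimizes the sum of squared residuals.
    # A closed form exists because the objective is differentiable.
    ols_slope, ols_intercept = np.polyfit(x, y, 1)

    # Least absolute deviations: minimizes the sum of |residuals|.
    # No closed form; use a derivative-free numerical optimizer.
    def lad_loss(params):
        slope, intercept = params
        return np.sum(np.abs(y - (slope * x + intercept)))

    result = minimize(lad_loss, x0=[1.0, 0.0], method="Nelder-Mead")
    lad_slope, lad_intercept = result.x

    print("OLS:", ols_slope, ols_intercept)  # pulled toward the outlier
    print("LAD:", lad_slope, lad_intercept)  # stays near slope ~1
    [/CODE]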

  5. #5
    Newbie (Joined Apr 2008, 20 posts)
    I'd bet that it has to do with the fact that the standard normal distribution has a variance and a standard deviation equal to 1.
    Is least squares better than absolute values for locating the most probably correct curve, YES OR NO? And if it is better, why?

    Am I the only one who sees it all as connected? Someone in here must know.
    Last edited by ThodorisK; July 5th 2010 at 01:38 PM.

  6. #6
    Senior Member (Joined Oct 2009, 340 posts)
    Quote Originally Posted by ThodorisK:
    I'd bet that it has to do with the fact that the standard normal distribution has a variance and a standard deviation equal to 1.
    Is least squares better than absolute values for locating the most probably correct curve, YES OR NO? And if it is better, why?

    Am I the only one who sees it all as connected? Someone in here must know.
    matheagle and I have already stated why you square the differences instead of taking the absolute values: the function f(x) = x^2 is nicer analytically than g(x) = |x|. In particular, f is differentiable. If you don't see why this would be advantageous, then you would probably need a better background in mathematical statistics (or in mathematics generally) to understand a satisfactory answer to this question in the first place. The question is inherently mathematical; if you aren't sharp at math, there is no intuitive answer. It has nothing to do with the normal distribution in particular.

    Least squares estimators are optimal in the following way: under the usual assumptions (normally distributed, independent, identically distributed errors) the estimators of the regression coefficients are UMVUEs (uniformly minimum variance unbiased estimators). If you drop the distributional assumption, they are BLUEs (best linear unbiased estimators). Note, though, that the criterion we use to evaluate the "goodness" of unbiased estimators is the variance, which is the very thing you are asking about in the first place.
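
    For reference, here is a textbook-style statement of the BLUE property (a paraphrase, not taken from this thread). In the linear model

    y = X\beta + \varepsilon, \qquad E[\varepsilon] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 I,

    with uncorrelated, equal-variance errors and no normality assumed, the least squares estimator \hat{\beta} = (X^{\top}X)^{-1}X^{\top}y satisfies, for every other linear unbiased estimator \tilde{\beta},

    \operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta}) \succeq 0 \quad \text{(positive semi-definite)},

    which is exactly what "smallest covariance matrix" means here (the Gauss-Markov theorem).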

  7. #7
    Newbie (Joined Apr 2008, 20 posts)
    Is it impossible to construct the equation of a curve (the way least squares constructs one) using the average absolute deviation (or the individual absolute deviations)? Yes or no?

    If it is possible, is it then a worse approximation of the real curve than the line or curve constructed by least squares? Yes or no?

  8. #8
    Senior Member (Joined Oct 2009, 340 posts)
    1) Yes, it is possible. In fact, I linked to the Wiki page on this topic in my first post in this thread.

    2) It depends on your definition of "worse"...
    Last edited by theodds; July 5th 2010 at 09:40 PM.

  9. #9
    MHF Contributor Prove It (Joined Aug 2008, 11,492 posts, 1,391 thanks)
    Quote Originally Posted by ThodorisK:
    Is it impossible to construct the equation of a curve (the way least squares constructs one) using the average absolute deviation (or the individual absolute deviations)? Yes or no?

    If it is possible, is it then a worse approximation of the real curve than the line or curve constructed by least squares? Yes or no?
    The Gauss-Markov theorem proves that the least squares estimator is the best linear unbiased estimator: its covariance matrix is the smallest possible (in the positive semi-definite sense) among all linear unbiased estimators. So yes, most likely, your approximation will be worse.
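
    One way to see "most likely worse" concretely is by simulation. Here is a minimal sketch under the normal-error assumption; the sample size, true coefficients, and replication count are arbitrary choices for illustration:

    [CODE]
    import numpy as np
    from scipy.optimize import minimize

    # Simulate many data sets with normal errors, fit a line by least
    # squares and by least absolute deviations, and compare how much
    # the two slope estimates vary around the true slope.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    true_intercept, true_slope = 1.0, 2.0

    ols_slopes, lad_slopes = [], []
    for _ in range(500):
        y = true_intercept + true_slope * x + rng.normal(0.0, 1.0, size=x.size)

        ols_slopes.append(np.polyfit(x, y, 1)[0])

        def lad_loss(params, y=y):
            slope, intercept = params
            return np.sum(np.abs(y - (slope * x + intercept)))

        fit = minimize(lad_loss, x0=[true_slope, true_intercept],
                       method="Nelder-Mead")
        lad_slopes.append(fit.x[0])

    # Under normal errors the least squares slope typically shows the
    # smaller spread, consistent with the efficiency claims above.
    print("OLS slope std:", np.std(ols_slopes))
    print("LAD slope std:", np.std(lad_slopes))
    [/CODE]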

    I would suggest you read "Multiple Variable Regression", an article I wrote for Issue 1 of the Math Help Forum e-zine.

    http://www.mathhelpforum.com/math-he...issue-1-a.html

  10. #10
    Newbie (Joined Apr 2008, 20 posts)
    Quote Originally Posted by theodds:
    It depends on your definition of "worse"...
    By "worse" I mean: The constructed curve which is the worse approximation, has the worse-greater average absolute deviation from the real curve.
    Last edited by ThodorisK; July 5th 2010 at 10:50 PM.

  11. #11
    Newbie (Joined Apr 2008, 20 posts)
    The Gauss-Markov theorem says that least squares is the best approximation because the variance is minimised? But surely the best approximation is the one where the average absolute deviation of the constructed curve from the real curve is minimised, not the one where the variance of the constructed curve from the real curve is minimised. What am I getting wrong?

  12. #12
    MHF Contributor Prove It (Joined Aug 2008, 11,492 posts, 1,391 thanks)
    Like I said, read the article. The covariance is a measure of how much two sets of data deviate from each other. The least squares approximation is the best because no other linear unbiased estimator has a covariance matrix smaller than the least squares covariance matrix.

  13. #13
    Senior Member (Joined Oct 2009, 340 posts)
    I think there is a tacit assumption you are making: that having a small covariance matrix in the sense of the Gauss-Markov theorem is the kind of optimality that is desired. Gauss-Markov only guarantees that the estimator you get is best linear unbiased (equivalent to minimizing the covariance matrix), not a UMVUE, which would seem strictly better. It also sidesteps a question that is in the spirit of what the OP was asking, by using a criterion of goodness that is inherently tied up with the variance: why do we use a squared error loss function, as opposed to an absolute deviation loss function, to evaluate goodness of fit?

  14. #14
    Newbie (Joined Apr 2008, 20 posts)
    Let me rephrase the question:

    Is the best approximation curve (= the constructed curve which is closest to the real but unknown curve) the constructed curve B from which the data have the least variance (the least squares method), and not the constructed curve A from which the data have the least average absolute deviation?

    If THAT is proven, the proof must have something to do with the normal distribution, and the reason we use the given definition of variance is related to that proof and to the normal distribution.
    If THAT is false, then why do we use the least squares method and not the other one? Only because of the possibility of multiple solutions (http://en.wikipedia.org/wiki/Least_absolute_deviations)? Is that the only reason least squares is chosen, or does the previous "THAT" also apply?

    And does the above really have nothing to do with the reason we use the given definition of variance?
    Last edited by ThodorisK; July 7th 2010 at 01:22 AM.

  15. #15
    Senior Member (Joined Oct 2009, 340 posts)
    Look, the definition of the variance has nothing to do with least squares. The variance is just one measure of spread; the absolute deviation is a different measure of spread. The reason more attention is paid to the variance (and to squared deviations in general) is that it is easier to work with analytically. Is the extra attention deserved for other reasons? EDIT: Results like the Central Limit Theorem need suitable conditions on the variance, not on the absolute deviation, which is another thing driving the additional focus on the variance. Least squares produces estimators that are optimal in some senses (with or without distributional assumptions, independence, etc.). They may or may not be optimal in other senses.
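    For instance, the classical Central Limit Theorem is stated in terms of the variance (standard textbook form, not quoted from this thread): if X_1, X_2, \ldots are i.i.d. with mean \mu and finite variance \sigma^2 > 0, then the standardized sample mean converges in distribution to a standard normal,

    \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1),

    and the finite-variance hypothesis is a condition on squared deviations, not on absolute ones.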

    Your criterion for goodness of fit seems misguided. If you fit lines in the plane (the simple linear regression situation), and you try to figure out how much on average your fitted line misses the real line, the measure you get will blow up: unless the fitted slope is exactly right, the two lines differ by \infty at infinity, so with probability 1 the average absolute deviation between them is infinite.
    Last edited by theodds; July 6th 2010 at 11:49 AM.
