1. Properties of Summations

Hello,

I'm working through a proof of the linear correlation coefficient (r) from its verbose form to its concise one. I realize the concept is statistics, but the proof seems more algebra and possibly calculus-based.

Verbose Form:
$
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \sqrt{n\sum y^2 - (\sum y)^2}}
$

Concise Form:
$
r = \frac{\sum (z_{x} z_{y})}{n-1}
$

It seems easiest to work backward from the Concise Form to the Verbose one, and to do so I'm using the following definitions:

$
z_{x} = \frac{x - \bar{x}}{s_{x}}
$

$
\bar{x} = \frac{\sum x}{n}
$

$
s_{x} = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}
$
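Before attempting the algebra, it can help to confirm numerically that the two forms really agree under these definitions. The following is a minimal sketch, not part of the text: the function names and the sample data are made up for illustration, and every bare $\sum$ is assumed to run over all n paired values.

```python
import math

def r_verbose(xs, ys):
    """Verbose Form: everything expressed through raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    )

def sample_std(vs):
    """s as defined above: sqrt((n*sum(v^2) - (sum v)^2) / (n(n-1)))."""
    n = len(vs)
    return math.sqrt((n * sum(v * v for v in vs) - sum(vs) ** 2) / (n * (n - 1)))

def r_concise(xs, ys):
    """Concise Form: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs), sample_std(ys)
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical paired data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 8.0]
print(r_verbose(xs, ys), r_concise(xs, ys))  # the two values should match up to rounding
```

A numerical match is not a proof, but it confirms that the definitions are consistent and gives a quick way to test each intermediate step of the derivation.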

Doing basic substitution in the Concise Form I get (didn't substitute for $s_{x}$ or $s_{y}$ to keep some semblance of readability):
$
r = \frac{\sum (\frac{x - \frac{\sum x}{n}}{s_{x}} * \frac{y - \frac{\sum y}{n}}{s_{y}})}{n-1}
$

I'm kind of stuck on what to do with the numerator, which seems to result in distributing a summation to other summations. Since x and y are two "paired" sets, n will be the same for all summations and is also a constant.

To simplify my question:

Am I able to distribute the $\sum$ like so?:
$\sum (\frac{x - \frac{\sum x}{n}}{s_{x}})$ -> $\sum(\frac{\frac{nx - \sum x}{n}}{s_{x}})$ -> $\frac{\frac {n\sum x - \sum \sum x}{n}}{\sum s_{x}}$

If so, what would the term $\sum \sum x$ resolve to?

Note: I don't want anyone to solve the proof here, I'm just trying to understand how I might be able to resolve the summations. I'd like to work through the proof myself to understand how it works.

Whew, okay first time using LaTeX that took a lot out of me...

2. For your double sum, what is the indexing?

3. It actually doesn't list the indexing in the formula definition (at least in the text I'm using), just $\sum$

I would guess it would have to be: $\sum_{i=1}^{n} \sum_{i=1}^{n} x_{i}$ since all of the calculations are done for 2 sets of values, both sets having n number of values (since they're paired).

I should mention, I'm not even certain if my formula progression is correct and I'm able to "distribute" the summation in that manner.

4. The sum $\sum_{i=1}^n x_i$ has only $n$ as a free variable, so it is a constant with respect to any outer index. Therefore the sum $\sum_{j=1}^n\sum_{i=1}^n x_i$ is just $n \sum_{i=1}^n x_i$. But I don't think that's what this sum actually is. Note that you may NOT (!!) distribute a sum across a quotient, viz. $\sum \frac ab \neq \frac{\sum a}{\sum b}$, which it looks like you've done.
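Both claims are easy to check on small numbers. This is a minimal sketch with made-up lists `a` and `b` (not values from the thread): it compares the sum of quotients with the quotient of sums, and shows that summing a constant n times gives n times that constant.

```python
# Hypothetical values, chosen so the arithmetic is easy to follow by hand.
a = [1, 2, 3]
b = [1, 2, 4]

# Sum of the term-by-term quotients: 1/1 + 2/2 + 3/4
sum_of_quotients = sum(ai / bi for ai, bi in zip(a, b))

# Quotient of the two sums: (1 + 2 + 3) / (1 + 2 + 4)
quotient_of_sums = sum(a) / sum(b)

print(sum_of_quotients, quotient_of_sums)  # these differ

# A completed sum is a constant: summing it over n terms multiplies it by n.
n = len(a)
c = sum(a)                                   # the inner sum, already a single number
sum_of_constant = sum(c for _ in range(n))   # the outer sum over n terms
print(sum_of_constant, n * c)                # these agree
```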

I'm confused about what the summation is over in $s_x$. There do not appear to be any free variables under the summation sign.

5. Thanks, that makes sense about not being able to distribute the sum across a fraction.

For the summation in the denominator, the indexing would be: $s_{x} = \sqrt{\frac{n\sum_{i=1}^n x_i^2 - (\sum_{i=1}^n x_i)^2}{n(n-1)}}$
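One observation that follows from writing the indices out, offered as a sketch rather than a step taken from the text: $s_{x}$ is computed from the whole sample, so it does not depend on the summation index. It then behaves like any other constant factor and can be pulled outside a sum instead of being summed itself. The index $i$ is an assumption here, since the book writes a bare $\sum$.

$
\sum_{i=1}^n \frac{x_i - \bar{x}}{s_{x}} = \frac{1}{s_{x}} \sum_{i=1}^n (x_i - \bar{x})
$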

6. Ah, I see. What text are you working out of?

7. The text is: Elementary Statistics 11th ed, Mario F. Triola.

All of the summations in the book are listed simply as $\sum$ without any indexing.

The original formulas are the ones I have above in the "Concise" and "Verbose" forms, where I'm trying to track the proof from "Concise" to "Verbose" using whatever manipulations possible.

I should mention I spoke with a tutor at school who also came to the conclusion that the double summation term resolves to $n \sum_{i=1}^n x_i$, but that presents a problem because then you'd have:
$n\sum_{i=1}^n x_i - \sum _{i=1}^n \sum _{i=1}^n x_i$ -> $n \sum_{i=1}^n x_i - n \sum_{i=1}^n x_i = 0$, which eliminates the terms you need in the numerator of the "Verbose" form of the equation.
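The zero may not be a contradiction. It is what you get when the x-deviations are summed on their own, whereas the Concise Form sums the product of the x- and y-deviations term by term. A minimal numerical sketch, using made-up data and variable names that do not come from the text, suggests the two behave differently:

```python
# Hypothetical paired data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 8.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Deviations from the mean summed on their own: always cancel to zero.
sum_dev_x = sum(x - x_bar for x in xs)

# Products of paired deviations summed term by term: generally not zero.
sum_dev_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Numerator of the Verbose Form, for comparison.
verbose_numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)

print(sum_dev_x)                                # 0.0
print(sum_dev_products)                         # 22.0
print(verbose_numerator, n * sum_dev_products)  # 88.0 88.0
```

For this data set the Verbose Form numerator equals n times the sum of deviation products, which hints at where the surviving terms come from without working through the algebra.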

Ha, sorry, I realize the "medium" of communication for this isn't necessarily the greatest, so let me know if I'm not doing a good job explaining things. I'm somewhat in new territory here.