Originally posted by **jstandard**:

Hello,

I'm working through a proof that takes the linear correlation coefficient (r) from its verbose form to its concise one. I realize the concept is statistics, but the proof seems more algebra-based, and possibly calculus-based.

Verbose Form:

$\displaystyle

r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \sqrt{n\sum y^2 - (\sum y)^2}}

$

Concise Form:

$\displaystyle

r = \frac{\sum (z_{x} z_{y})}{n-1}

$

It seems easiest to work backward from the Concise Form to the Verbose one, and to do so I'm using the following definitions:

$\displaystyle

z_{x} = \frac{x - \bar{x}}{s_{x}}

$

$\displaystyle

\bar{x} = \frac{\sum x}{n}

$

$\displaystyle

s_{x} = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}

$

Substituting the definitions of $\displaystyle z_{x}$ and $\displaystyle \bar{x}$ (and likewise for y) into the Concise Form, I get the following (I didn't substitute for $\displaystyle s_{x}$ or $\displaystyle s_{y}$, to keep some semblance of readability):

$\displaystyle

r = \frac{\sum \left( \frac{x - \frac{\sum x}{n}}{s_{x}} \cdot \frac{y - \frac{\sum y}{n}}{s_{y}} \right)}{n-1}

$
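As a numerical sanity check (not a proof), here is a minimal Python sketch that evaluates both forms on a small data set and compares them. The function names `r_verbose`, `r_concise`, and `sample_std`, as well as the sample data, are made up for illustration; `sample_std` simply encodes the definition of $\displaystyle s_{x}$ given above.

```python
import math

def r_verbose(xs, ys):
    """Verbose Form: built only from raw sums over the paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
                   * math.sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator

def sample_std(values):
    """s_x as defined in the post: sqrt((n*sum(x^2) - (sum x)^2) / (n(n-1)))."""
    n = len(values)
    return math.sqrt((n * sum(v * v for v in values) - sum(values) ** 2)
                     / (n * (n - 1)))

def r_concise(xs, ys):
    """Concise Form: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs), sample_std(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print(r_verbose(xs, ys))
print(r_concise(xs, ys))
```

If the two printed values agree up to floating-point rounding, that suggests the two forms are equivalent, but it does not replace the algebra.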

I'm kind of stuck on what to do with the numerator, which seems to require distributing a summation over other summations. Since x and y are two "paired" sets, n will be the same for all summations, and it is also a constant.

To simplify my question:

Am I able to distribute the $\displaystyle \sum$ like so?:

$\displaystyle \sum (\frac{x - \frac{\sum x}{n}}{s_{x}})$ -> $\displaystyle \sum(\frac{\frac{nx - \sum x}{n}}{s_{x}})$ -> $\displaystyle \frac{\frac {n\sum x - \sum \sum x}{n}}{\sum s_{x}}$

If so, what would the term $\displaystyle \sum \sum x$ resolve to?
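For reference, these are the standard summation rules that seem to bear on this question, written with an explicit index $\displaystyle i$ running from 1 to n. The index notation and the constant $\displaystyle c$ are assumptions added here for clarity; they do not appear in the formulas above.

$\displaystyle

\sum_{i=1}^{n} c = nc, \qquad \sum_{i=1}^{n} c\,x_{i} = c \sum_{i=1}^{n} x_{i}

$

Under these rules, the inner sum $\displaystyle \sum x$ is a single number that does not depend on $\displaystyle i$, so summing it n times should give:

$\displaystyle

\sum_{i=1}^{n} \left( \sum_{j=1}^{n} x_{j} \right) = n \sum_{j=1}^{n} x_{j}

$

Likewise, $\displaystyle s_{x}$ is a constant for the whole data set, so it factors out of the sum rather than being summed itself:

$\displaystyle

\sum_{i=1}^{n} \frac{x_{i} - \bar{x}}{s_{x}} = \frac{1}{s_{x}} \sum_{i=1}^{n} (x_{i} - \bar{x})

$

In other words, the denominator stays $\displaystyle s_{x}$ and does not become $\displaystyle \sum s_{x}$.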

Note: I don't want anyone to solve the proof here, I'm just trying to understand how I might be able to resolve the summations. I'd like to work through the proof myself to understand how it works.

Whew, okay first time using LaTeX that took a lot out of me...