# Degrees of freedom

• Oct 14th 2009, 06:02 AM
TriKri
Degrees of freedom
Hi! Can someone explain this to me:

The definition of $\displaystyle \chi^2$-distribution, taken from my statistics book, is:

Quote:

If $\displaystyle X_1,\ X_2,\ \ldots,\ X_f$ are independent and $\displaystyle \sim N(0,\ 1)$, then

$\displaystyle \sum_{i=1}^f X_i^2\ \sim\chi^2(f)$

f is the number of degrees of freedom.
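A quick numerical sanity check of this definition (a simulation sketch, not a proof; the values of f and the number of simulations are arbitrary): a $\displaystyle \chi^2(f)$ variable has mean f and variance 2f, and sums of f squared standard normals should reproduce both.

```python
import random

random.seed(0)

f = 5           # degrees of freedom (assumed value for the demo)
n_sims = 20000  # number of simulated chi-square draws

# Each draw: sum of f squared independent standard normals.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(f))
         for _ in range(n_sims)]

mean = sum(draws) / n_sims
var = sum((d - mean) ** 2 for d in draws) / n_sims

print(mean)  # should be close to f = 5
print(var)   # should be close to 2f = 10
```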

The book also says (but it doesn't prove it) that if $\displaystyle X_1,\ X_2,\ \ldots,\ X_n$ are independent and $\displaystyle \sim N(0,\ 1)$, then

$\displaystyle \sum_{i=1}^n (X_i-\bar{X})^2\ \sim\chi^2(n-1),$ where $\displaystyle \bar{X}=\frac{X_1+X_2+...+X_n}{n}$

I would really like to see what the proof looks like. How can this be proven?
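Not a proof, but the claim is easy to check by simulation (n and the simulation count below are arbitrary demo values): the mean of $\displaystyle \chi^2(n-1)$ is n − 1, so averaging $\displaystyle \sum (X_i-\bar{X})^2$ over many samples should give something close to n − 1 rather than n.

```python
import random

random.seed(1)

n = 8           # sample size (assumed value for the demo)
n_sims = 20000  # number of simulated samples

totals = []
for _ in range(n_sims):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    # Sum of squared deviations from the sample mean.
    totals.append(sum((x - xbar) ** 2 for x in xs))

mean = sum(totals) / n_sims
print(mean)  # close to n - 1 = 7, not n = 8
```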
• Oct 15th 2009, 05:25 PM
matheagle
First of all that's a lame definition of a chi-square.

The real definition is $\displaystyle \chi^2_v=\Gamma(v/2,\ 2)$, a gamma distribution with shape $\displaystyle v/2$ and scale 2,

WHERE the degrees of freedom v need not be an integer.

It's easy to prove that if you square a standard normal you get a chi-square with 1 df, and then via MGFs you can show that a sum of independent chi-squares is again a chi-square (with the degrees of freedom adding).

NOW, when you subtract the sample mean you do lose that 1 df.
It's not a simple proof and I couldn't find it on the web.
I'm sure it's here and I'll look again.
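The first step (square of a standard normal is $\displaystyle \chi^2(1)$) can at least be checked numerically. Since $\displaystyle P(Z^2 \le x) = P(|Z| \le \sqrt{x}) = \operatorname{erf}(\sqrt{x/2})$, the empirical CDF of $\displaystyle Z^2$ should match that formula; a rough sketch (evaluation point and sample size are arbitrary):

```python
import random
import math

random.seed(2)
n_sims = 50000

# Empirical P(Z^2 <= 1) for Z ~ N(0, 1).
hits = sum(1 for _ in range(n_sims) if random.gauss(0.0, 1.0) ** 2 <= 1.0)
empirical = hits / n_sims

# Chi-square(1) CDF at x = 1: P(Z^2 <= x) = erf(sqrt(x / 2)).
theoretical = math.erf(math.sqrt(1.0 / 2.0))

print(empirical, theoretical)  # both near 0.6827
```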
• Oct 16th 2009, 03:00 PM
TriKri
I just realized it wasn't the definition of chi-square :P It was just a theorem; the definition is a function involving the gamma function, like you wrote. I think the definition is

$\displaystyle f(x;k)= \begin{cases}\displaystyle \frac{1}{2^{k/2}\Gamma(k/2)}\,x^{(k/2) - 1} e^{-x/2}&\text{for }x>0,\\ 0&\text{for }x\le0, \end{cases}$

(the same as that on Wikipedia). What I find kind of strange is that my book (our course literature) states a lot of things but proves few of them. Another thing it states without proof is that the test statistic in the chi-square test is chi-square distributed:

If Z is distributed over r states with probabilities $\displaystyle p_1,\ p_2,\ \ldots,\ p_r$, and $\displaystyle X_i$ is the number of times, out of n observations, that Z ended up in state i, then the test statistic:

$\displaystyle Q=\sum_{i=1}^r \frac{(X_i-np_i)^2}{np_i}=\sum_{i=1}^r \frac{(X_i-E_i)^2}{E_i}$

is chi-square(r-1)-distributed (here $\displaystyle E_i$ is the expected number of times Z will end up in state i). The formula, however, is not motivated, although the book does prove it is chi-square(r-1)-distributed for r = 2. If you look at a single term:

$\displaystyle \frac{(X_i-E_i)^2}{E_i}$

it doesn't look chi-square distributed. Let $\displaystyle Y_{i,j}$ be 1 if the j-th observation of Z ends up in state i, and 0 otherwise; then $\displaystyle Y_{i,j}$ has mean $\displaystyle p_i$ and variance $\displaystyle p_i(1-p_i)$. The sum $\displaystyle \sum_{j=1}^n Y_{i,j} = X_i$ will be approximately distributed as $\displaystyle N(np_i,\ np_i(1-p_i)) = N(E_i,\ E_i(1-p_i))$. Now

$\displaystyle \frac{(X_i-E_i)^2}{E_i(1-p_i)}\sim\chi^2(1)$

approximately. That is why I wonder why the formula doesn't look like

$\displaystyle Q=\sum_{i=1}^r \frac{(X_i-E_i)^2}{E_i(1-p_i)}$

instead (which I would expect to be chi-square(r)- or possibly chi-square(r-1)-distributed). To me it looks as though someone has forgotten a factor in the denominator (although I know that isn't the case). Does anyone know why the formula looks the way it does and how it has been obtained?
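One thing worth noting: the $\displaystyle X_i$ are not independent (they sum to n and are negatively correlated), so the term-by-term argument above doesn't apply directly; that dependence is, roughly, where the missing $\displaystyle (1-p_i)$ factors and the lost degree of freedom go. As a sanity check on the book's statistic (the values of r, the $\displaystyle p_i$, and n below are arbitrary demo choices), the average of Q over many experiments should come out near r − 1, since $\displaystyle E[(X_i-np_i)^2]/(np_i) = 1-p_i$ and these sum to r − 1:

```python
import random

random.seed(3)

r = 3
p = [0.2, 0.3, 0.5]  # assumed state probabilities for the demo
n = 200              # observations per experiment
n_sims = 5000        # number of experiments

q_values = []
for _ in range(n_sims):
    counts = [0] * r
    for _ in range(n):
        u = random.random()
        acc = 0.0
        # Pick the state whose cumulative-probability interval contains u.
        for i, pi in enumerate(p):
            acc += pi
            if u < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against float round-off
    q = sum((counts[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(r))
    q_values.append(q)

mean_q = sum(q_values) / n_sims
print(mean_q)  # close to r - 1 = 2
```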
• Oct 16th 2009, 03:20 PM
matheagle
I figured it was just a theorem.
But I don't have your book in front of me.
I teach out of Wackerly and Walpole all the time.
The second thing you posted is the Pearson goodness-of-fit test.
http://en.wikipedia.org/wiki/Pearson's_chi-square_test
It is important to note that this is not exactly a chi-square, as mentioned in that link:
'the distribution of the test statistic is not exactly that of a chi-square random variable'.