# Degrees of freedom

• Oct 14th 2009, 06:02 AM
TriKri
Degrees of freedom
Hi! Can someone explain this to me:

The definition of $\chi^2$-distribution, taken from my statistics book, is:

Quote:

If $X_1,\ X_2,\ \ldots,\ X_f$ are independent and $\sim N(0,\ 1)$, then

$\sum_{i=1}^f X_i^2\ \sim\chi^2(f)$

f is the number of degrees of freedom.

The book also says (but doesn't prove) that if $X_1,\ X_2,\ \ldots,\ X_n$ are independent and $\sim N(0,\ 1)$, then

$\sum_{i=1}^n (X_i-\bar{X})^2\ \sim\chi^2(n-1),$ where $\bar{X}=\frac{X_1+X_2+...+X_n}{n}$

I would really like to see what the proof looks like. How can this be proven?
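
A quick numerical sanity check of the stated theorem (not a proof): a minimal sketch that simulates $\sum (X_i-\bar{X})^2$ and compares its sample mean and variance with those of $\chi^2(n-1)$, namely $n-1$ and $2(n-1)$. The function names, the choice n = 5, and the seed are illustrative assumptions, not taken from the book.

```python
import random
import statistics

def centered_sum_of_squares(n, rng):
    """Draw n independent N(0, 1) values and return sum((x_i - xbar)^2)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs)

def simulate(n=5, trials=20000, seed=0):
    """Return the sample mean and sample variance of the statistic over many trials."""
    rng = random.Random(seed)
    draws = [centered_sum_of_squares(n, rng) for _ in range(trials)]
    return statistics.fmean(draws), statistics.variance(draws)

mean, var = simulate()
# For chi2(n-1) with n = 5 the theoretical mean is 4 and the variance is 8.
print(f"sample mean     = {mean:.3f}  (theory: 4)")
print(f"sample variance = {var:.3f}  (theory: 8)")
```

If the statistic had n degrees of freedom instead, the mean would sit near 5 rather than 4, so the lost degree of freedom is visible even in this rough check.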
• Oct 15th 2009, 05:25 PM
matheagle
First of all that's a lame definition of a chi-square.

The real definition is $\chi^2_v=\Gamma(v/2,2)$, i.e. a gamma distribution with shape $v/2$ and scale 2,

WHERE the dfs need not be an integer.

It's easy to prove that if you square a standard normal you get a chi-square with 1 df, and then via MGFs you can show that sums of independent chi-squares give you a chi-square.

NOW, when you subtract the sample mean you do lose that 1 df.
It's not a simple proof and I couldn't find it on the web.
I'm sure it's here and I'll look again.
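
For reference, a sketch of one standard argument for the lost df. The orthogonal matrix $A$ and the variables $Y_i$ are notation introduced here, not taken from the book. Let $A$ be any $n\times n$ orthogonal matrix whose first row is $(1/\sqrt{n},\ \ldots,\ 1/\sqrt{n})$, and put $Y=AX$ with $X=(X_1,\ \ldots,\ X_n)^T$. Since $X\sim N(0,\ I)$ and $AA^T=I$, the $Y_i$ are again independent and $\sim N(0,\ 1)$. The first row gives

$Y_1=\frac{X_1+X_2+\cdots+X_n}{\sqrt{n}}=\sqrt{n}\,\bar{X}$

and orthogonality preserves the sum of squares, $\sum_{i=1}^n Y_i^2=\sum_{i=1}^n X_i^2$. Therefore

$\sum_{i=1}^n (X_i-\bar{X})^2=\sum_{i=1}^n X_i^2-n\bar{X}^2=\sum_{i=1}^n Y_i^2-Y_1^2=\sum_{i=2}^n Y_i^2\ \sim\chi^2(n-1)$

because the right-hand side is a sum of $n-1$ squared independent standard normals. The same construction also shows that $\bar{X}$ (a function of $Y_1$ only) is independent of $\sum (X_i-\bar{X})^2$.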
• Oct 16th 2009, 03:00 PM
TriKri
I just realized it wasn't the definition of chi-square :P It was just a theorem; the definition was some function containing the gamma function, like you wrote. I think the definition was

$
f(x;k)=
\begin{cases}\displaystyle
\frac{1}{2^{k/2}\Gamma(k/2)}\,x^{(k/2) - 1} e^{-x/2}&\text{for }x>0,\\
0&\text{for }x\le0,
\end{cases}
$

(the same as the one on Wikipedia). What I find kind of strange is that my book (our course literature) states a lot of things but proves few of them. Another thing it states without proof is that the test variable in the chi-square test is chi-square distributed:

If Z is distributed in r states with probabilities $p_1,\ p_2,\ ...\ ,\ p_r$, and $X_i$ is the number of times Z, out of n observations, ended up in state i, then the test variable:

$Q=\sum_{i=1}^r \frac{(X_i-np_i)^2}{np_i}=\sum_{i=1}^r \frac{(X_i-E_i)^2}{E_i}$

is chi-square(r-1)-distributed (here $E_i$ is the expected number of times Z will end up in state i). The formula however is not motivated, although they prove it is chi-square(r-1) distributed for r = 2. If you look at a single term:

$\frac{(X_i-E_i)^2}{E_i}$

it doesn't look chi-square distributed. If $y_{i,j}$ is 1 when the j-th observation of Z ends up in state i, and 0 otherwise, then $y_{i,j}$ has mean $p_i$ and variance $p_i(1-p_i)$. The sum $\sum_{j=1}^n y_{i,j} = X_i$ is then approximately distributed as $N(np_i,\ np_i(1-p_i)) = N(E_i,\ E_i(1-p_i))$. Now

$\frac{(X_i-E_i)^2}{E_i(1-p_i)}\sim\chi^2(1)$

approximately. That is why I wonder why the formula doesn't look like

$Q=\sum_{i=1}^r \frac{(X_i-E_i)^2}{E_i(1-p_i)}$

instead (which I think should be chi-square(r) or possibly chi-square(r-1) distributed), so to me it looks like someone has forgotten a factor in the denominator (although I know that's not the case). Does anyone know why it looks the way it does and how the formula was obtained?
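
One way to see where the factor $(1-p_i)$ goes is to work out the r = 2 case by hand. This is only a sketch, using nothing beyond $X_1+X_2=n$ and $p_1+p_2=1$. First, $X_2-np_2=(n-X_1)-n(1-p_1)=-(X_1-np_1)$, so

$Q=\frac{(X_1-np_1)^2}{np_1}+\frac{(X_2-np_2)^2}{np_2}=(X_1-np_1)^2\,\frac{p_1+p_2}{np_1p_2}=\frac{(X_1-E_1)^2}{E_1(1-p_1)}$

That is, the two terms with denominators $E_i$ add up to exactly one term with denominator $E_1(1-p_1)$, which is approximately $\chi^2(1)$. The counts are tied together by $\sum X_i=n$, so the terms are not independent and cannot each be treated as a separate $\chi^2(1)$. For general r, since $\mathrm{Var}(X_i)=np_i(1-p_i)$ exactly,

$E[Q]=\sum_{i=1}^r \frac{np_i(1-p_i)}{np_i}=\sum_{i=1}^r (1-p_i)=r-1$

which matches the mean of $\chi^2(r-1)$, whereas the version with denominators $E_i(1-p_i)$ has mean r.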
• Oct 16th 2009, 03:20 PM
matheagle
I figured it was just a theorem.
But I don't have your book in front of me.
I teach out of Wackerly and Walpole all the time.
The second thing you posted is the Pearson goodness-of-fit test.
http://en.wikipedia.org/wiki/Pearson's_chi-square_test
It is important to note that this is not exactly a chi-square, as mentioned in that link:
'the distribution of the test statistic is not exactly that of a chi-square random variable'
This link seems reasonable too... http://www.statsdirect.com/help/chi_...s/chi_good.htm
Note that the denominators are all different, see
http://www.stat.wisc.edu/~mchung/tea.../lecture24.pdf
• Oct 17th 2009, 02:54 AM
TriKri
No, I know, it's an approximation and only good for large n, or for large $E_i$. Besides, this distribution is discrete (there is only a finite number of possible outcomes for each n), while the chi-square distribution is continuous. It seems to be difficult to prove that these sums are chi-square distributed, or in the latter case, distributed similarly to the chi-square distribution. I still think it's bad, though, that we are taught stuff that is not proven. It forces you to trust blindly in different mathematical expressions.
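
Short of a proof, the approximation can at least be checked numerically. The following is a minimal sketch that simulates multinomial counts and compares the average of Pearson's Q (denominators $E_i$) with the average of the variant that has denominators $E_i(1-p_i)$. The probabilities, n, the seed, and the function names are illustrative assumptions.

```python
import random
import statistics

def pearson_and_variant(counts, probs, n):
    """Return (Q, alt): Q uses denominators E_i, alt uses E_i * (1 - p_i)."""
    q = sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))
    alt = sum((x - n * p) ** 2 / (n * p * (1 - p)) for x, p in zip(counts, probs))
    return q, alt

def simulate(probs=(0.1, 0.2, 0.3, 0.4), n=200, trials=5000, seed=1):
    """Average both statistics over repeated samples of n observations of Z."""
    rng = random.Random(seed)
    r = len(probs)
    qs, alts = [], []
    for _ in range(trials):
        counts = [0] * r
        # Each observation lands in state i with probability probs[i].
        for state in rng.choices(range(r), weights=probs, k=n):
            counts[state] += 1
        q, alt = pearson_and_variant(counts, probs, n)
        qs.append(q)
        alts.append(alt)
    return statistics.fmean(qs), statistics.fmean(alts)

mean_q, mean_alt = simulate()
# With r = 4 states, chi2(r-1) has mean 3; the variant's mean is r = 4.
print(f"mean of Q       = {mean_q:.3f}  (r - 1 = 3)")
print(f"mean of variant = {mean_alt:.3f}  (r = 4)")
```

Matching means does not show that Q is chi-square distributed, but it does show that the variant with the extra $(1-p_i)$ factor cannot be $\chi^2(r-1)$.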