The definition of the $\chi^2$-distribution, taken from my statistics book, is:
If $X_1, X_2, \dots, X_f$ are independent and $X_i \sim N(0, 1)$, then
$$\sum_{i=1}^{f} X_i^2 \sim \chi^2(f).$$
$f$ is the number of degrees of freedom.
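A quick Monte Carlo check of this definition (a sanity check, not a proof): draw $f$ independent standard normals, square and add them, and compare the sample mean and variance with the values $f$ and $2f$ that a $\chi^2(f)$ variable has. The function names, trial count and seed are arbitrary choices for illustration.

```python
import random
import statistics


def chi_square_draw(f, rng):
    """One draw of X_1^2 + ... + X_f^2 with X_i ~ N(0, 1) independent."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(f))


def simulate(f, trials=100_000, seed=1):
    """Sample mean and (population) variance of `trials` independent draws."""
    rng = random.Random(seed)
    draws = [chi_square_draw(f, rng) for _ in range(trials)]
    return statistics.fmean(draws), statistics.pvariance(draws)


if __name__ == "__main__":
    for f in (1, 3, 10):
        mean, var = simulate(f)
        # A chi-square(f) variable has mean f and variance 2f.
        print(f"f={f:2d}: mean {mean:.3f} (theory {f}), variance {var:.3f} (theory {2 * f})")
```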
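The book also says (but it doesn't prove it) that if $X_1, X_2, \dots, X_n$ are independent and $X_i \sim N(\mu, \sigma^2)$, then
$$\frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2(n-1), \qquad \text{where } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$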
I would really like to see what the proof looks like. How can this be proven?
Oct 15th 2009, 06:25 PM
First of all that's a lame definition of a chi-square.
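The real definition is the density
$$p(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0,$$
WHERE the df $k$ need not be an integer (any $k > 0$ works).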
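To see that nothing in this density needs an integer number of degrees of freedom, here is a small sketch that evaluates it through `math.lgamma` (on the log scale, to avoid overflow) and checks numerically that it integrates to about 1 for non-integer $k$. The midpoint-rule integrator, the cut-off at $x = 100$ and the step count are assumptions made for the illustration.

```python
import math


def chi2_pdf(x, k):
    """Chi-square density with k > 0 degrees of freedom; k need not be an integer."""
    if x <= 0:
        return 0.0
    log_pdf = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return math.exp(log_pdf)


def midpoint_integral(g, a, b, steps=200_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    for k in (2.5, 4.0, 7.3):
        total = midpoint_integral(lambda x: chi2_pdf(x, k), 0.0, 100.0)
        print(f"k={k}: integral over (0, 100) = {total:.6f}")
```

For $k = 2$ the density reduces to $\tfrac12 e^{-x/2}$, an exponential distribution with mean 2, which makes a handy spot check.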
It's easy to prove that if you square a standard normal you get a chi-square with 1 df, and then via MGFs you can show that sums of independent chi-squares give you a chi-square (the dfs add).
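For reference, the MGF argument can be written out in a few lines (standard calculations, assuming $t < 1/2$ so that every expectation is finite):

$$\begin{aligned}
M_{Z^2}(t) &= E\big[e^{tZ^2}\big] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1-2t)z^2/2}\,dz = (1-2t)^{-1/2}, \qquad Z \sim N(0,1),\\
M_{\chi^2(k)}(t) &= \int_0^\infty e^{tx}\,\frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}\,dx = (1-2t)^{-k/2},\\
M_{U+V}(t) &= M_U(t)\,M_V(t) = (1-2t)^{-(a+b)/2} \quad \text{for independent } U \sim \chi^2(a),\ V \sim \chi^2(b).
\end{aligned}$$

The first line matches the second with $k = 1$, so $Z^2 \sim \chi^2(1)$; the third shows $U + V \sim \chi^2(a+b)$, since an MGF that is finite in a neighbourhood of $0$ determines the distribution.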
NOW, when you subtract the sample mean you do lose that 1 df.
It's not a simple proof and I couldn't find it on the web.
I'm sure it's here and I'll look again.
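The lost degree of freedom is easy to see numerically, even though a simulation proves nothing. This sketch compares $\sum_i (X_i - \bar X)^2/\sigma^2$ with the same sum centred at the true mean $\mu$: the first should average about $n - 1$, the second about $n$. The values $\mu = 3$, $\sigma = 2$, $n = 5$ and the seed are arbitrary.

```python
import random
import statistics


def centred_at_sample_mean(xs, mu, sigma):
    """sum_i (X_i - Xbar)^2 / sigma^2; mu is unused, kept for a uniform signature."""
    xbar = statistics.fmean(xs)
    return sum((x - xbar) ** 2 for x in xs) / sigma ** 2


def centred_at_true_mean(xs, mu, sigma):
    """sum_i (X_i - mu)^2 / sigma^2, using the true mean mu."""
    return sum((x - mu) ** 2 for x in xs) / sigma ** 2


def average_statistic(stat, n, mu=3.0, sigma=2.0, trials=100_000, seed=2):
    """Average of stat over `trials` samples of size n drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append(stat(xs, mu, sigma))
    return statistics.fmean(values)


if __name__ == "__main__":
    n = 5
    print("centred at sample mean:", average_statistic(centred_at_sample_mean, n))  # close to n - 1
    print("centred at true mean:  ", average_statistic(centred_at_true_mean, n))    # close to n
```

Both calls use the same seed and therefore the same samples. Because $\sum_i (X_i - \mu)^2 = \sum_i (X_i - \bar X)^2 + n(\bar X - \mu)^2$, the first statistic is never larger than the second on the same sample.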
Oct 16th 2009, 04:00 PM
I just realized it wasn't the definition of chi-square :P It was just a theorem; the definition was a function containing the gamma function, like you wrote. I think the definition was
$$p(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0$$
(the same as the one on Wikipedia). What I find kind of strange is that my book (our course literature) states a lot of things but proves few of them. Another thing it states but doesn't prove is that the test variable in the chi-square test is chi-square distributed:
If $Z$ is distributed over $r$ states with probabilities $p_1, \dots, p_r$, and $N_i$ is the number of times $Z$, out of $n$ observations, ended up in state $i$, then the test variable
$$Q = \sum_{i=1}^{r} \frac{(N_i - E_i)^2}{E_i}$$
is chi-square(r-1)-distributed (here $E_i = np_i$ is the expected number of times $Z$ will end up in state $i$). However, the formula is not motivated, although they prove it is chi-square(r-1)-distributed for $r = 2$. If you look at a single term,
$$\frac{(N_i - E_i)^2}{E_i},$$
it doesn't look chi-square distributed. Let $Y_{ij}$ be 1 if $Z$ ends up in state $i$ in observation $j$, and 0 otherwise. Then $Y_{ij}$ has mean $p_i$ and variance $p_i(1-p_i)$, and the sum $N_i = \sum_{j=1}^{n} Y_{ij}$ is approximately distributed as $N\big(np_i,\ np_i(1-p_i)\big)$. Now
$$\frac{(N_i - np_i)^2}{np_i(1-p_i)} \sim \chi^2(1)$$
approximately. That is why I wonder why the formula doesn't look like
$$\sum_{i=1}^{r} \frac{(N_i - E_i)^2}{E_i(1-p_i)}$$
instead (which I think should be chi-square(r) or possibly chi-square(r-1) distributed). To me it looks like someone forgot a factor $(1-p_i)$ in the denominator (although I know that's not the case). Does anyone know why it looks the way it does and how the formula was obtained?
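A simulation can at least show what the two candidate formulas do. Since $E[(N_i - np_i)^2] = \operatorname{Var}(N_i) = np_i(1-p_i)$ exactly, the book's statistic has mean $\sum_i (1 - p_i) = r - 1$, while the version with the extra $(1-p_i)$ has mean $r$. The sketch below checks both means; the probabilities, $n$, trial count and seed are arbitrary, and `modified` is a hypothetical name for the alternative formula. For $r = 2$ the two terms of the modified sum are identical (because $N_2 - np_2 = -(N_1 - np_1)$), so it is exactly twice the book's statistic.

```python
import random
import statistics


def multinomial_counts(n, probs, rng):
    """Counts N_1, ..., N_r after n independent observations of Z, P(Z = i) = probs[i]."""
    counts = [0] * len(probs)
    for state in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[state] += 1
    return counts


def pearson(counts, probs):
    """The book's test variable: sum of (N_i - E_i)^2 / E_i with E_i = n p_i."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def modified(counts, probs):
    """Hypothetical alternative with an extra (1 - p_i) in each denominator."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p * (1 - p)) for c, p in zip(counts, probs))


def average(stat, n, probs, trials=10_000, seed=3):
    """Mean of stat over `trials` simulated experiments of n observations each."""
    rng = random.Random(seed)
    values = [stat(multinomial_counts(n, probs, rng), probs) for _ in range(trials)]
    return statistics.fmean(values)


if __name__ == "__main__":
    probs = [0.2, 0.3, 0.5]  # r = 3
    print("book's statistic:", average(pearson, 100, probs))   # close to r - 1 = 2
    print("modified:        ", average(modified, 100, probs))  # close to r = 3
```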
Well, I know it's an approximation and only good for large $n$, or for large $E_i$. Besides, this distribution is discrete (there is only a finite number of possible outcomes for each $n$), while the chi-square distribution is continuous. It seems difficult to prove that these sums are chi-square distributed, or, in the latter case, approximately chi-square distributed. I still think it's bad, though, that we are taught things that are not proven. It forces you to trust different mathematical expressions blindly.
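For $r = 2$, the case the book does prove, both points can be checked exactly. With $N_2 = n - N_1$ the test variable reduces to $Q = (N_1 - np)^2 / \big(np(1-p)\big)$ with $N_1 \sim \text{Bin}(n, p)$, so its tail probability is a finite sum, while the $\chi^2(1)$ tail is $P(|N(0,1)| \ge \sqrt{c}) = \operatorname{erfc}(\sqrt{c/2})$. The sketch compares the two at $c = 3.841$, roughly the 95% point of $\chi^2(1)$; $p = 0.5$ and the values of $n$ are arbitrary choices.

```python
import math


def chi2_1_tail(c):
    """P(chi-square(1) >= c) = P(|N(0, 1)| >= sqrt(c)) = erfc(sqrt(c / 2))."""
    return math.erfc(math.sqrt(c / 2))


def exact_tail(n, p, c):
    """Exact P(Q >= c) for r = 2, where Q = (N_1 - n p)^2 / (n p (1 - p)), N_1 ~ Bin(n, p)."""
    total = 0.0
    for k in range(n + 1):
        q = (k - n * p) ** 2 / (n * p * (1 - p))
        if q >= c:
            # Binomial pmf on the log scale to avoid overflow/underflow for large n.
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    c = 3.841  # approximate 95% point of chi-square(1)
    print(f"chi-square(1) tail: {chi2_1_tail(c):.4f}")
    for n in (10, 30, 100, 1000):
        print(f"n={n:5d}: exact tail {exact_tail(n, 0.5, c):.4f}")
```

For $n = 10$ only $N_1 \in \{0, 1, 9, 10\}$ give $Q \ge 3.841$, so the exact tail is $22/1024 \approx 0.021$, far from the nominal $0.05$. For larger $n$ the exact value settles near $0.05$, although, because of the discreteness, it can overshoot as well as undershoot.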