# Pooling in the Chi-squared test?

• Jan 13th 2009, 02:29 PM
AAKhan07
Hi, please take a look at this exam paper:
http://mei.org.uk/files/papers/s307ja_k39h9.pdf
Now scroll down to page 11 of the pdf to see the mark scheme answer.

What I don't understand is why they've pooled the first two groups.
Also, how do you determine how many degrees of freedom are lost? In other words, when you're looking at the table of values for the Chi-squared distribution, how do you know which row to read off? This seems to require knowing how many degrees of freedom are lost. Can someone remind me how to determine this?
• Jan 13th 2009, 10:50 PM
mr fantastic
There's a rule of thumb that classes need to have an observed frequency of 5 or more to use the chi-squared test. One remedy is to pool. That's why the first two groups were pooled (because the first group has an observed frequency less than 5).

The number of degrees of freedom is equal to the number of classes (after pooling), minus the number of parameters estimated from the sample in order to calculate the expected frequencies (in this case two: the mean and the standard deviation of the normal distribution), minus 1.

Therefore in this question, df = 6 - 2 - 1 = 3.
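
As a rough illustration of the counting rule above, here is a minimal Python sketch. The function name `degrees_of_freedom` is made up for this example; the only figures used are the ones quoted in this post (6 classes after pooling, 2 estimated parameters).

```python
def degrees_of_freedom(num_classes, num_estimated_params):
    """Classes left after pooling, minus estimated parameters, minus 1."""
    df = num_classes - num_estimated_params - 1
    if df < 1:
        # With no degrees of freedom left there is no row of the table to use.
        raise ValueError("too few classes for a chi-squared goodness-of-fit test")
    return df


# Figures quoted in the post: 6 classes after pooling, and 2 parameters
# (mean and standard deviation) estimated for the normal distribution.
print(degrees_of_freedom(6, 2))  # 3
```

The result is the row to read off in the table of chi-squared critical values.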
• Jan 16th 2009, 01:13 AM
phabreel
Regarding this, does the data have to be put into classes? For example, suppose the question just gave the values 22.0, 22.5, 23.0, 23.5, 24.0, 24.5, etc., with their observed frequencies. It would be useful to know whether the chi-squared test can only be used once the data have been categorised into classes.
• Jan 16th 2009, 05:21 AM
Constatine11
It's the expected frequency in a cell that needs to be at least 5, not the observed frequency (and the rule also says that no more than 20% of the cells should have expected frequencies less than 5).
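
To make the pooling step concrete, here is a minimal Python sketch that merges adjacent classes until each pooled class has an expected frequency of at least 5. It applies only the simple "every cell" version of the rule, not the 20% version. The function name `pool_adjacent` and the frequencies are made up for illustration and are not taken from the exam paper.

```python
def pool_adjacent(observed, expected, min_expected=5.0):
    """Merge neighbouring classes, left to right, until every pooled
    class has an expected frequency of at least min_expected."""
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    # A small class left over at the upper end joins the last pooled class.
    if acc_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Hypothetical frequencies: the two end classes have expected values below 5.
observed = [2, 9, 18, 25, 16, 8, 2]
expected = [2.5, 8.5, 19.0, 24.0, 17.0, 7.0, 2.0]

obs, exp = pool_adjacent(observed, expected)
print(obs)  # [11, 18, 25, 16, 10]
print(exp)  # [11.0, 19.0, 24.0, 17.0, 9.0]
```

Note that the decision to pool is driven by the expected frequencies; the observed frequencies are simply added up alongside them. The number of classes used for the degrees of freedom is the number left after pooling (5 in this made-up example).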

• Jan 16th 2009, 02:52 PM
mr fantastic
Thanks for that. My muddle.