# Thread: Do you need equal amounts of data for a chi square test?

1. ## Do you need equal amounts of data for a chi square test?

I'm trying to see if there is a dependent relationship between whether a school is public or private and its student-teacher ratio. I'm not sure if I can use a chi-square test for this because my data has roughly a 6-to-1 ratio of public to private schools (far more public schools). Can a chi-square test still work?

2. Is 6:1 the ratio in your sample or in the entire population?

Do you know roughly the size of the population? How big is the sample? Is the student-teacher ratio normally distributed?

Remember that a smaller value of n increases the probability of committing a Type II error.

3. Originally Posted by pickslides
> Is 6:1 the ratio in your sample or in the entire population?
>
> Do you know roughly the size of the population? How big is the sample? Is the student-teacher ratio normally distributed?
>
> Remember that a smaller value of n increases the probability of committing a Type II error.

I'm honestly not sure exactly how to answer that. I have data from all the schools in California (around 1100 public and almost 200 private; that's where the ratio comes from). I used California because I'd have a reasonable number of private schools without having to get additional private school data from another state.

4. I think the amount of data you have is sufficient to perform a test on the data.

If the conclusion you want to draw is for 'the population of schools in California', then the good news is that you seem to have the entire population data set.

This means a test is not needed: you can simply calculate the student-teacher ratios for each group and compare them directly. The $\chi^2$ test is used when only a sample is available. Does this make sense?

If you like, you can take a sample from each group and perform the test. You can also consider a Wilcoxon rank-sum (Mann-Whitney) test, which compares two independent groups, if you are unsure of the underlying distribution.

5. Originally Posted by pickslides
> I think the amount of data you have is sufficient to perform a test on the data.
>
> If the conclusion you want to draw is for 'the population of schools in California', then the good news is that you seem to have the entire population data set.
>
> This means a test is not needed: you can simply calculate the student-teacher ratios for each group and compare them directly. The $\chi^2$ test is used when only a sample is available. Does this make sense?
>
> If you like, you can take a sample from each group and perform the test. You can also consider a Wilcoxon rank-sum (Mann-Whitney) test, which compares two independent groups, if you are unsure of the underlying distribution.

I think my problem was that I didn't realize the expected values already compensate for the different amounts of data from each type of school, simply because of how they are calculated (multiplying the row sum by the column sum and dividing by the total on a contingency table). I think I know what I'm doing now.
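
For anyone following along, the expected-count calculation described here can be sketched in a few lines of Python. The counts below are made up for illustration (they are not the real California data), and splitting the ratios into three bins is just an assumption for the example. The point is only that each expected cell is row total × column total / grand total, so the smaller private-school row automatically gets proportionally smaller expected counts.

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: columns are low / medium / high student-teacher ratio.
observed = [
    [200, 600, 300],  # public schools (row total 1100)
    [100, 80, 20],    # private schools (row total 200)
]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

for label, row in zip(("public", "private"), expected):
    print(label, [round(e, 1) for e in row])
print("chi-square statistic:", round(statistic, 2), "with", dof, "degrees of freedom")
```

Note that the expected rows sum to the same totals as the observed rows (1100 and 200), which is why unequal group sizes are not a problem in themselves.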

6. I'm also wondering how many bins I should use for the chi square test. I don't know if there's a recommended number based on the amount of data being tested. The student-teacher ratios are generally between 5 and 25.
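
One commonly cited rule of thumb is that every expected cell count should be at least 5, so one way to explore this is to try several bin counts and look at the smallest expected count each produces. The sketch below does that on simulated ratios; the data, the equal-width bins over 5 to 25, and the helper names `bin_counts` and `min_expected` are all assumptions for illustration, not something from the actual data set.

```python
import random


def bin_counts(ratios, n_bins, low=5.0, high=25.0):
    """Count values in n_bins equal-width bins over [low, high].
    Values outside the range are clamped into the first or last bin."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for r in ratios:
        idx = int((r - low) / width)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return counts


def min_expected(public, private, n_bins):
    """Smallest expected cell count in the 2 x n_bins contingency table."""
    observed = [bin_counts(public, n_bins), bin_counts(private, n_bins)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return min(r * c / total for r in row_totals for c in col_totals)


# Simulated (hypothetical) student-teacher ratios, roughly matching the
# group sizes mentioned in the thread.
random.seed(0)
public = [random.gauss(20, 3) for _ in range(1100)]
private = [random.gauss(12, 3) for _ in range(200)]

for n_bins in (4, 5, 8, 10, 20):
    print(n_bins, "bins -> smallest expected count:",
          round(min_expected(public, private, n_bins), 2))
```

If the smallest expected count drops below about 5 for a given number of bins, merging sparse neighboring bins is the usual remedy.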