Math Help - A question about Chi-Squared test

  1. #1
    Newbie
    Joined
    May 2011
    Posts
    5

    A question about Chi-Squared test

    Hi

    I have done a study and found that the prevalence of a particular disease is higher in my study population (11 out of 429 patients) than in the UK population (0.3% prevalence). How could I compare these using a Chi-squared test to find out whether the prevalence of the disease is higher in my population than in the UK population?

    Thank you

  2. #2
    MHF Contributor
    Joined
    May 2010
    Posts
    1,027
    Thanks
    28
    If there is only one disease state (i.e., people are either "sick" or "not sick"), then you don't need a chi-square test for this. I think a test based on the normal distribution is sufficient.

    Google turned up this (more or less) step-by-step guide: Statistics Tutorial: Hypothesis Test for a Proportion

    You may want to check its accuracy with your stats textbook before using it for anything important!
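
    For reference, a minimal sketch of the normal-approximation test for a single proportion, applied to the figures in the original question (11 of 429 against a reference prevalence of 0.3%). The function name `proportion_z_test` is made up for this illustration. The upper tail is computed with `math.erfc` because the statistic is so far out that `1 - cdf(z)` would round to zero. The expected number of cases under the null is only about 1.3, so the normal approximation is rough and the p-value should be read as indicative only.

```python
from math import erfc, sqrt


def proportion_z_test(successes, n, p0):
    """One-sample z test for a proportion (hypothetical helper).

    Uses the standard deviation implied by the null value p0,
    not one estimated from the sample.
    """
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Upper-tail probability P(Z > z) for the one-sided alternative p > p0.
    p_upper = 0.5 * erfc(z / sqrt(2))
    return z, p_upper


# Figures from the question: 11 cases in 429 patients, reference prevalence 0.3%.
z, p_upper = proportion_z_test(11, 429, 0.003)
print(f"z = {z:.4f}, one-sided p = {p_upper:.3g}")
```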

  3. #3
    Newbie
    Joined
    May 2011
    Posts
    5
    Thanks, so would you not be able to use Chi-squared to calculate a P value in this case? I thought Chi-squared could be used to compare proportions.

  4. #4
    Member
    Joined
    May 2011
    From
    Sacramento, CA
    Posts
    165
    I don't believe you can use the \chi^2 distribution for your test. As this discussion details, among other things, the degrees of freedom of the test correspond to the number of cells in your analysis. Their example compares three distributions of fish, call them A, B, and C, which have proportions a, b, and c. The expected proportion (E) was 1/3 for each cell. Thus, the test statistic is

    \chi^2 = \frac{(a - E)^2}{E} + \frac{(b - E)^2}{E} + \frac{(c - E)^2}{E}

    The degrees of freedom for this test are given by:

    df = \text{number of cells} - 1

    For the example, df = 2. In your case, you could only have a test with one cell, so df = 0. How do you do a test with zero degrees of freedom? The answer is you cannot. Now, if you were to split your sample into males and females and had the expected (empirical?) population proportions for males and females, then you could do the test with two cells and df = 1. The test is then straightforward.
    Last edited by bryangoodrich; May 23rd 2011 at 10:09 AM. Reason: corrected latex and added URL

  5. #5
    MHF Contributor
    Joined
    May 2010
    Posts
    1,027
    Thanks
    28
    Wouldn't there be two cells (sick, not sick) and 1 degree of freedom in the proposed test?

    Not that I think it is the appropriate test, but it seems feasible enough to me.

  6. #6
    Member
    Joined
    May 2011
    From
    Sacramento, CA
    Posts
    165
    What does it mean to be not-sick when we already have the incidence known? His sample would then be 11/429 and (429-11)/429. Equivalently by proportions, 2.57% and 97.44%. The population proportion is then the pair (0.3%, 99.7%). Let's look at the statistic:

    \chi^2 = \frac{(.0257 - 0.003)^2}{0.003} + \frac{(.9744 - .997)^2}{.997} = 0.1722756

    Using R with 95% confidence, the distribution has quantiles for \chi^2 (1 - \alpha, df) = 3.841, (\alpha = 0.05). The null hypothesis is that the two are the same, and the test statistic falls within the acceptance region (fail to reject). Yet, this doesn't seem right given the drastic difference we observed in sick people (>2% vs 0.3%). Why would this be? The reason is what I alluded to above. It is a false appearance that we gained a degree of freedom by partitioning "sick and nonsick" people. The reason is that the other is wholly determined by the available information. Maybe I'm wrong, though.
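
    The 3.841 cutoff quoted above can be reproduced without R. A rough sketch using only the Python standard library, relying on the fact that a chi-square variable with one degree of freedom is the square of a standard normal; the variable names are arbitrary. It only gives the critical value and says nothing about whether the statistic it is compared with was built correctly.

```python
from statistics import NormalDist

alpha = 0.05

# With 1 df, chi-square is the square of a standard normal, so its upper-alpha
# quantile is the square of the two-sided normal quantile at the same alpha.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
chi2_crit = z_crit ** 2

print(f"z critical value: {z_crit:.4f}")
print(f"chi-square critical value (df = 1): {chi2_crit:.3f}")
```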

  7. #7
    MHF Contributor
    Joined
    May 2010
    Posts
    1,027
    Thanks
    28
    My understanding was that the reason the test statistic has one fewer degree of freedom than the number of cells is that the cell totals always add up to 100% of the sample size. That is, the fact that one of the cells is determined by the others is already allowed for when setting the number of degrees of freedom.

    The test may have low power, but that does not show it is infeasible or that its distribution is asymptotically incorrect. I never intended to imply that the test was a good one (as per my first post in this thread, where I drew the OP's attention to an alternative).

    Edit: Minor edits were made before I saw the reply below.
    Last edited by SpringFan25; May 23rd 2011 at 11:38 AM.

  8. #8
    Member
    Joined
    May 2011
    From
    Sacramento, CA
    Posts
    165
    You may be correct, but aren't the cells supposed to be independent? If one is determined by the other, we don't have that independence. Thus, we really have one estimate and we lose its degree of freedom, making the test impotent. If I am wrong, then your critique is spot on, and my calculations above would be the result.

  9. #9
    MHF Contributor
    Joined
    May 2010
    Posts
    1,027
    Thanks
    28
    This derivation appears to assume that the probabilities of being in the cells must sum to 1. That is only the case if we include 2 cells (sick, not sick) in the analysis.
    Last edited by SpringFan25; May 23rd 2011 at 12:27 PM. Reason: fixed ambiguity

  10. #10
    Senior Member
    Joined
    Oct 2009
    Posts
    340
    Of course you can use a chi-square test for this. It's standard. I would be shocked if the usual chi-square test wasn't exactly equal to the square of the usual Z test (by "usual" I mean the one where you use the exact null standard deviation in the denominator, as opposed to estimating it). One degree of freedom, of course. There are implicitly two cells in the data: the successes and the failures. You lose one degree of freedom so you have one left over. Obviously this has to be true since an equivalent Z test can be formed, and squaring the Z gives a chi-square with one degree of freedom.

    It's a little bit misleading to speak of "the" chi-square test. The usual tests - the Wald, score, and likelihood ratio tests - are all chi-square tests. IIRC the chi-square test that most people think of is equivalent to the score test in this particular case. Incidentally, if you invert the score test to get a confidence interval (the Wilson interval), its midpoint at the 95% level is almost exactly what you get by adding two successes and two failures, which is where that trick comes from for small samples.
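
    A quick numerical check of how close the inverted score test and the "add two successes and two failures" interval are, sketched with the thread's data (11 of 429). The formulas are the standard Wilson score interval and the adjusted Wald interval; the function names are invented for this example and are not from any library.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, conf=0.95):
    """Confidence interval from inverting the score test (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    center = (x + z * z / 2) / (n + z * z)
    half = (z / (1 + z * z / n)) * sqrt(
        p_hat * (1 - p_hat) / n + z * z / (4 * n * n)
    )
    return center - half, center + half


def plus_four_interval(x, n, conf=0.95):
    """Wald interval after adding two successes and two failures (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


wilson = wilson_interval(11, 429)
plus_four = plus_four_interval(11, 429)
print("Wilson (score) interval:", tuple(round(v, 4) for v in wilson))
print("Plus-four interval:     ", tuple(round(v, 4) for v in plus_four))
```

    The two intervals are close but not identical, because the squared 95% normal quantile is about 3.84 rather than exactly 4.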

  11. #11
    Senior Member
    Joined
    Oct 2009
    Posts
    340
    Quote Originally Posted by bryangoodrich View Post
    What does it mean to be not-sick when we already have the incidence known? His sample would then be 11/429 and (429-11)/429. Equivalently by proportions, 2.57% and 97.44%. The population proportion is then the pair (0.3%, 99.7%). Let's look at the statistic:

    \chi^2 = \frac{(.0257 - 0.003)^2}{0.003} + \frac{(.9744 - .997)^2}{.997} = 0.1722756

    Using R with 95% confidence, the distribution has quantiles for \chi^2 (1 - \alpha, df) = 3.841, (\alpha = 0.05). The null hypothesis is that the two are the same, and the test statistic falls within the acceptance region (fail to reject). Yet, this doesn't seem right given the drastic difference we observed in sick people (>2% vs 0.3%). Why would this be? The reason is what I alluded to above. It is a false appearance that we gained a degree of freedom by partitioning "sick and nonsick" people. The reason is that the other is wholly determined by the available information. Maybe I'm wrong, though.
    Shouldn't you be using expected cell counts, not expected proportions? It makes a huge difference. I also checked in R that this test is equivalent to the Z test and, sure enough, if you square the Z test you get this one. If you replace the proportions with expected counts you get 73.5, so the result is highly significant.

    \displaystyle Z = \frac{\hat p - p_0}{\sqrt{p_0 (1 - p_0) / 429}} = 8.5747 \Rightarrow Z^2 = 73.5

    Similarly

    \displaystyle \chi^2 = \frac{(11 - (.003)429)^2}{(.003)429} + \frac{(418 - (.997)429)^2}{(.997)429} = 73.5
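
    The two calculations above can be checked with a few lines of Python (standard library only). This is only a sketch: `chi2_gof` is a made-up helper name, and the p-value uses the identity that the upper tail of a chi-square with 1 df at x equals erfc(sqrt(x / 2)). It also recomputes the statistic from proportions to show that doing so simply divides the count-based value by the sample size.

```python
from math import erfc, sqrt


def chi2_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n, sick, p0 = 429, 11, 0.003

# Statistic built from counts, as in the post above.
observed = [sick, n - sick]
expected = [n * p0, n * (1 - p0)]
chi2_counts = chi2_gof(observed, expected)

# Same formula applied to proportions: smaller by a factor of n.
chi2_props = chi2_gof([o / n for o in observed], [p0, 1 - p0])

# z statistic with the null standard deviation in the denominator.
z = (sick / n - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tail p-value of a chi-square with 1 df.
p_value = erfc(sqrt(chi2_counts / 2))

print(f"chi-square from counts:      {chi2_counts:.2f}")
print(f"z squared:                   {z ** 2:.2f}")
print(f"chi-square from proportions: {chi2_props:.4f}")
print(f"p-value (df = 1):            {p_value:.3g}")
```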

  12. #12
    Member
    Joined
    May 2011
    From
    Sacramento, CA
    Posts
    165
    Thanks for the details. I was about to comment that SpringFan was right, and if we think of it in terms of the Z test we should see the parallel. I don't know why I was using proportions, though. As you pointed out, you're supposed to use the counts, and you aptly show the test comes out significant as we should have expected.

  13. #13
    MHF Contributor
    Joined
    Feb 2009
    Posts
    2,763
    Thanks
    5
    I prefer the Z test, which is approximate by the CLT,
    because you can do a one-sided test here.
    When you square the test statistic, it becomes a two-sided test.
    The same goes for the t and F when you have 1 df.
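
    To illustrate the point about sidedness, here is a small sketch comparing the one-sided z p-value with the chi-square p-value for the z statistic quoted earlier in the thread (8.5747) and for its mirror image. The helper names are invented for this example. Squaring discards the sign, so the chi-square test cannot tell "higher than 0.3%" from "lower than 0.3%".

```python
from math import erfc, sqrt


def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal (hypothetical helper)."""
    return 0.5 * erfc(z / sqrt(2))


def chi2_1df_p(stat):
    """Upper-tail p-value of a chi-square with 1 df (hypothetical helper)."""
    return erfc(sqrt(stat / 2))


z = 8.5747  # z statistic quoted earlier in the thread

for signed_z in (z, -z):
    one_sided = upper_tail_p(signed_z)
    via_chi2 = chi2_1df_p(signed_z ** 2)
    print(f"z = {signed_z:+.4f}: one-sided p = {one_sided:.3g}, chi-square p = {via_chi2:.3g}")
```

    For a positive z the chi-square p-value is twice the one-sided value; for the negative z the one-sided p-value is close to 1 while the chi-square p-value is unchanged.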