# Chi Square test, or some other way?

• May 27th 2007, 07:50 PM
bruxism
hi, here's the problem my girlfriend is having with some genetics she's working on.

As you can see on the attached spreadsheet, a sample tests positive or negative for whatever it is she's testing. She also finds the concentration of some other thing in that same sample.

She wants to know if there is some kind of relationship between negativeness (or positiveness) and the lowness (or highness) of the concentration.

She's talking about P values and Chi square tests and stuff, but we are not exactly sure how to (or whether we should) apply them to this problem. What would you suggest?

Thanks very much for having a look
• May 27th 2007, 09:54 PM
JakeD
Quote:

Originally Posted by bruxism
hi, here's the problem my girlfriend is having with some genetics she's working on.

As you can see on the attached spreadsheet, a sample tests positive or negative for whatever it is she's testing. She also finds the concentration of some other thing in that same sample.

She wants to know if there is some kind of relationship between negativeness (or positiveness) and the lowness (or highness) of the concentration.

She's talking about P values and Chi square tests and stuff, but we are not exactly sure how to (or whether we should) apply them to this problem. What would you suggest?

Thanks very much for having a look

Hi. A Chi square test could and should be done. However, the usual rule of thumb is that the test needs at least 5 observations in each cell, so the data have to be grouped, and the grouping may obscure the relationship.

Here's how I grouped the data for the Chi square test to keep at least 5 observations in each cell:
Code:

    RNA        Positive      Negative
                 #   Pct       #   Pct
    <3           7   .58       5   .42
    >3 <9        8   .27      22   .73
    >9           6   .43       8   .57
    Total       21  .375      35  .625

The low and high RNA groups show a higher proportion of positive tests (.58 and .43) than the middle group (.27). Whether this difference is statistically significant should be tested with a Chi square test.

To do this test you calculate an expected frequency for each cell using the total percentages. Example: for the cell "Positive <3" the expected is $.375(7+5) = 4.5.$ Then the Chi square statistic is the sum over the cells of

$\frac{(actual\ frequency - expected\ frequency)^2}{expected\ frequency}.$

For cell "Positive <3" this is $(7 - 4.5)^2 / 4.5 = 1.39.$ The sum over all the cells gives the Chi square statistic of 3.9. It has degrees of freedom 6 - 1 - 3 = (3 - 1)(2 - 1) = 2 because 3 parameters (2 for the rows and 1 for the columns) are being estimated from the 6 cells. The P-value of 3.9 with 2 DF is .14, not significant. (EDIT: I corrected the DF from 4 to 2 and the P-value from .42 to .14 per CaptainBlack's post below.)
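For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. It uses only the counts from the grouped table; the function names are made up for illustration, and the P-value shortcut `exp(-stat/2)` is valid only for exactly 2 degrees of freedom.

```python
import math

# Observed counts from the grouped table: (positive, negative) per RNA group.
observed = {
    "<3": (7, 5),
    ">3 <9": (8, 22),
    ">9": (6, 8),
}

def chi_square(table):
    """Return the Chi square statistic and degrees of freedom for a rows x columns table."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for actual, col_total in zip(row, col_totals):
            # Expected frequency = row total * overall column share, e.g. 12 * 21/56 = 4.5.
            expected = row_total * col_total / grand_total
            stat += (actual - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof

def p_value_2df(stat):
    """Upper-tail probability of a Chi square value; this closed form holds for 2 DF only."""
    return math.exp(-stat / 2)

stat, dof = chi_square(observed)
p = p_value_2df(stat)
print(round(stat, 2), dof, round(p, 2))  # 3.9 2 0.14
```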

But I think the relationship is stronger at the high end than the grouped data show. You can see this if you sort the data by RNA concentration. (I haven't shown this here; it is better in color in the spreadsheet.)

To show this relationship, I suggest a logit or probit analysis. With these, you test whether there is a linear or U-shaped relationship between the probability of a positive test and RNA concentration. For either of these analyses, you don't have the grouping restrictions of the Chi square test. But you need software such as SAS or SPSS for this.
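To make the idea concrete, here is a rough pure-Python sketch of a logit fit with a squared term, so a U-shape shows up as a positive coefficient on the squared concentration. Everything in it is an assumption for illustration: the data are MADE UP (not the spreadsheet values), the helper names are hypothetical, and the fit is a plain Newton-Raphson maximum-likelihood loop with no standard errors, so it shows the shape of the analysis rather than a significance test. A statistics package would normally do this for you.

```python
import math

def solve(a, b):
    """Solve a small linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def fit_logit(features, outcomes, iterations=25):
    """Maximum-likelihood logit fit by Newton-Raphson; returns one coefficient per feature."""
    k = len(features[0])
    beta = [0.0] * k
    for _ in range(iterations):
        probs = [1 / (1 + math.exp(-sum(b * f for b, f in zip(beta, row))))
                 for row in features]
        # Gradient of the log-likelihood and the information matrix (negative Hessian).
        gradient = [sum((y - p) * row[j] for y, p, row in zip(outcomes, probs, features))
                    for j in range(k)]
        information = [[sum(p * (1 - p) * row[i] * row[j] for p, row in zip(probs, features))
                        for j in range(k)] for i in range(k)]
        step = solve(information, gradient)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# MADE-UP illustration data: positives cluster at low and high concentrations.
concentration = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
positive = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

# Center the concentration so the linear and squared terms are less correlated.
mean = sum(concentration) / len(concentration)
features = [[1.0, c - mean, (c - mean) ** 2] for c in concentration]
intercept, linear, quadratic = fit_logit(features, positive)
print(intercept, linear, quadratic)
```

A positive `quadratic` coefficient means the fitted probability of a positive test rises toward both ends of the concentration range. Whether it is significantly different from zero would need standard errors or a likelihood-ratio test, which this sketch leaves out.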
• May 27th 2007, 11:15 PM
CaptainBlack
Quote:

Originally Posted by JakeD
For cell "Positive <3" this is $(7 - 4.5)^2 / 4.5 = 1.39.$ The sum over all the cells gives the Chi square statistic of 3.9. It has degrees of freedom 6 - 2 = 4. The P-value of 3.9 with 4 DF is .42, not at all significant.

Now your calculation of the number of degrees of freedom for this left me
uneasy, but I don't do cross tabular analysis every day so I look this up when
I need it. It appears that DF=(rows-1)*(columns -1) = 2*1 = 2.

A Chi-Square of 3.9 is still not significant with this number of degrees of
freedom.

RonL
• May 28th 2007, 12:05 AM
bruxism
thanks for the replies. That's made things a little more clear for us.

a few questions though

1. In regards to performing a chi square test on something like this, it seems the groups are being made somewhat at random. Choosing <3, >3 <9, >9 seems ok, but you could have chosen anything couldn't you? how do you decide? it seems you could skew the stats in this way to make it seem like something is occurring....

2. you said "The P-value of 3.9 with 4 DF is .42, not at all significant". How do you calculate whether it is significant or not.

3. The probit and logit analysis sounds like a good idea. Is there a way to perform these without buying additional software...or do you just have to bite the bullet and spend?

Thanks for your time so far, it's been very helpful.
• May 28th 2007, 12:17 AM
JakeD
Quote:

Originally Posted by CaptainBlack
Now your calculation of the number of degrees of freedom for this left me
uneasy, but I don't do cross tabular analysis every day so I look this up when
I need it. It appears that DF=(rows-1)*(columns -1) = 2*1 = 2.

A Chi-Square of 3.9 is still not significant with this number of degrees of
freedom.

RonL

I don't do these every day either and I should have looked it up too. :( I corrected the post. Thank you. :)
• May 28th 2007, 12:20 AM
CaptainBlack
Quote:

Originally Posted by bruxism
2. you said "The P-value of 3.9 with 4 DF is .42, not at all significant". How do you calculate whether it is significant or not.

Either a cumulative chi-squared distribution calculator, or a set of
tables of critical values for the chi-squared distribution.

Below is the help text and example calculation from the system that
I use most frequently, when I'm not using the book of tables next to
my desk.

Code:

    >help chidis
    chidis is a builtin function.

    normaldis(x) : returns the probability that a normally distributed (mean 0, st.dev. 1) is less than x.
    invnormaldis(p) : is the inverse.
    chidis(x,n) : chi-distribution with n degrees of freedom.
    tdis(x,n) : Student's t-distribution with n degrees of freedom.
    invtdis(p,n) : the inverse.
    fdis(x,n,m) : f-distribution with n and m degrees of freedom.

    >chidis(3.9,2)
        0.857726
    >chidis(3.9,4)
        0.580291
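Judging by the numbers, `chidis` returns the cumulative probability (the chance of a value below 3.9), so the P-value is one minus the output: 1 - 0.857726 ≈ .14 for 2 DF and 1 - 0.580291 ≈ .42 for 4 DF. As a cross-check, here is a small Python sketch that reproduces both figures. The function name is made up, and the closed-form sum it uses works only for even degrees of freedom.

```python
import math

def chi2_cdf_even_df(x, df):
    """Cumulative Chi square probability P(X <= x); closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only covers positive even degrees of freedom")
    half = x / 2
    # Upper tail for df = 2k is exp(-x/2) * sum_{i<k} (x/2)^i / i!
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return 1 - tail

for df in (2, 4):
    cdf = chi2_cdf_even_df(3.9, df)
    # Prints the cumulative value and the P-value (upper tail).
    print(df, round(cdf, 6), round(1 - cdf, 2))
# 2 0.857726 0.14
# 4 0.580291 0.42
```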
• May 28th 2007, 01:02 AM
JakeD
Quote:

Originally Posted by bruxism
thanks for the replies. That's made things a little more clear for us.

a few questions though

1. In regards to performing a chi square test on something like this, it seems the groups are being made somewhat at random. Choosing <3, >3 <9, >9 seems ok, but you could have chosen anything couldn't you? how do you decide? it seems you could skew the stats in this way to make it seem like something is occurring....

I chose integer cutoffs at the high and low end to keep at least 5 observations in each cell. Looking back at the data, I see I didn't look at it too closely. I could have chosen a cutoff of 12 instead of 9 to get exactly 5 observations in both high categories. Doing that would have increased the significance, so it is true you may be able to cook the results somewhat by carefully selecting the categories. However, the Chi square test requires putting the data into categories; you have to do it somehow. Just don't be too cute when selecting the categories.
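One way to keep yourself honest is to decide on the cutoffs first, tabulate, and check the 5-per-cell rule before looking at any P-value. Here is a minimal Python sketch of that tabulation step. The function names are hypothetical, the sample values are MADE UP (not the spreadsheet data), and sending a value exactly equal to a cutoff to the upper bin is my assumption, since the groups above don't say where such a value goes.

```python
from bisect import bisect_right

def bin_counts(concentrations, positives, cutoffs):
    """Return [positive, negative] counts per bin; cutoffs must be sorted ascending."""
    table = [[0, 0] for _ in range(len(cutoffs) + 1)]
    for conc, is_positive in zip(concentrations, positives):
        # Assumption: a value exactly equal to a cutoff is placed in the upper bin.
        row = bisect_right(cutoffs, conc)
        table[row][0 if is_positive else 1] += 1
    return table

def smallest_cell(table):
    """Smallest observed count in any cell, to compare against the 5-per-cell rule."""
    return min(min(row) for row in table)

# MADE-UP values, only to show how two candidate groupings can be compared.
concs = [1, 2, 2, 4, 5, 6, 7, 8, 10, 11, 13, 15]
tests = [True, True, False, False, True, False, False, False, True, False, True, True]
for cutoffs in ([3, 9], [3, 12]):
    table = bin_counts(concs, tests, cutoffs)
    print(cutoffs, table, "smallest cell:", smallest_cell(table))
```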

Quote:

2. you said "The P-value of 3.9 with 4 DF is .42, not at all significant". How do you calculate whether it is significant or not.

Statistical convention says the P-value should be .05 or less to be statistically significant. I calculated the P-value in the spreadsheet using the function CHIDIST(3.9;2). (Note the DF is actually 2 per CaptainBlack and the P-value is .14.)

Quote:

3. The probit and logit analysis sounds like a good idea. Is there a way to perform these without buying additional software...or do you just have to bite the bullet and spend?

I googled "logit analysis free software". The first hit was software called EasyReg. I've never used it, but the price is right!

Quote:

Thanks for your time so far, it's been very helpful.

My pleasure.