Here's how I grouped the data for the Chi square test to keep at least 5 observations in each cell:
RNA      Positive       Negative
          #    Pct       #    Pct
<3        7    .58       5    .42
>3 <9     8    .27      22    .73
>9        6    .43       8    .57
Total    21    .375     35    .625

Low and high RNA concentrations produce more positive tests. Whether this is statistically significant should be tested with a Chi square test.
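To do this test you calculate an expected frequency for each cell using the total percentages. Example: for the cell "Positive <3" the expected is $12 \times .375 = 4.5$ (the row total of 12 times the overall proportion positive). Then the Chi square statistic is the sum over the cells of $\frac{(\text{observed} - \text{expected})^2}{\text{expected}}$.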
For cell "Positive <3" this is $\frac{(7 - 4.5)^2}{4.5} \approx 1.39$. The sum over all the cells gives the Chi square statistic of 3.9. It has degrees of freedom 6 - 1 - 3 = (3 - 1)(2 - 1) = 2 because 3 parameters (2 for the rows and 1 for the columns) are being estimated. The P-value of 3.9 with 2 DF is .14, not significant. (EDIT: I corrected the DF from 4 to 2 and the P-value from .42 to .14 per CaptainBlack's post below.)
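If you want to check the arithmetic, here is a minimal Python sketch of the same calculation using the grouped counts from the table. The names `observed` and `chi_square` are just illustrative. The P-value line relies on the fact that with exactly 2 degrees of freedom the upper tail of the Chi square distribution is exp(-x/2); for other DF you would need a statistics library.

```python
import math

# Observed counts from the grouped table: (positive, negative) per RNA group.
observed = {"<3": (7, 5), ">3 <9": (8, 22), ">9": (6, 8)}

def chi_square(table):
    """Return the Pearson Chi square statistic and its degrees of freedom."""
    rows = list(table.values())
    n = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in range(len(rows[0]))]
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j, obs in enumerate(r):
            # Expected count = row total * column total / grand total.
            expected = row_total * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_square(observed)

# Closed form for the upper tail, valid only when df == 2.
p_value = math.exp(-stat / 2) if df == 2 else None

print(round(stat, 1), df, round(p_value, 2))  # 3.9 2 0.14
```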
But I think the relationship is stronger at the high end than the grouped data show. You can see this if you sort the data by RNA concentration. (I haven't shown this here; it is better in color in the spreadsheet.)
To show this relationship, I suggest a logit or probit analysis. With these, you test whether there is a linear or U-shaped relationship between the probability of a positive test and RNA concentration. For either of these analyses, you don't have the grouping restrictions of the Chi square test. But you need software such as SAS or SPSS for this.
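A dedicated package is the practical route, but to illustrate what a logit analysis computes, here is a rough pure-Python sketch. It is my own assumption about how to set this up, not output from SAS or SPSS: it fits the logit by Newton-Raphson and uses a squared concentration term as one way to capture a U-shape, then compares the linear and quadratic models with a likelihood-ratio test (1 DF). The function names (`fit_logit`, `u_shape_test`, and the helpers) are hypothetical, probit is not covered, and no data are included; you would pass in the ungrouped concentrations and the 0/1 test results.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve(a, b):
    # Solve a * x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logit(x, y, quadratic, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x (+ b2*x^2) by Newton-Raphson.

    x: concentrations; y: 1 for a positive test, 0 for a negative one.
    Returns (coefficients, log-likelihood).
    """
    rows = [[1.0, xi, xi * xi] if quadratic else [1.0, xi] for xi in x]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]
        # Newton step: beta <- beta + H^{-1} * gradient.
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, r)))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return beta, loglik

def u_shape_test(x, y):
    """Likelihood-ratio test of the squared term (1 degree of freedom)."""
    _, ll_linear = fit_logit(x, y, quadratic=False)
    beta, ll_quad = fit_logit(x, y, quadratic=True)
    lr = 2.0 * (ll_quad - ll_linear)
    # Upper tail of Chi square with 1 DF: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return beta, lr, p_value
```

A positive coefficient on the squared term (`beta[2]`) together with a small P-value would point to a U-shaped relationship. This sketch does no convergence checking, so treat it as a way to see the mechanics rather than a replacement for proper software.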