Calculate significance of difference in count numbers between populations

pjk


Hi

I'm comparing genomic distribution counts of two subsets of a dataset that were composed using different selection criteria.

Both subsets contain x values corresponding to the genomic start positions of probes used on an array to assess the status of specific sequences on the genome.


category   subset1   subset2
a:            8281     31225
b:            6323      7853
c:            1397       711
d:            2462      2205
e:            2397      2351
f:            4120       317
g:           12756      2659
h:           12255      2679
total:       50000     49991

The table above lists the number of counts per genomic category (a-h) for the two subsets. Here, subset1 is a set of start positions of randomly selected probes, and subset2 is a set of probes that were selected according to a specific criterion.
I now want to determine, for each category individually, whether the difference in counts is significant. Basically, I would like to demonstrate statistically whether 31225 does or does not differ significantly from 8281, given that the latter is the count expected for this category when probes are selected at random.

Thanks in advance,

Pieter
chiro

Hey pjk.

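You might want to consider testing the difference in proportions between the two samples, category by category, with a two-sample test for proportions (a z-test; with samples this large, a t-test on 0/1 indicators gives essentially the same result).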
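
Here is a minimal sketch of what a per-category comparison of proportions could look like in Python. Note that it uses a pooled two-proportion z-test (normal approximation) rather than a t-test, which is my own choice for counts this large. The function name two_proportion_z, the 0.05 level and the Bonferroni correction over the eight categories are all assumptions for illustration, not something taken from your post. The totals are computed from the listed counts rather than read from the total row.

```python
import math

# Counts per genomic category, copied from the table in the question.
subset1 = {"a": 8281, "b": 6323, "c": 1397, "d": 2462,
           "e": 2397, "f": 4120, "g": 12756, "h": 12255}
subset2 = {"a": 31225, "b": 7853, "c": 711, "d": 2205,
           "e": 2351, "f": 317, "g": 2659, "h": 2679}


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test (hypothetical helper).

    Returns (z, two-sided p-value) for H0: both samples share the
    same underlying proportion.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Sample sizes taken as the sums of the listed counts.
n1 = sum(subset1.values())
n2 = sum(subset2.values())

# Assumed significance level, Bonferroni-corrected for 8 tests.
alpha = 0.05 / len(subset1)

results = {}
for cat in subset1:
    z, p = two_proportion_z(subset1[cat], n1, subset2[cat], n2)
    results[cat] = (z, p)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{cat}: z = {z:8.2f}, p = {p:.3g} ({verdict})")
```

Keep in mind that the eight tests are not independent, because the category proportions within each subset sum to 1. A chi-square test on the full 8 x 2 table would be a sensible overall check before looking at categories one by one.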