Calculate significance of difference in count numbers between populations


Nov 2012


I'm comparing genomic distribution counts of two subsets of a dataset that were composed using different selection criteria.

Both subsets contain x values corresponding to the genomic start positions of probes used on an array to assess the status of specific sequences on the genome.

category   subset1   subset2
a             8281     31225
b             6323      7853
c             1397       711
d             2462      2205
e             2397      2351
f             4120       317
g            12756      2659
h            12255      2679
total        49991     50000

The table above lists the number of counts per genomic category (a-h) for the two subsets. Here, subset1 represents the start positions of randomly selected probes, and subset2 those of probes that were selected according to a specific criterion.
I now want to determine whether the difference in counts is significant for each category individually. Basically, I would like to demonstrate statistically whether 31225 differs significantly from 8281, given that the latter is the count expected for this category when probes are selected at random.

Thanks in advance,



MHF Helper
Sep 2012
Hey pjk.

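You can test the difference between the proportions in two samples using a two-proportion z-test (or, equivalently, a chi-square test on the 2x2 table for each category), which you might want to consider. A t-test is meant for comparing means rather than proportions.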
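
A comparison of proportions between two samples is commonly carried out as a two-proportion z-test with a pooled standard error. The sketch below applies it to each category of the table in the question. It assumes the two subsets can be treated as independent samples; the function name `two_proportion_z` and the Bonferroni correction for testing eight categories are my own additions for illustration, not something from the original post.

```python
import math

# Counts copied from the table in the question (categories a-h).
subset1 = {"a": 8281, "b": 6323, "c": 1397, "d": 2462,
           "e": 2397, "f": 4120, "g": 12756, "h": 12255}
subset2 = {"a": 31225, "b": 7853, "c": 711, "d": 2205,
           "e": 2351, "f": 317, "g": 2659, "h": 2679}


def two_proportion_z(x1, n1, x2, n2):
    """Hypothetical helper: z statistic and two-sided p-value for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Totals are computed from the counts rather than typed in by hand.
n1 = sum(subset1.values())
n2 = sum(subset2.values())

# Assumption: Bonferroni correction because eight categories are tested.
alpha = 0.05 / len(subset1)

results = {}
for cat in subset1:
    z, p = two_proportion_z(subset1[cat], n1, subset2[cat], n2)
    results[cat] = (z, p)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{cat}: z = {z:8.2f}, p = {p:.3g} ({verdict})")
```

Because the categories partition each subset, the eight tests are not independent of one another; a single chi-square test on the full 8x2 table is a reasonable first step before looking at categories one at a time.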