I am trying to estimate the probability of a rare event, say the presence of mutated species, in a couple of hundred different units of varying population size. I need to calculate the probability of mutation for every unit. So if a unit has 1000 species, of which 4 are mutated, then my estimate of the mutation probability is 4/1000 = 1/250. But since the event is rare, most of the units would have an estimated probability of zero. That does tell me something (mutation is unlikely), but are these zeros really equal, or can I extract some additional information to distinguish between units?
Consider two units, A and B. In A there are 100000 species and no mutants; in B there are 1000 species and no mutants. My hunch is that one can in fact say mutation is less likely in A: after all, there are more potential mutants there than in B, and still none was observed.
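To make the hunch concrete, here is a minimal sketch. It assumes that every species in a unit mutates independently with the same probability p, which is an assumption added for illustration and not something the data confirm. The value p = 1/1000 and the function name `prob_zero_mutants` are hypothetical as well.

```python
def prob_zero_mutants(p, n):
    """Probability of observing no mutants among n species.

    Assumes (for illustration only) that each species mutates
    independently with the same probability p.
    """
    return (1.0 - p) ** n


p = 1 / 1000  # hypothetical mutation probability, not taken from the data

# Unit B: 1000 species. Seeing zero mutants is quite plausible under this p.
print("B:", prob_zero_mutants(p, 1_000))

# Unit A: 100000 species. Seeing zero mutants is practically impossible under this p.
print("A:", prob_zero_mutants(p, 100_000))
```

Under that assumption, a zero count in A rules out far more values of p than a zero count in B, which is the sense in which the two zeros would not be equal.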
How should I approach this? My idea is that maybe I could put bounds on the probability somehow. Let's imagine that the next species added to each unit is a mutant. Then in A_1 we'd have 100001 species, one of which is a mutant, and in B_1 we'd have 1001 species, one of which is a mutant. So the new probabilities are
P(mutation in A_1)=1/100001
P(mutation in B_1)=1/1001
Is that a reasonable approach? Could I then say that the probability is not larger than 1/100001 in A and not larger than 1/1001 in B? Let's take some arbitrary point, say halfway between zero and the upper bound, and say that our probabilities are 1/200002 and 1/2002 respectively. Or would it make sense to use some other measure?
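The arithmetic of this "one more mutant" idea can be sketched as follows. The function names `pseudo_upper_bound` and `midpoint_estimate` are made up for illustration, and the code only reproduces the heuristic described above; it is not an established estimator.

```python
from fractions import Fraction


def pseudo_upper_bound(n):
    """Heuristic bound: pretend the next (n + 1)-th species is a mutant."""
    return Fraction(1, n + 1)


def midpoint_estimate(n):
    """Arbitrary point halfway between zero and the heuristic bound."""
    return pseudo_upper_bound(n) / 2


# Units with zero observed mutants, as in the example above.
for name, n in [("A", 100_000), ("B", 1_000)]:
    print(name, pseudo_upper_bound(n), midpoint_estimate(n))
# A 1/100001 1/200002
# B 1/1001 1/2002
```

Exact fractions are used so the results can be compared directly with the hand calculation: half of 1/1001 is 1/2002.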
This sounds like a problem that has been encountered many times before. I'd very much appreciate any hints!