# Thread: Rare events - how to distinguish between zero probabilities?

1. ## Rare events - how to distinguish between zero probabilities?

Hello,

I am trying to estimate probabilities of a rare event, say presence of mutated species in a couple of hundred different units of varying population. I need to calculate probabilities of mutation for all units. So if there is a unit with 1000 species where there are 4 mutated individuals, then my probability of presence of mutated species is 4/1000=1/250. But since my event is rare, most of the units would have an estimated probability of zero. That does tell me something (mutation is unlikely), but are these zeros really equal, or can I extract some additional information to distinguish between units?

Consider 2 units - A and B. In A there are 100000 species and no mutants, in B 1000 species and no mutants. My hunch is that one can in fact say that mutation is less likely in A - after all, there are more potential mutants there than in B.

How to approach it? My idea is that maybe I could put boundaries somehow. Let's imagine that the next species present in the unit is a mutant. In A_1 we'd have 100001 species, one of which is a mutant, and in B_1 we'd have 1001 species, one of which is a mutant. So the new probabilities are

P(mutation in A_1)=1/100001
P(mutation in B_1)=1/1001

Is that a reasonable approach? So could I say that the probability is not larger than 1/100001 in A and not larger than 1/1001 in B? Let's take some arbitrary point, say halfway between zero and the upper boundary, and say that our probabilities are 1/200002 and 1/2002 respectively. Or would it make sense to include some other measure?
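The arithmetic of this "one more mutant" idea can be sketched as follows. This is only a minimal illustration of the heuristic as described in this post, not an established estimator; the function name `one_more_mutant_bound` is made up for the example. Note that half of 1/1001 is 1/2002.

```python
from fractions import Fraction


def one_more_mutant_bound(n):
    """Proportion we would get if individual n + 1 were the first mutant.

    Hypothetical helper: it only encodes the heuristic from the post.
    """
    return Fraction(1, n + 1)


units = {"A": 100_000, "B": 1_000}  # unit sizes from the example above

for name, n in units.items():
    upper = one_more_mutant_bound(n)
    midpoint = upper / 2  # the arbitrary halfway point between 0 and the bound
    print(name, upper, midpoint)
```

Exact fractions are used so the halfway points come out as 1/200002 and 1/2002 without rounding.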

This sounds like a problem that has been encountered many times before, I'd very much appreciate any hints!

2. ## Re: Rare events - how to distinguish between zero probabilities?

Yes, it does give you more information. If the two samples are independent, then it tells you that the event is statistically $100000/1000 = 100$ times more likely to occur in B than in A (that's all you can infer from your data).

Let the probability of a species being mutated in sample A be $\alpha$, and the probability of a species being mutated in sample B be $\beta$. Then with a sample of $n$ species, the expected number of mutated species is $\alpha * n$ in A, and $\beta * n$ in B. So in your case, you would expect the number of mutated species in A to be $100000 \alpha$, and in B to be $1000 \beta$. Thus:

$E_A(X) = 100000 \alpha$
$E_B(X) = 1000 \beta$

In your case you observed $E_A(X) = 0$ and $E_B(X) = 0$, which means that:

$100000 \alpha = 1000 \beta$

Therefore $100 \alpha = \beta$

Note, however, that this tells you nothing about the actual probability of the event in either case - only the relative probabilities of it occurring in each sample. For all you know, $\alpha = \beta = 0$ - you don't have enough information.
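As a rough numerical companion to the expected counts $n\alpha$ and $n\beta$ above, here is a small sketch. It assumes individuals mutate independently (a binomial model, which is not stated in the thread), and the rate `1e-4` is purely hypothetical. It prints the expected number of mutants and the chance of seeing none at all for each sample size.

```python
def expected_mutants(n, p):
    """Expected number of mutants among n individuals with mutation rate p."""
    return n * p


def prob_no_mutants(n, p):
    """Chance of observing zero mutants, assuming independent individuals."""
    return (1.0 - p) ** n


hypothetical_rate = 1e-4  # assumption for illustration only

for name, n in {"A": 100_000, "B": 1_000}.items():
    print(
        name,
        expected_mutants(n, hypothetical_rate),
        prob_no_mutants(n, hypothetical_rate),
    )
```

Trying a few different rates shows how strongly (or weakly) a count of zero constrains the rate for each sample size.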

This is quite paradoxical, because if you turn the problem around and consider non-mutated species instead, then the samples would be statistically indistinguishable.

3. ## Re: Rare events - how to distinguish between zero probabilities?

Bacterius, thanks for such a quick answer. Meanwhile, I came across something called the Rule of Three (Hanley 1983), which was originally applied to estimate probabilities of failures in medical procedures.

It says that "if none of n patients showed the event about which we are concerned, we can be 95% confident that the chance of this event is at most 3 in n (i.e. 3/n). In other words, the upper 95% confidence limit of a 0/n rate is approximately 3/n."

http://www.ncbi.nlm.nih.gov/pmc/arti...00608-0045.pdf

That seems like one possible approach, right?
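Applied to the two units from the first post, the rule can be compared with the exact bound it approximates. The sketch below assumes independent individuals, so that the chance of zero events in $n$ trials is $(1-p)^n$; setting this equal to 0.05 and solving for $p$ gives the exact upper 95% limit. The function names are made up for the example.

```python
def rule_of_three(n):
    """Approximate upper 95% confidence limit for a 0/n rate."""
    return 3.0 / n


def exact_upper_limit(n, confidence=0.95):
    """Solve (1 - p)**n = 1 - confidence for p (zero events observed)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


for name, n in {"A": 100_000, "B": 1_000}.items():
    print(name, rule_of_three(n), exact_upper_limit(n))
```

The 3 comes from $-\ln(0.05) \approx 2.996$, so the approximation is slightly conservative and the two values agree closely for samples of this size.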

4. ## Re: Rare events - how to distinguish between zero probabilities?

Use Bayes' theorem with a uniform prior (or whichever prior you think is more appropriate, usually an uninformative one; Google for it).
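A minimal sketch of what this could look like, assuming a binomial model with a uniform Beta(1, 1) prior on the mutation rate (one possible choice, not the only one). With $k$ mutants in $n$ individuals the posterior is Beta($1 + k$, $1 + n - k$). For $k = 0$ its CDF is $1 - (1-p)^{n+1}$, so the upper credible bound has a closed form and no statistics library is needed. The function names are made up for the example.

```python
def posterior_mean(n, k=0):
    """Mean of the Beta(1 + k, 1 + n - k) posterior under a uniform prior."""
    return (k + 1) / (n + 2)


def upper_credible_bound_zero(n, level=0.95):
    """Upper credible bound when k = 0, i.e. for a Beta(1, n + 1) posterior.

    Solves 1 - (1 - p)**(n + 1) = level for p.
    """
    return 1.0 - (1.0 - level) ** (1.0 / (n + 1))


for name, n in {"A": 100_000, "B": 1_000}.items():
    print(name, posterior_mean(n), upper_credible_bound_zero(n))
```

Under these assumptions the two zero counts are no longer equal: the posterior mean is $1/(n+2)$, which is smaller for the larger unit.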

CB