# Calculation of probability that two sets of outcomes are always disjoint

Show 40 post(s) from this thread on one page
Page 2 of 2 First 12
• Apr 1st 2013, 07:29 AM
majamin
Re: Calculation of probability that two sets of outcomes are always disjoint
I think you are correct, but the probability would go to zero if the calculation was taken that far (I stopped at at 14 for the size of set A). The probabilities are weighted, so that is why the sum equals to 1, however, on second thought, the calculation for P(A is n) is more complicated than I first thought. I think the original response I had is probably the simplest (that is, the calculations for P(B is not symbols in A | n symbols in A) in the table above. Other than that, I think I've exhausted the knowledge that can actually help you (hopefully it helped a little bit).
• Apr 2nd 2013, 04:16 AM
noplacebos
Re: Calculation of probability that two sets of outcomes are always disjoint
Majamin, once more, thank you very much for taking the time to propose a solution for my problem, it is much appreciated. If nothing else, it helped me formulate the mathematical problem more precisely.
In the meantime I got a solution from another source, I have tried it on some data and so far it gives logical probabilities.
It is more complex than we both thought - no way I would have come to that formula myself. I will give it here by courtesy and for future users who might find it useful:

A: m draws, r unique outcomes
B: n draws, s unique outcomes

What we want is the the simple probability that m draws from A and n draws from B (all with replacement) produce disjoint sets of outcomes, regardless of what r and s happen to be.
Let's call this Pdisjoint(N,m,n) (with N=20, "number of balls").
This assesses the likelihood of a certain 'null hypothesis', that urns A and B do in fact contain all the integers [1,20] in equal numbers, against a hypothesis that better fits the data, that urns A and B actually contain disjoint sets of numbers.

The most straightforward way to find Pdisjoint(N,m,n) is to sum probablities over all possible r:
(sorry for the format, I can't seem to be able to paste the equation from Word...)

Pdisjoint(N,m,n) = ( SUM(r=0, N) (combine(N, r)*{m,r}*r!*(N-r)n ) / Nm+n

Here,
combine(N, r) is the number of ways to choose r of the N numbers to be the outcomes of the drawings from urn A,
{m,r} is a Stirling number of the second kind and counts the number of ways that the m outcomes can take the r chosen values assuring that each value is chosen at least once (multiplied by r! because order matters),
(N-r)n is the number of ways to chose the n balls from urn B so that they are disjoint from the numbers drawn form urn A, and
Nm+n is the number of ways to choose m balls from urn A and n balls from urn B without regard for disjointness.

Simplifying this expression is not possible due to the nature of Stirling numbers, and it is a real pain to implement in a programming language (java) as the numbers involved are huge (namely, the factorials) and cannot be handled by primitive types (special methods exist for that but there are still a lot of different number types to take into account).

The calculated probability is - oh, by magic - the same whether we take Pdisjoint(N,m,n) or Pdisjoint(N,n,m), however the result is different for m1 != m2, n1 != n2, with m1+n1=m2+n2 ('!=' stands for different). Hence the cumulative number of draws is not determinant, the specific numbers of draws from A and B are.

There we go...
Show 40 post(s) from this thread on one page
Page 2 of 2 First 12