# Thread: Combination problem - overlapping sets and hypergeometric distribution

1. ## Combination problem - overlapping sets and hypergeometric distribution

ok, so i have 3 sets, each with 4 elements. These 3 sets are subsets of the universal set which is 60 elements.

what i need to figure out is the correct formulae to calculate the percentage of combinations containing 9 elements from the universal that contain at least one element from each of the 3 subsets.

originally i tried to go with C(4,1)*C(4,1)*C(4,1)*C(57,6)/C(60,9) - which was obviously wrong because i got an answer somewhere along the lines of 1.05. Which sucks. because logically i shouldn't get a probability of 1 until C(4,1)*C(4,1)*C(4,1)*C(57,54)/C(60,57). Because in all other cases there is at least one set that has all four of one of the subsets outside the selection.

So i hit the books for a bit and found the hyper geometric distributions algorithm - which the only version i found that deals with anything similar to my problem seems to require the subsets to be equal - every example i've seen is in any case (or at least not overlapping - C(57,6) obviously does overlap with each instance of C(4,1), and that is probably my problem here.) unless operating in the singular. And looks exactly like my algorithm. so nuts. it''s as follows (in case i've got it wrong)

h(x; N, n, k) = [ kCx ] [ N-kCn-x ] / [ NCn ]
N is the total set size eg 60.
n number of objects drawn from the set eg from 60 select 9
l number of possible successes eg 4 for a particular subset
x is how many of selected items must be successes eg 2 of the 5 selected balls must be red.

which is the same algorithm i was using but for the singular, and you just need to calculate for x=0 and subtract the result from 1 to get the probability of at least one member of a particular subset being in the selection. so my problem is i've been looking at this stuff for too long and only keep finding the same stuff i've already tried/figured out, solve easily for the one subset but put all three together and it falls apart. If someone could explain to me how i've gone wrong or perhaps direct me to some further reading material on the subject, that would be grand. At this point i think the solution might be to change the C(4,1)*C(4,1)*C(4,1)*C(57,6)/C(60,9) to C(4,1)*C(4,1)*C(4,1)*C(48,6)/C(60,9) and a recursive algorithm to go through all the permutations of C(4,0<x<5) and then sum the probabilities together. I think i'm wrong and there is a simpler/more elegant solution.

also, sorry for wall of text but if i don't work through the problem progressively like this i tend to end up with a mess that makes no sense at all to anyone but me.

2. ## Re: Combination problem - overlapping sets and hypergeometric distribution

Originally Posted by TheVale
ok, so i have 3 sets, each with 4 elements. These 3 sets are subsets of the universal set which is 60 elements.
what i need to figure out is the correct formulae to calculate the percentage of combinations containing 9 elements from the universal that contain at least one element from each of the 3 subsets.

The real difficulty with this is that we know very little about the three sets themselves.

If they are pairwise disjoint then it is fairly easy.

But what if some combinations of the sets have elements in common?
We would then have to account for each possibility.
I think that would be a nightmare.

3. ## Re: Combination problem - overlapping sets and hypergeometric distribution

they are pairwise disjoint

4. ## Re: Combination problem - overlapping sets and hypergeometric distribution

Originally Posted by TheVale
they are pairwise disjoint
OK. Suppose that $A,~B,~\&~C$ are the pairwise disjoint subsets of four elements. If $A^c$ denotes the subsets of of nine which contain no element of $A$ then there are $\binom{56}{9}$ of them.

Now you need to use inclusion/exclusion to find $A^c\cup B^c\cup C^c$.

Then subtract that from 1.

5. ## Re: Combination problem - overlapping sets and hypergeometric distribution

so C(56,9) is how many combinations have none of a particular subset. the overlap between 2 such combinations Ac and Bc is every combination from Ac that contains no member of B. so i'd have to calculate C(52,9) as combinations that contain no members from A or B to tell me the result of |Ac| intersect |Bc|, the same for |Ac| intersect |Cc| and then C(48,9) to calculate teh intersection of all three.

so the final equation looks like

1 - ( ( C(56,9) + C(56,9) + C(56,9) - C(52,9) - C(52,9) - C(52,9) + C(48,9) ) / C(60,9) )
1 - ( 3C(56,9) - 3C(52,9) + C(48,9) ) / C(60,9)

this one looks good, assuming i did it correctly - the answer for the above is 9.5% and slightly over 100% for 57 instead of 9. Which seems reasonable and accurate i think. can anyone confirm i've got this right?

6. ## Re: Combination problem - overlapping sets and hypergeometric distribution

I agree with your analysis. Here's a fleshed out discussion and also a direct approach:

7. ## Re: Combination problem - overlapping sets and hypergeometric distribution

Thanks Plato and johng, you were great help.