# Thread: Probabilities of Selections from Large Populations

1. ## Probabilities of Selections from Large Populations

Hi everyone, as you can see this is my first post. I should start by saying that my stats is so rusty that I'm not even sure if this is a basic or advanced problem! Anyway, here goes:

I have a very large population (~30,000,000) of particles with a known size distribution (split into 5 size fractions), from which I want to draw a smaller sample (~30,000) and determine the probability of getting various size distributions in that sample. If the original population of M particles contains A, B, C, D and E particles in the five size fractions (so A + B + C + D + E = M), and I draw m particles without replacement, the probability of getting a, b, c, d and e particles (with a + b + c + d + e = m) is:

$\frac{\binom{A}{a}\binom{B}{b}\binom{C}{c}\binom{D}{d}\binom{E}{e}}{\binom{M}{m}}$
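
For reference, this formula is the multivariate hypergeometric distribution (sampling without replacement). Below is a minimal Python sketch of it using exact integer arithmetic. `mvhypergeom_pmf` is a hypothetical helper name, not an existing library function. Python integers never overflow, but for M ≈ 30,000,000 and m ≈ 30,000 the coefficient $\binom{M}{m}$ has roughly 100,000 digits, so this version is mainly useful for checking small cases.

```python
from fractions import Fraction
from math import comb, prod


def mvhypergeom_pmf(population, drawn):
    """Exact multivariate hypergeometric probability.

    population[i]: number of particles in size fraction i (A, B, C, ...)
    drawn[i]:      number of those drawn in the sample (a, b, c, ...)
    """
    if len(population) != len(drawn):
        raise ValueError("population and drawn must have the same length")
    M = sum(population)  # total population size
    m = sum(drawn)       # total sample size
    if m > M:
        return Fraction(0)  # cannot draw more particles than exist
    # Numerator: product of C(N_i, n_i); math.comb returns 0 when n_i > N_i
    numerator = prod(comb(N_i, n_i) for N_i, n_i in zip(population, drawn))
    # Denominator: C(M, m), the number of ways to pick m particles from M
    return Fraction(numerator, comb(M, m))


# Tiny check: 2 particles in one fraction, 3 in the other, draw 2
print(mvhypergeom_pmf([2, 3], [1, 1]))  # 3/5
```

Summing the probabilities over every possible outcome should give exactly 1, which is a quick sanity check on the formula.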

First of all, I hope the formula above is correct! As I said, my stats is very rusty. The problem is that, for example, B is 850,000 and b is 850, and $\binom{850000}{850}$ is far beyond the range of numbers Excel can handle. Even worse, B < C < D < E!

So, my question is: is there an approximate method I could use in this case that would avoid these enormous numbers?
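
One standard way to avoid the enormous numbers without approximating the formula at all is to work with logarithms: $\ln \binom{n}{k} = \ln\Gamma(n+1) - \ln\Gamma(k+1) - \ln\Gamma(n-k+1)$, so the whole probability becomes a sum and difference of log-gamma values, exponentiated only at the end. A minimal Python sketch follows (Excel has a `GAMMALN` function that allows the same trick in a spreadsheet). The function names are hypothetical, and the population counts in the usage lines are made up except for B = 850,000 and the ~30,000,000 total. Subtracting large log-gamma values costs some floating-point precision, which is usually acceptable for probabilities.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    if k < 0 or k > n:
        return float("-inf")  # C(n, k) = 0
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_mvhypergeom_pmf(population, drawn):
    """Natural log of the multivariate hypergeometric probability."""
    M, m = sum(population), sum(drawn)
    if m > M:
        return float("-inf")  # impossible: sample larger than population
    log_num = sum(log_comb(N_i, n_i) for N_i, n_i in zip(population, drawn))
    return log_num - log_comb(M, m)


# Hypothetical fraction sizes: only B = 850,000 and the ~30,000,000 total
# come from the post; the other counts are invented for illustration.
population = [400_000, 850_000, 2_750_000, 8_000_000, 18_000_000]
drawn = [400, 850, 2_750, 8_000, 18_000]
log_p = log_mvhypergeom_pmf(population, drawn)
print(log_p, exp(log_p))
```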

I hope I have explained my problem properly! Thank you in advance for any help you can give.

2. Originally Posted by hyperchondriac
> So, my question is: is there an approximate method I could use in this case that would avoid these enormous numbers?

I think you first need to decide what you want to do with these probabilities; once you know that, you should be able to choose an appropriate approximation.

CB
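
As an illustration of choosing an approximation by purpose, here is a sketch of two common options, assuming m is much smaller than M (here m/M ≈ 0.001). If you need probabilities of whole size distributions, a multinomial distribution with $p_i = N_i / M$ treats the draws as independent (as if sampling with replacement). If you only care about the count in one fraction, that count is approximately normal with the hypergeometric mean $m N_i / M$ and variance $m \frac{N_i}{M}\left(1 - \frac{N_i}{M}\right)\frac{M - m}{M - 1}$. The function names and all counts other than B = 850,000 and M ≈ 30,000,000 are hypothetical.

```python
from math import lgamma, log, sqrt


def log_multinomial_pmf(population, drawn):
    """Log-probability of `drawn` under a multinomial with p_i = N_i / M.

    This treats the draws as independent (sampling with replacement), which
    approximates sampling without replacement when sum(drawn) << sum(population).
    """
    M, m = sum(population), sum(drawn)
    result = lgamma(m + 1)  # log(m!)
    for N_i, n_i in zip(population, drawn):
        if n_i == 0:
            continue  # term is log(1) = 0; also avoids log(0) when N_i == 0
        if N_i == 0:
            return float("-inf")  # drawing from an empty fraction is impossible
        result += n_i * log(N_i / M) - lgamma(n_i + 1)
    return result


def class_count_mean_sd(population, i, m):
    """Mean and standard deviation of the number of fraction-i particles in a
    sample of m drawn without replacement (exact hypergeometric moments).
    A normal distribution with these parameters approximates that count."""
    M = sum(population)
    p = population[i] / M
    mean = m * p
    var = m * p * (1 - p) * (M - m) / (M - 1)  # includes finite-population correction
    return mean, sqrt(var)


# Hypothetical fraction sizes (only B = 850,000 and M ≈ 30,000,000 are from the post)
population = [400_000, 850_000, 2_750_000, 8_000_000, 18_000_000]
mean_b, sd_b = class_count_mean_sd(population, 1, 30_000)
print(mean_b, sd_b)
```

For fraction B the expected count is 30,000 × 850,000 / 30,000,000 = 850, matching the b quoted in the question. Comparing `log_multinomial_pmf` against an exact log-space calculation for a few outcomes is a quick way to check whether the approximation is good enough for a given purpose.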

3. Originally Posted by CaptainBlack
> I think you first need to decide what you want to do with these probabilities; once you know that, you should be able to choose an appropriate approximation.

Hi, thanks for the reply - excuse my slowness, but you've been a little too cryptic for me! I'm not asking for the answer, but a more explicit hint wouldn't go amiss, if you don't mind!

Thanks