1. ## Overlapping Distributions

I have a box of small components and a box of large components. The mean of the small components is less than the mean of the large ones, but the distributions overlap.

Question: If I pick one component from each box at random, what is the probability that the component from the box of "small" parts will be larger than the one from the box of "large" parts?

This is the method I'm planning to try (in Excel). Is it correct? Is there an easier way?

1. tabulate an incrementing series of sizes.
2. from the mean and standard deviation, calculate the probability of the small component being equal to each size tabulated
3. calculate the probability of the large component being less than the size above
4. multiply 2 & 3 to get the probability of both simultaneously
5. use Simpson's rule to integrate the area of the product in 4 above for all sizes in the series (assuming that the table covers a range extending beyond several standard deviations)

The alternative I'm thinking of is to use the probability of the small component being larger than each size tabulated in step 2 above, but that intuitively feels like I would be counting events multiple times.

Thanks.

2. ## Re: Overlapping Distributions

By selecting two components, you now have a two-dimensional probability distribution. Assuming the selection of one component is independent of the other, this new distribution will be the product of the two individual distributions, i.e.

$p_{SL}(s,l)=p_S(s)p_L(l)$ where

$p_S(s)$ and $p_L(l)$ are the individual distributions of the small and large components.

Given this, the probability that the small component is larger than the large component is just the 2D integral of the product distribution over the region where s > l, i.e. the area below the line s=l.

For example, in the image above you integrate that 2D Gaussian over the blue area.
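
Written out, the integral over the region s > l can be sketched as below. The symbol $F_L$ is introduced here only for illustration (it is not in the original post) and stands for the cumulative distribution of the large component, i.e. the probability that the large component is smaller than a given size:

$P(S > L) = \int_{-\infty}^{\infty} \int_{-\infty}^{s} p_S(s)\, p_L(l)\, dl\, ds = \int_{-\infty}^{\infty} p_S(s)\, F_L(s)\, ds$

The inner integral over l is done once and for all by the cumulative distribution, so under the independence assumption the 2D problem reduces to a single 1D integral of a density times a cumulative probability.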

Depending on what your individual distributions are, you may be able to use functions that exist in Excel to do the integration directly. Otherwise, you have to somehow make a table of the 2D distribution and apply your favorite numerical integration method.
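
As a rough illustration of the "table of the 2D distribution" route, here is a minimal Python sketch rather than a definitive implementation. It assumes both sizes are normally distributed and independent. The function name, the grid size `n`, the `span` of 6 standard deviations and the example means and standard deviations are all made up for illustration. It sums the product density over grid cells below the diagonal and counts each diagonal cell as half.

```python
from statistics import NormalDist


def prob_small_exceeds_large_2d(mu_s, sd_s, mu_l, sd_l, n=400, span=6.0):
    """Midpoint-rule estimate of the integral of p_S(s) * p_L(l) over s > l.

    Assumes independent normal distributions (an assumption, not a given).
    """
    small = NormalDist(mu_s, sd_s)
    large = NormalDist(mu_l, sd_l)

    # One common grid for both axes, wide enough to cover both distributions.
    lo = min(mu_s - span * sd_s, mu_l - span * sd_l)
    hi = max(mu_s + span * sd_s, mu_l + span * sd_l)
    h = (hi - lo) / n
    centres = [lo + (i + 0.5) * h for i in range(n)]

    ps = [small.pdf(x) for x in centres]  # density of the small part
    pl = [large.pdf(x) for x in centres]  # density of the large part

    total = 0.0
    below = 0.0  # running sum of pl over cells with l strictly below s
    for i in range(n):
        # Cells (i, j) with j < i lie fully in the region s > l;
        # the diagonal cell (i, i) is cut in half by the line s = l.
        total += ps[i] * (below + 0.5 * pl[i])
        below += pl[i]
    return total * h * h


# Hypothetical example values: small ~ N(10, 1), large ~ N(12, 1.5).
print(prob_small_exceeds_large_2d(10.0, 1.0, 12.0, 1.5))
```

The same table could be built in a spreadsheet, with one axis per component and the cells below the diagonal summed, but the grid grows with the square of the number of sizes tabulated.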

3. ## Re: Overlapping Distributions

Thanks Romsek, I quickly discovered that my way wasn't giving the right answer when I tried it.

Just to clarify, I assume you mean the volume under the blue area, and not the area of the blue surface? I'm assuming that a Gaussian distribution is the most appropriate, in the absence of any information other than some measurements taken from samples. I can't see which Excel functions would do the job directly. The best I can think of is to divide the volume into thin slices, use NORMDIST to calculate the area of each slice, multiply each area by the relevant ordinate from the other sample, and finally integrate using Simpson's rule. Is there a shorter way?
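
The slicing idea can be sketched in Python as follows. This is a sketch under stated assumptions, not a verified translation of any particular spreadsheet. It assumes both sizes are normal and independent, uses the density of the small component as the "ordinate" and the cumulative probability of the large component as the "area of each slice" (in Excel these would be NORMDIST with the last argument FALSE and TRUE respectively). The function name, `n`, `span` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def prob_small_exceeds_large(mu_s, sd_s, mu_l, sd_l, n=200, span=6.0):
    """Simpson's-rule estimate of the integral of pdf_S(x) * cdf_L(x) dx.

    Assumes independent normal distributions (an assumption, not a given).
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    small = NormalDist(mu_s, sd_s)
    large = NormalDist(mu_l, sd_l)

    # The integrand contains pdf_S, so it is negligible outside
    # mu_s +/- span * sd_s; integrating over that range is enough.
    lo = mu_s - span * sd_s
    hi = mu_s + span * sd_s
    h = (hi - lo) / n

    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        # P(small is at x) * P(large is below x)
        total += weight * small.pdf(x) * large.cdf(x)
    return total * h / 3


# Hypothetical example values: small ~ N(10, 1), large ~ N(12, 1.5).
print(prob_small_exceeds_large(10.0, 1.0, 12.0, 1.5))
```

Only one column of sizes is needed this way, because the cumulative function already accounts for every large-component size below each tabulated value.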

I need to repeat the process for 10 or 20 different separations between the means, so anything shorter than constructing ten different tables would be welcome!
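
If both sizes really are Gaussian and independent (the assumption made above), there is a shortcut that avoids tables altogether: the difference between the two sizes is itself Gaussian, with mean equal to the difference of the means and variance equal to the sum of the variances, so the answer is a single cumulative-normal lookup. A minimal Python sketch follows. The function name and all the numbers are hypothetical. In Excel the equivalent would be something like `=NORMSDIST((mean_s - mean_l)/SQRT(sd_s^2 + sd_l^2))`, with the four names replaced by cell references.

```python
from math import sqrt
from statistics import NormalDist


def prob_small_exceeds_large_closed(mu_s, sd_s, mu_l, sd_l):
    """P(S > L) for independent normal S and L.

    S - L is normal with mean mu_s - mu_l and variance sd_s**2 + sd_l**2,
    so P(S - L > 0) is a standard normal CDF evaluated at the scaled mean.
    """
    return NormalDist().cdf((mu_s - mu_l) / sqrt(sd_s ** 2 + sd_l ** 2))


# Hypothetical example: fixed spreads, several separations between the means.
mu_small, sd_small, sd_large = 10.0, 1.0, 1.5
for separation in (0.5, 1.0, 2.0, 3.0, 4.0):
    p = prob_small_exceeds_large_closed(
        mu_small, sd_small, mu_small + separation, sd_large
    )
    print(f"separation {separation:4.1f}: P(small > large) = {p:.4f}")
```

Repeating the calculation for 10 or 20 separations is then one formula per row rather than one table per separation. If the measured sizes turn out not to be close to Gaussian, this shortcut no longer applies and a numerical integration is needed.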

Cheers.

4. ## Re: Overlapping Distributions

Yes, I just mean the blue area is the area you integrate over.