Only a gathering of 23 people are needed to to make the probability of two of them having the same birthday greater than 50%. This is a well known and surprising result.
In deriving the expression for probability of birthday coincidence in a group of n people, the approach seems always to consider the probability of non-coincidence:
p 364/365 x 363/365 x 362/365 ......x (365-n)/365
n 1 2 3 n
and subtract this from 1 to find p' (coincidence).
Is there a way of deriving the formula by considering the probability of coincidence directly and not having to go via the non-coincidence route?
Jul 17th 2008, 11:53 AM
Yes, but the only way I can think of is much harder to the point of silliness when you have a nice easy way of doing it. This is because it is easy to describe the non-event. There are many many ways to have the event as n gets large.
To use card game naming: You could have just 1 pair with the same birthday, 2 pairs, 3 pairs all the way up to . Then you could have 1 3-of-a-kind, 1 3-of-a-kind plus 1 pair, ....
You would have to add up all the probabilities for the different events, being very careful to never allow the events to overlap.
This is horribly nasty. What's the matter with computing the complement of the event?
There may be some other way to prove it but the standard way is the only general way that I have ever seen (including some quick googleing).
Jul 18th 2008, 07:36 AM
Thank you very much for your swift reply.
My reason for putting the question is primarily a teaching or explanatory one, not one of computation.
Yes I Googled Birthday coincidence too, and noticed everyone dived in straight away looking at non-coincidence.
The problem is when you are trying to explain why the probability of coincidence is much higher than you might think at first sight, to people who may not be too strong at maths, or may have forgotten stuff. To go straight to the complementary case without saying why is going to be puzzling.
As you helpfully point out (which is what I had suspected), to approach the problem head on is complex and that it is simpler to go at it from the complement. Some sort of sentence to this effect will certainly help the reader.
As an explanation, I would still prefer a direct approach, such as:
Select one person in the group of n. The probability that another chosen person has the same birthday is 1/365. The number of possible pairings of people in the group is
n/2 (n-1). Therefore the probability that there is a coincidence among the group is
1/365 + 1/365 + .... for each pairing, total prob = n/2 (n-1) /365.
I know this is flawed but is there some first approximation approach that would be easy to follow, and preferably sound?
I have another question in a similar vein concerning coincidence in a derangement, and the formula containing e. It has been a while since I looked at it, so I will take a day or to refresh, and submit presumably under stats. I would be grateful for help there, as well.
Jul 18th 2008, 11:32 AM
Well, I don't think it is puzzling, but probability was my field of study. If I were teaching it to somebody I would explain that there are a lot of different ways for the event "at least 2 people have the same birthday" to occur. But the complimentary event "no two people have the same birthday" is much easier to describe.
If there was an easier way to get the probability I would agree that the constructive approach, rather than the complimentary, might be better, but when you actually look at the constructive approach you'll find that it is actually much harder.
I think complimentary approaches are a necessary part of doing probability and it can make computations significantly easier. It is a standard thing to think about. If I need to calculate a probability of an event, I look to see if the compliment of that event is easier to calculate.
I can think of 2 ways to approximate it. The first is the plain old constructive approach, but only try to calculate the most likely parts of the event. For example if n is small, mostly you should just have some number of pairs of data (not many 3-of-a-kinds). Further, most of the probability of the event will be in the lower number of pairs. So now, this gets rid of much of the nastiness, but as you'll see, its not a bucket of roses either.
Start with the probability for 1 pair. This is actually not necessarily all that easy.
Let's count by saying the first and the second have the same birthday. And the rest are distinct. And then multiply by to factor in the fact that any 2 of these people could have the same birthday.
Now lets do 2 pairs. This is harder. First count how many days could have the 2 pair. (order doesn't matter}
Given that, how many ways are there of assigning the 2 pairs to the set of people? (again, order per birthday doesn't matter)
Given that, how many ways are there of filling out the rest of the birthdays? (here order matters)
Ok so the probability is:
Let's go for 3 pairs! This last way of breaking it down I think will work in general:
Do you see my point? It may look more straightforward to construct. But I don't think it is.
The only other approach that you could take could give you intuition into the problem, but it is not really the right thing. It would work if medians and means were the same thing. It requires expected values of dependent random variables and I do have to make use of complimentary events again.
Let be the number of people born on day i (1-365). Note that has a binomial distribution with probability of success 1/365.
Now let be 1 if and 0 otherwise. Since has a binomial distribution, we can calculate
Then you can ask what is the expected value of
Note that the are very dependent, but that doesn't change the fact that the expectation of the sum is the sum of the expectations.
Then you can plot this as a function of n and see where it goes above 1. It turns out, if I did everything right, that it goes above 1 at n=19. This is giving you the flavor of the issue but it is not the same thing as saying for what n does the probability of having 2 or more people born on the same day go above 50%. You could ask what is but this is hard because of the dependence of the 's and so we're back to where we started(Crying).
Jul 18th 2008, 11:47 AM
By the way, I would like to give a big shout out to Awkward that showed the "bus stop" technique (using expectations of dependent RV's). (Hi)
It just goes to show you that an old dog can learn new tricks. (Rofl)
Ahem. I heard that! Congratulations!(Hi)
Anyway, here is another way to approximate the birthday probabilities which I think can be found in Ross, "A First Course in Probability", although I'm not sure because I don't have my copy here.
Suppose we have n people in the room, so there are pairs. For any given pair, the probability that they have the same birthday is 1/365, so the mean number of matching pairs is . So far everything has been exact and we haven't made any approximation. But now let's say the pairs are "approximately independent"; if so, then we can argue that the total number of matching pairs is approximately Poisson with mean .
Then for we have , and for a Poisson distribution we have , so the approximation turns out to be pretty good.