# Thread: Probability of consecutive wins

1. Probability of consecutive wins

Hello all,

I have played a binomial game, recording wins and losses. I played 179 rounds and then analyzed the resulting sequence, counting the number of times wins appeared in runs of each length. For example, out of 179 rounds, there was one run of six wins in a row, two runs of five wins, three runs of four wins, five runs of three wins, 11 runs of two wins, and 25 single wins standing alone.

I would like to know what statistical law or distribution explains this observation.

Thanks to all.

2. Warning: This comment is intuition (plus a few MATLAB simulations to verify).

For large p (the probability of a success), I'm not sure that this is an easy distribution to describe. At least, I don't see it.

As p gets smaller, I think it may start to converge to a negative binomial. Suppose p is the probability of a success. For large N, a run of 2 is p times less likely than a run of 1, a run of 3 is p times less likely than a run of 2, and so on. (By the way, a run of length 0 is a single failure.)

p needs to be small so that you have lots of "trials" (chances for runs) compared to the length of the run lengths.

I should also point out that although I am waving my hands, I did run some simulations in MATLAB. For p = 0.5 and smaller, my experimental results matched the negative binomial distribution pretty well. I didn't actually try anything larger, so "small" may not have to be very small.
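For anyone who wants to reproduce the hand-waving above without MATLAB, here is a rough Python sketch (the value p = 0.3 and the sample size are my own choices, not from the original simulations). It tallies success runs by length and checks that a run of length r+1 shows up about p times as often as a run of length r.

```python
import random
from collections import Counter

random.seed(42)

def run_length_counts(p, n):
    """Simulate n Bernoulli(p) trials and tally runs of successes by length."""
    flips = [random.random() < p for _ in range(n)]
    counts = Counter()
    run = 0
    for f in flips:
        if f:
            run += 1
        else:
            if run:
                counts[run] += 1
            run = 0
    if run:  # a run may end at the last trial
        counts[run] += 1
    return counts

counts = run_length_counts(p=0.3, n=1_000_000)
# Each extra success should make a run about p times as likely:
for r in range(1, 6):
    print(r, counts[r], counts[r + 1] / counts[r])
```

The printed ratios should all hover near p, which is the geometric decay the intuition relies on.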

3. Originally Posted by ZPlayer
Hello all,

I have played a binomial game, recording wins and losses. I played 179 rounds and then analyzed the resulting sequence, counting the number of times wins appeared in runs of each length. For example, out of 179 rounds, there was one run of six wins in a row, two runs of five wins, three runs of four wins, five runs of three wins, 11 runs of two wins, and 25 single wins standing alone.

I would like to know what statistical law or distribution explains this observation.

Thanks to all.
Hi ZPlayer,

Just to be sure of the problem statement, let's say we flip a fair coin N times and count the number of runs of heads of length r, where a run of length r is defined as a sequence of exactly r heads neither preceded nor followed by a head. It's fairly easy to find the expected number of runs of length r. The actual probability distribution, on the other hand, seems to be a much harder problem, and I don't know a formula for it.

So let's work on the expected value problem:

Define
$X_i = 1$ if a run of r heads ends on the ith coin flip, for $r \leq i \leq N$,
$= 0$ otherwise.

A run of r heads ends on flip r if we have r heads followed by a tail, so
$P(X_r = 1) = 2^{-r-1}$
A run of r heads ends on flip N if it is preceded by a tail, so
$P(X_N = 1) = 2^{-r-1}$.
For other values of i, i.e., $r < i < N$, a run of r heads must be preceded and followed by a tail, so
$P(X_i = 1) = 2^{-r-2} \text{ for } r < i < N$.

The total number of runs of length r is $X_r + X_{r+1} + \dots + X_N$, and

$E(X_r + X_{r+1} + \dots + X_N)$
$= E(X_r) + E(X_{r+1}) + \dots + E(X_N)$
$=2^{-r-1} + (N-r-1) 2^{-r-2} + 2^{-r-1}$
$=(3 + N - r) 2^{-r-2}$

Here we have used the theorem that E(X+Y) = E(X) + E(Y), i.e., linearity of expectation. It's important to know that this theorem holds even if X and Y are not independent. That's good for us here, because the $X_i$'s are not independent. (There are probably some people who think this is the only probability theorem I know, because I seem to apply it on every problem. Maybe they're right.)

I think if you work out the numbers you will find they agree fairly well with your data.

4. Originally Posted by awkward
(There are probably some people who think this is the only probability theorem I know, because I seem to apply it on every problem. Maybe they're right. )
Perhaps, awkward, but you do use it to great effect. I thought about that technique but couldn't quite see how to use it.

I thought about this some more and came to the following conclusion: the negative binomial will be an excellent approximation. For it to be a good one, I think the requirement is something like $(1-p)N > 100$ (to get absolute errors of both the PDF and the CDF around 1e-3 or smaller, based on simulations).

The experiment you are performing (almost) consists of some number of negative binomial experiments (NBEs). The number of NBEs depends on the outcomes of the NBEs themselves, and this is where the difficulty lies. The number of runs of length 5 is not independent of the number of runs of length 9: for example, if N = 10 and you know there was one run of length 9, then you know there was NOT a run of length 5.

However, if $(1-p)N$ is large, the expected number of failures is large, and hence so is the number of NBEs. When this happens, the number of runs of length 5 becomes more nearly independent of the number of runs of length 9. Another way to think about it is that the number of NBEs relative to N converges to $(1-p)$.

So in essence you can think of the problem as $(1-p)N$ independent negative binomial experiments, and thus you are doing a finite sampling of a negative binomial distribution.
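The "finite sampling" picture above is easy to sanity-check numerically. In the sketch below (my own parameter choices: p = 0.5 and N = 200,000, so $(1-p)N$ is comfortably large), the lengths of all success runs in one long sequence are collected, and the fraction of runs of each length is compared to the geometric law $p^{r-1}(1-p)$, which is the run-length distribution the independent-experiments view predicts.

```python
import random

random.seed(1)
p, N = 0.5, 200_000  # (1 - p) * N is large, so the approximation should apply

# Collect the lengths of all runs of successes in one long Bernoulli sequence.
lengths = []
run = 0
for _ in range(N):
    if random.random() < p:
        run += 1
    elif run:
        lengths.append(run)
        run = 0
if run:  # a run may end at the last trial
    lengths.append(run)

# If run lengths behave like independent samples, the fraction of runs with
# length r should follow the geometric law p^(r-1) * (1 - p).
total = len(lengths)
fracs = {r: sum(1 for x in lengths if x == r) / total for r in range(1, 6)}
for r, frac in fracs.items():
    print(r, round(frac, 4), round(p ** (r - 1) * (1 - p), 4))
```

The empirical and theoretical columns should agree to a couple of decimal places, which is what "a finite sampling of a negative binomial distribution" looks like in practice.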