Probability with randomly generated numbers

• Jun 8th 2010, 06:59 PM
nickf77
Probability with randomly generated numbers
This question has me completely stumped. I've run a little program I wrote to find out that the answer averages to 520, but I have absolutely no idea how to figure it out.

A hat has 100 pieces of paper numbered 1 to 100. If you randomly pick pieces of paper out of the hat, write down the number, then return the paper to the hat, how many picks (assuming the world has perfect probability) would it take until you have written down every number from 1 to 100?

I'm being driven nuts. (Speechless)
• Jun 8th 2010, 07:08 PM
TKHunny
It is not a good question. Theoretically, you could draw the same value each time and you certainly will not ever get them ALL. You'll have to rephrase.
• Jun 8th 2010, 08:52 PM
undefined
Quote:

Originally Posted by TKHunny
It is not a good question. Theoretically, you could draw the same value each time and you certainly will not ever get them ALL. You'll have to rephrase.

I think the OP is asking what the expected number of draws is before reaching the stated goal, which is a perfectly reasonable question.

I was initially perplexed as to how to deal with the $2^{100}$ states in this problem, until I realised that the problem can be viewed as only having 101 states. (Giggle)

So, there are two approaches to this problem that I know of. One is to use Markov chains, and an exact answer can be obtained using matrix inversion. I have limited experience with this. If you are interested, there is a PDF available from Dartmouth that I've read part of and found good.

The other approach I have plenty of experience with, and is as follows: We are after an expected value. Let X be the random variable: the number of draws before all numbers are written. Then

$E(X) = \sum_{i=1}^\infty i\cdot P(X=i) = 1\cdot P(X=1) + 2\cdot P(X=2) + \cdots$

This is an infinite sum, but eventually the terms will get small, allowing us to get several decimal places of precision if we use floating point arithmetic.

The procedure is to use an array of size 101 indicating the probability that n numbers have been written, n in {0,1,...,100} for some number of steps, starting with 0 steps and then going to maybe 5000 to get several decimal places of precision. I started typing out an explanation but it's probably easier for me just to type out a program and show you.

Give me a couple hours and I'll post again.
• Jun 8th 2010, 09:50 PM
undefined
Here's the program:

Code:

public class NumberHat100ExpVal {         public static final int L=4000;         public static void main(String[] args) {         long time=System.currentTimeMillis();         solve(100);         System.out.println("Elapsed: "+(System.currentTimeMillis()-time)/1000.0);     }         public static void solve(int n) {         int i, j;         double a[]=new double[n+1], b[], c[]=new double[n+1], d[]=new double[n+1], e=0;         a[0]=1;         for(i=0; i<=n; i++) {             c[i]=i/100.0;             d[i]=1-c[i];         }         for(i=0; i<L; i++) {             b=new double[n+1];             for(j=0; j<n; j++)                 if(a[j]!=0) {                     b[j]+=c[j]*a[j];                     b[j+1]+=d[j]*a[j];                 }             a=b;             if(i>=n) e+=i*a[n];         }         System.out.println(e);     } }
And here is the output:

Code:

517.7377517639593 Elapsed: 0.038
This matches your estimate, so I'm fairly confident that it's bug-free. I can explain how it works, but I figure there's a nonzero probability you will be able to read and understand my code, so I'll just wait for you to ask.
• Jun 9th 2010, 02:46 PM
nickf77
I'm not sure what language your code is in, but I did a very similar thing with a couple lines of Visual Basic - that's where I got that "magic number" 520 from. I'll look into that equation you spoke of.
• Jun 9th 2010, 02:51 PM
undefined
Quote:

Originally Posted by nickf77
I'm not sure what language your code is in, but I did a very similar thing with a couple lines of Visual Basic - that's where I got that "magic number" 520 from. I'll look into that equation you spoke of.

It's Java, sorry I forgot to mention that.

Are you sure you did something similar? Because what I did is significantly different from Monte Carlo. I do not use a random number generator in my code. I forgot to put in units of time, but the output "Elapsed: 0.038" means that the code completes in 38 milliseconds.

Again, let me know if you'd like explanation.

EDIT: I think I uncovered a bug. It seems lines 28 and 29 were switched from what they should be. Here is the new code.

Code:

public class NumberHat100ExpVal {         public static final int L=4000;         public static void main(String[] args) {         long time=System.currentTimeMillis();         solve(100);         System.out.println("Elapsed: "+(System.currentTimeMillis()-time)/1000.0+" seconds");     }         public static void solve(int n) {         int i, j;         double a[]=new double[n+1], b[], c[]=new double[n+1], d[]=new double[n+1], e=0;         a[0]=1;         for(i=0; i<=n; i++) {             c[i]=i/100.0;             d[i]=1-c[i];         }         for(i=0; i<L; i++) {             b=new double[n+1];             for(j=0; j<n; j++)                 if(a[j]!=0) {                     b[j]+=c[j]*a[j];                     b[j+1]+=d[j]*a[j];                 }             if(i>=n) e+=i*a[n];             a=b;         }         System.out.println(e);     } }
Output:
Code:

518.7377517639586 Elapsed: 0.036 seconds
I guess I'll code a Monte Carlo to try to ensure no other bugs.
• Jun 9th 2010, 02:57 PM
awkward
This is the "Coupon Collector's Problem" with n=100. See

Coupon collector's problem - Wikipedia, the free encyclopedia

Using their approximate formula for the expected number of trials T required,

$E(T) \approx n \ln n + \gamma n + 1/2$
where $\gamma \approx 0.5772$ (the "Euler gamma constant")
we find
$E(T) \approx 518.7$
• Jun 9th 2010, 03:20 PM
undefined
Quote:

Originally Posted by awkward
This is the "Coupon Collector's Problem" with n=100. See

Coupon collector's problem - Wikipedia, the free encyclopedia

Using their approximate formula for the expected number of trials T required,

$E(T) \approx n \ln n + \gamma n + 1/2$
where $\gamma \approx 0.5772$ (the "Euler gamma constant")
we find
$E(T) \approx 518.7$

Thanks, that gives me confidence that there are no other bugs in my most recent code and saves me from coding a Monte Carlo.
• Jun 9th 2010, 03:22 PM
SpringFan25
Do the problem one step at a time:

Find the expected tries to get your first ball
Then the number of expected extratries to get the second ball
Then the number of expected extratries to get the third ball

You can do this for the system because it has the markov property.

A complete proceedure for finding the expected number of tries is as follows:

Step 1:
Find f(k), the expected number of tries between matching the (k-1)th ball and the kth ball.

Step 2:
The expected number of steps in total is then

f(1)+f(2)+f(3)+f(4)+f(5)...+f(100)

I'll get you started on finding f(k)
f(1) = 1 (obviously)

for k>1, we will have to do an infinite sum to find our expectation. You should be able to see that the chance of drawing a new ball is:
P(new ball) = (100-(k-1))/100

Define this as a

P(1 step) = a
P(2 steps) = (1-a)(a)
P(3 steps) = (1-a)(1-a)(a)
$P(d steps) = (1-a)^{d-1}a$

Using the definition of expectation, the expected number of steps is:
$f(k)=E(steps) = 1a + 2(1-a)a + 3(1-a)^2a + 4....$
$f(k)= a\left(1+ 2(1-a) +3(1-a)^2 + 4.... \right)$

This is a hypergeometric progression whch you can sum using the standard methods:

$f(k)= a\left(1+ 2(1-a) +3(1-a)^2 + 4.... \right)$

$(1-a)f(k) = a \left((1-a) + 2(1-a)^2 +3(1-a)^3 .... \right)$

(take difference between the above 2 equations)

$f(k)(1-(1-a)) = a \left(1 + (1-a) + (1-a)^2 + (1-a)^3...... \right)$
$a f(k) = \frac{a}{1-(1-a)}$
$f(k)=\frac{1}{a}$ (this should not be surprising!!)

$f(k) = \frac{100}{(100-(k-1))}$

edit:

Now, its up to you to find the value of f(1) + f(2) +f(3)+...f(100). I dont remember if there is a series expansion for that, but there are only 100 terms so you can always put it in a spreadsheet if you get stuck.

Spoiler: