# Thread: Probability of drawing z distinct values in a sample of size k

1. ## Probability of drawing z distinct values in a sample of size k

What is the probability that a sample of size k
-drawn with replacement
-from a set containing N distinct values (and no other values), where each of these N values has an equal probability 1/N of being drawn
-contains exactly z distinct values (where z <= k and z <= N)?

Suppose for instance that the set is given by S={A,B,C} and k=2.
z=1: probability of drawing
( {A,A},{B,B} or {C,C} )
-->(1 distinct value)
prob = (3/9)

z=2: probability of drawing
( {A,B}, {B,A}, {A,C}, {C,A}, {B,C} or {C,B} )
-->(2 distinct values)
prob = (6/9)

Alternative example:

Suppose for instance that the set is given by S={A,B,C} and k=3.

z=1: probability of drawing
( {A,A,A},{B,B,B} or {C,C,C} )
-->(1 distinct value)
prob = (3/27)

z=2: probability of drawing
( {A,B,B}, {B,A,B}, {B,B,A}, {B,A,A}, {A,B,A}, {A,A,B},
{A,C,C}, {C,A,C}, {C,C,A}, {C,A,A}, {A,C,A}, {A,A,C},
{C,B,B}, {B,C,B}, {B,B,C}, {B,C,C}, {C,B,C} or {C,C,B} )
-->(2 distinct values)
prob = (18/27)

z=3: probability of drawing
( {A,B,C}, {A,C,B}, {B,A,C}, {B,C,A}, {C,A,B} or {C,B,A} )
-->(3 distinct values)
prob = (6/27)
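
The counts in both examples can be checked by brute force. The following is a minimal sketch, not part of the original question: it enumerates all N^k ordered samples and tallies how many distinct values each one contains. The function name `distinct_value_distribution` is made up for illustration, and the approach is only practical for small N and k.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distinct_value_distribution(values, k):
    """Return {z: P(exactly z distinct values)} for k draws with replacement.

    Hypothetical helper: enumerates every ordered sample, so the cost
    grows as len(values) ** k.
    """
    values = list(values)
    # Count ordered samples by how many distinct values they contain.
    counts = Counter(len(set(sample)) for sample in product(values, repeat=k))
    total = len(values) ** k
    return {z: Fraction(c, total) for z, c in sorted(counts.items())}


print(distinct_value_distribution("ABC", 2))
print(distinct_value_distribution("ABC", 3))
```

The fractions are printed in lowest terms, so 3/9 appears as 1/3 and 18/27 as 2/3.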

2. ## Re: Probability of drawing z distinct values in a sample of size k

My own progress:

The total number of possible outcomes is N^k. To find the number of favorable outcomes, I think you should consider permutations of a multiset. The number of permutations of a multiset is given by the multinomial coefficient:

k! / (z_1! z_2! ... z_i! ... z_z!),

where k is the size of the multiset (so k! would be the number of permutations if there were no duplicates) and z_i is the number of times the i-th distinct element occurs, so that z_1 + z_2 + ... + z_z = k.

Using this multinomial coefficient, however, requires knowing the multiplicities to plug into it (i.e. the z_i's). If you draw k values and exactly z of them are distinct, then k-z of the draws must be duplicates. However, many different cases arise: for instance, one of the z values might account for all the duplicates, or the duplicates might be spread more evenly among the z values. Taking all of these cases into account is hard.
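
One way to handle the different cases is to let a program enumerate them. The sketch below is an illustration under my own assumptions, not a closed-form answer: it first chooses which z of the N values appear (a factor of C(N,z)), then sums the multinomial coefficient over every way of giving those z values positive counts z_1, ..., z_z that add up to k. The helper names `multinomial`, `compositions` and `prob_distinct` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def multinomial(counts):
    """k! / (z_1! z_2! ... z_z!) for the given counts, with k = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each intermediate quotient is an integer
    return result


def compositions(k, z):
    """Yield every ordered tuple of z positive integers that sums to k."""
    if z == 1:
        if k >= 1:
            yield (k,)
        return
    # Leave at least 1 for each of the remaining z - 1 parts.
    for first in range(1, k - z + 2):
        for rest in compositions(k - first, z - 1):
            yield (first,) + rest


def prob_distinct(N, k, z):
    """P(exactly z distinct values in k equally likely draws from N values)."""
    if not (1 <= z <= min(N, k)):
        return Fraction(0)
    # Ordered samples that use one fixed set of z values, each at least once.
    arrangements = sum(multinomial(c) for c in compositions(k, z))
    return Fraction(comb(N, z) * arrangements, N ** k)


for z in (1, 2, 3):
    print(z, prob_distinct(3, 3, z))
```

For N=3 and k=3 this reproduces 3/27, 18/27 and 6/27 from the example in the first post (printed in lowest terms). The inner sum counts the ways to fill k positions using each of z chosen values at least once, so it could be swapped for any other method of counting those arrangements.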