# Thread: deals of n cards for all cards appear once

1. ## deals of n cards for all cards appear once

I'm looking at a privacy issue and this seems to be a simple analogy...

I have a deck of cards (let's say 52). For each experiment I perform a perfect shuffle and deal (let's say) seven cards face-up and write down the identity and count of the visible cards. I return the cards to the deck and perform another experiment where I shuffle and deal 7 face-up cards and record them. And so on and so on...

a) What is an expression couched in terms of the size of deck (here 52) and size of the "hand" (here 7) for the likely number of experiments before all 52 cards will have been recorded at least once to some level of confidence?

b) If I don't know how many cards are in the deck, how many experiments could I let go by with no new cards being recorded so that I can conclude with some level of confidence that I have seen them all of the cards?

2. Originally Posted by bodgerbob
I'm looking at a privacy issue and this seems to be a simple analogy...

I have a deck of cards (let's say 52). For each experiment I perform a perfect shuffle and deal (let's say) seven cards face-up and write down the identity and count of the visible cards. I return the cards to the deck and perform another experiment where I shuffle and deal 7 face-up cards and record them. And so on and so on...

a) What is an expression couched in terms of the size of deck (here 52) and size of the "hand" (here 7) for the likely number of experiments before all 52 cards will have been recorded at least once to some level of confidence?

b) If I don't know how many cards are in the deck, how many experiments could I let go by with no new cards being recorded so that I can conclude with some level of confidence that I have seen them all of the cards?
I see you've posted this question again in Las Vegas math. The reason no one has responded is this is a difficult, non-standard problem.

I think I have somewhat of a solution for (b) when only one card is dealt per hand. I made a number of errors writing this, so my logic needs checking carefully.

Let N be the number of cards recorded at a point in time. You need a function K(N) giving the number of subsequent deals with no new cards recorded before you conclude there are no new cards. Suppose for now that N* is the actual number of cards in the deck. Then given the function K(N), the probability that you will stop in error at any one N < N* is E(N,N*) = (N/N*)^K(N). The probability that you will not stop before N = N* is the product from N = 1 to N*-1 of (1-E(N,N*)). This is the probability of success; call it P(N*). You must choose the function K(N) so that P(N*) >= C, your confidence level, for all possible deck sizes N* > 1.

Now when N* >= N+1,

E(N,N*) = (N/N*)^K(N) <= (N/(N+1))^K(N) = E(N,N+1). [**]

Let Q(N*) be the product from N = 1 to N*-1 of (1-E(N,N+1)). Since Q(N*) is a decreasing function, the limit Q = lim N* -> infinity Q(N*) exists. Since [**] implies (1-E(N,N*)) >= (1-E(N,N+1)) when N <= N*-1, P(N*) >= Q(N*) > Q for all N* >= 1 and it is sufficient that you choose K(N) so Q >= C. There are an infinite number of functions K(N) that will do this. You have to choose one.

You can simplify the problem by taking logs so the condition is log(Q) >= log(C). log(Q) is sum N = 1 to infinity log(1-E(N,N+1)), which is approximately equal to sum N = 1 to infinity -E(N,N+1). log(C) = log(1-(1-C)) ~= -(1-C). Then K(N) can be chosen so sum N = 1 to infinity (N/(N+1))^K(N) <= 1-C. You could choose K(N) so that ((N/(N+1))^K(N) is term-wise equal to any convergent geometric series with sum <= 1-C. So for any r < 1,

K(N) = ceiling[log(r^(N-1)(1-r)(1-C))/log(N/(N+1))]

will do. The function ceiling[x] is the least integer greater than x; it is used since K(N) must be an integer. This is an approximate solution to Q = C, but should satisfy the original condition that P(N*) >= C for all N* > 1. (I verified that it does using a computer program.)

To choose one function K(N) and say it is optimal, you will at least need to define a loss function and perhaps a prior distribution on the size of the deck.

That's my take. Good luck!

3. Originally Posted by JakeD
I see you've posted this question again in Las Vegas math. The reason no one has responded is this is a difficult, non-standard problem.

I think I have somewhat of a solution for (b) when only one card is dealt per hand. I made a number of errors writing this, so my logic needs checking carefully.

Let N be the number of cards recorded at a point in time. You need a function K(N) giving the number of subsequent deals with no new cards recorded before you conclude there are no new cards. Suppose for now that N* is the actual number of cards in the deck. Then given the function K(N), the probability that you will stop in error at any one N < N* is E(N,N*) = (N/N*)^K(N). The probability that you will not stop before N = N* is the product from N = 1 to N*-1 of (1-E(N,N*)). This is the probability of success; call it P(N*). You must choose the function K(N) so that P(N*) >= C, your confidence level, for all possible deck sizes N* > 1.

Now when N* >= N+1,

E(N,N*) = (N/N*)^K(N) <= (N/(N+1))^K(N) = E(N,N+1). [**]

Let Q(N*) be the product from N = 1 to N*-1 of (1-E(N,N+1)). Since Q(N*) is a decreasing function, the limit Q = lim N* -> infinity Q(N*) exists. Since [**] implies (1-E(N,N*)) >= (1-E(N,N+1)) when N <= N*-1, P(N*) >= Q(N*) > Q for all N* >= 1 and it is sufficient that you choose K(N) so Q >= C. There are an infinite number of functions K(N) that will do this. You have to choose one.

You can simplify the problem by taking logs so the condition is log(Q) >= log(C). log(Q) is sum N = 1 to infinity log(1-E(N,N+1)), which is approximately equal to sum N = 1 to infinity -E(N,N+1). log(C) = log(1-(1-C)) ~= -(1-C). Then K(N) can be chosen so sum N = 1 to infinity (N/(N+1))^K(N) <= 1-C. You could choose K(N) so that ((N/(N+1))^K(N) is term-wise equal to any convergent geometric series with sum <= 1-C. So for any r < 1,

K(N) = ceiling[log(r^(N-1)(1-r)(1-C))/log(N/(N+1))]

will do. The function ceiling[x] is the least integer greater than x; it is used since K(N) must be an integer. This is an approximate solution to Q = C, but should satisfy the original condition that P(N*) >= C for all N* > 1. (I verified that it does using a computer program.)

To choose one function K(N) and say it is optimal, you will at least need to define a loss function and perhaps a prior distribution on the size of the deck.

That's my take. Good luck!
Here is a better function K(N), maybe the best one.

Looking at the results of the computer program for K(N) above , P(N*) is an increasing function with P(*N) > C, so K(N) is increasingly too large as N increases.

The solution is to set K(N) at each stage so that P(N*) = C at every stage (or as close as we can get with integer K(N)). To this end, let R(N) be the product from M = 1 to N-1 of (1-(M/(N+1)^K(M)). Then

P(N+1) = R(N)*(1-(N/(N+1)^K(N)).

We choose integer K(N) so P(N+1) ~= C:

K(N) = ceiling[log(1-C/R(N))/log(N/(N+1))].

This results in a large reduction in K(N). For example, previously with r = .5, K(20) = 346. With r = .99, K(20) = 160. Here, K(20) = 63.