this problem is in cryptography but it's a pure Probability problem.
if we have a text of "M" length and we randomly pick 3 number between 1 and M.
we observe the letters in those places.
what is the Probability that we will have two identical letters?
i thought i could calculate it using the birthday paradox..
am i right?
I think you need to make an assumption about the distribution of letters in the text to answer this. if so, what is your assumption?
the distribution of letters in the text is known.
I assume that the language is English
by all means, dont keep the distribution to yourself!
For arguments sake, suppose all letters are independant and each letter is equally likely to occur in all positions. This is exactly the problem solved in the link below:
(except there are 26 possibilities instead of 365)
In that case i would say the answer to your original question is "yes".
here is a link :
First, since your sample is 3 letters, at most 1 letter can occur twice.
This means that your probability can be rewritten:
P(Something occurs twice) =
P(a occurs twice)
+P(b occurs twice)
+P(c occurs twice)
+P(Z occurs twice)
Provided the letters in each position of the text are i.i.d according the that frequency table, and your sample is random & independant: each of the above terms is easy to calculate; although 26 computations will be needed, so i may be missing a simpler method!
Dont forget at the end to note that:
p() = 1 if M < 3.
so using frequencies of letters in English calculation will be kind of a long way, isn't it?
You could just estimate the probability from a simple Monte-Carlo simulation.
Originally Posted by mormor83