• May 8th 2011, 01:34 PM
mormor83
hello everyone
this problem is in cryptography but it's a pure Probability problem.

if we have a text of "M" length and we randomly pick 3 number between 1 and M.
we observe the letters in those places.
what is the Probability that we will have two identical letters?

i thought i could calculate it using the birthday paradox..

am i right?
• May 8th 2011, 01:36 PM
SpringFan25
I think you need to make an assumption about the distribution of letters in the text to answer this. if so, what is your assumption?
• May 8th 2011, 01:39 PM
mormor83
the distribution of letters in the text is known.
I assume that the language is English
• May 8th 2011, 01:46 PM
SpringFan25
by all means, dont keep the distribution to yourself!

For arguments sake, suppose all letters are independant and each letter is equally likely to occur in all positions. This is exactly the problem solved in the link below:

http://en.wikipedia.org/wiki/Birthda...he_probability
(except there are 26 possibilities instead of 365)

In that case i would say the answer to your original question is "yes".
• May 8th 2011, 02:00 PM
mormor83

Frequency Table
• May 8th 2011, 02:15 PM
SpringFan25
First, since your sample is 3 letters, at most 1 letter can occur twice.

This means that your probability can be rewritten:
P(Something occurs twice) =
P(a occurs twice)
+P(b occurs twice)
+P(c occurs twice)
...
+P(Z occurs twice)

Provided the letters in each position of the text are i.i.d according the that frequency table, and your sample is random & independant: each of the above terms is easy to calculate; although 26 computations will be needed, so i may be missing a simpler method!

Dont forget at the end to note that:
p() = 1 if M < 3.
• May 8th 2011, 02:40 PM
mormor83
so using frequencies of letters in English calculation will be kind of a long way, isn't it?
• May 8th 2011, 07:35 PM
CaptainBlack
Quote:

Originally Posted by mormor83
so using frequencies of letters in English calculation will be kind of a long way, isn't it?

You could just estimate the probability from a simple Monte-Carlo simulation.

CB