# Thread: Expected value of a particular substring in a string

1. ## Expected value of a particular substring in a string

I have a random string X of 100 letters in which each letter is equally likely to be any 1 of the 26 in the alphabet. What is the expected value of the number of times the substring 'ABCD' occurs in X? Also, is the expected value the same for every substring of length 4?

Also, what is the expected value for the substring 'AAAA'? Since it can overlap, would the expected value be greater?

2. Each letter is equally likely, so the probability of an A, a B, a C, or a D at any given position is $\frac{1}{26}$. Since the letters are independent, the probability that ABCD starts at a given position is $\frac{1}{26} \times \frac{1}{26} \times \frac{1}{26} \times \frac{1}{26} = (\frac{1}{26})^4 = \frac{1}{26^4} = \frac{1}{456,976}$. An expected value does not have to be a whole number, though. ABCD can start at any of positions 1 through 97, so the expected number of occurrences in the 100-letter string is $\frac{97}{456,976} \approx 0.000212$, which is close to 0 but not equal to 0.

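Yes, the expected value is the same for every substring of length 4, provided overlapping occurrences are counted separately. For example, the expected value for the substring "AAAA" is equal to the expected value for the substring "ABCD". Overlaps make occurrences of "AAAA" at neighbouring positions dependent on each other, but the expected value of a sum is the sum of the expected values whether or not the terms are independent, so the overlaps do not change the answer.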
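To see why overlaps do not matter for the expected value, here is a short sketch using indicator variables. The symbols $I_i$ and $N$ are notation introduced here, not taken from the question: $I_i$ is 1 if the chosen 4-letter substring starts at position $i$ and 0 otherwise, and $N$ is the total number of occurrences, with overlapping occurrences counted separately.

$$N = \sum_{i=1}^{97} I_i, \qquad E[I_i] = P(I_i = 1) = \left(\frac{1}{26}\right)^4 = \frac{1}{456,976}$$

$$E[N] = \sum_{i=1}^{97} E[I_i] = \frac{97}{456,976} \approx 0.000212$$

Nothing in this calculation depends on which four letters were chosen, and linearity of expectation holds even when the $I_i$ are dependent, which is the case for overlapping positions.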

The probability of the substring AAAA occurring starting at the first character is $\frac{1}{456,976}$. The probability of AAAA occurring starting at the second character is also $\frac{1}{456,976}$, and so on for every starting position. The last position at which AAAA can start is 97 (97th character = A, 98th character = A, 99th character = A, and 100th character = A). Therefore, the expected number of occurrences of AAAA in the string of 100 characters, counting overlapping occurrences separately, is $97 \times \frac{1}{456,976} = \frac{97}{456,976} \approx 0.000212$. Note that this is an expected count, not a probability; the probability that AAAA occurs at least once is at most this value. So you still shouldn't expect to find AAAA in a single string of 100 characters: on average there is only about one occurrence in every 4,711 such strings.
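If you want to check this numerically, here is a minimal Python sketch. The function names are made up for illustration. It evaluates the formula (number of starting positions divided by alphabet size to the power of the substring length) for the 100-letter case, then verifies it by brute force on a much smaller problem: all strings of length 6 over a 2-letter alphabet. The small case is an assumption made only to keep the enumeration feasible, since enumerating $26^{100}$ strings is out of the question.

```python
from fractions import Fraction
from itertools import product


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, counting overlaps separately."""
    last_start = len(text) - len(pattern)
    return sum(text.startswith(pattern, i) for i in range(last_start + 1))


def expected_count_formula(n, k, alphabet_size):
    """Expected count by linearity: (n - k + 1) start positions, each with
    probability 1 / alphabet_size**k."""
    return Fraction(n - k + 1, alphabet_size ** k)


def expected_count_bruteforce(n, pattern, alphabet):
    """Average the overlapping count over every string of length n."""
    total = 0
    for letters in product(alphabet, repeat=n):
        total += count_overlapping("".join(letters), pattern)
    return Fraction(total, len(alphabet) ** n)


# The case from the question: 100 letters, substring length 4, 26 letters.
answer = expected_count_formula(100, 4, 26)
print(answer, float(answer))

# Small sanity check: an overlapping and a non-overlapping pattern.
print(expected_count_bruteforce(6, "AAA", "AB"))
print(expected_count_bruteforce(6, "AAB", "AB"))
print(expected_count_formula(6, 3, 2))
```

In the small case the self-overlapping pattern "AAA" and the non-overlapping pattern "AAB" give the same average, matching the formula. What does differ between such patterns is the spread of the count (the variance), not its expected value.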