# Thread: Expected number of matches

1. ## Expected number of matches

Hi all,

I need to solve a mathematical problem for an Engineering purpose.
Imagine I have a string of bits of length N.
And I have a bit sequence pattern of length M.

I would like to know what is the expected number of pattern matches in the string. You can assume 0 and 1's are equally likely and there bits are totally uncorrelated.

An approximation would also help.

Thanks!

Hi all,

I need to solve a mathematical problem for an Engineering purpose.
Imagine I have a string of bits of length N.
And I have a bit sequence pattern of length M.

I would like to know what is the expected number of pattern matches in the string. You can assume 0 and 1's are equally likely and there bits are totally uncorrelated.

An approximation would also help.

Thanks!
Move the bit sequence pattern along the string. There are n = M - (N - 1) = M - N + 1 positions. Test for a match at each position. The probability of a match at each position is $p = \left( \frac{1}{2} \right)^M$.

Let X be the random variable number of matches. Then X ~ Binomial(n, p).

So $E(X) = np = (M - N + 1) \, \left( \frac{1}{2} \right)^M$.

3. Hi Mr. Fantastic.

Thank you very much for the help though...

Isn't what you tell me an upper level for $E(matches)$ ?

If you have in mind for example that a sequence has the pattern 10101010. Once there is a pattern found in the string. The probability of finding the same pattern just after it is 0, and not $0.5^M$ .

So actually there is some sort of correlation to be concerned about.

Hi all,

I need to solve a mathematical problem for an Engineering purpose.
Imagine I have a string of bits of length N.
And I have a bit sequence pattern of length M.

I would like to know what is the expected number of pattern matches in the string. You can assume 0 and 1's are equally likely and there bits are totally uncorrelated.

An approximation would also help.

Thanks!

Define
$X_i =
\begin{cases}
1 &\text{if the pattern matches the string at position i}\\
0 &\text{otherwise}
\end{cases}$

for $i = 1, 2, \dots ,N-M+1$.

The probability of a match at position i is $2^{-M}$, so
$P(X_i = 1) = 2^{-M}$
and
$E(X_i) = 2^{-M}$.

So the expected number of pattern matches is
$E(\sum_{i=1}^{N-M+1} X_i ) = \sum_{i=1}^{N-M+1} E(X_i) = (N-M+1)\, 2^{-M}$.

Here we have used the theorem that E(X+Y) = E(X) + E(Y). It's important to note that this theorem is true even if X and Y are not independent, which is good for us because the $X_i$'s are not independent. (That's why the sum is not binomially distributed, as you noted previously.)

This is all assuming that the string and pattern consist of independent random variables which are equally likely to be 0 or 1. If you have additional information, e.g. if you know that the pattern is 10101010 (as in your reply to Mr. Fantastic), that's a different problem.

5. Hi awkward,

Actually yes, the sequence patern I was looking at was 10001011 and 01110100 (inverse). That is to say if any of these paterns were
found it would mean there is a match.

After you clarifications I deduce that:
$E(X_i)!=2^-M$ when the conditional probabilites are taken in account. When there is no more probable pattern, the computation of $E(X_i)$ still gives $2^-M$.
But with a specific case this might change, am I right?

If that is it, I know how to compute it now.

Hi awkward,

$E(X_i)!=2^-M$ when the conditional probabilites are taken in account. When there is no more probable pattern, the computation of $E(X_i)$ still gives $2^-M$.
Well, I guess now that I've thought about it, the actual pattern is irrelevant to the final answer, since the probability of a match is still $2^{-M}$.