I need to solve a mathematical problem for an Engineering purpose.
Imagine I have a string of bits of length N.
And I have a bit sequence pattern of length M.
I would like to know the expected number of pattern matches in the string. You can assume 0s and 1s are equally likely and the bits are totally uncorrelated.
An approximation would also help.
Hi Mr. Fantastic.
Thank you very much for the help though...
Isn't what you gave me an upper bound?
Suppose, for example, that the pattern is 10101010. Once the pattern has been found in the string, the probability of finding the same pattern again at the very next position is 0, and not $2^{-M}$.
So actually there is some sort of correlation to be concerned about.
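The overlap effect described above can be made concrete with a small sketch. The helper `overlap_shifts` is a hypothetical name introduced only for this illustration; it lists the shifts at which a second copy of the pattern could start while still agreeing with the first copy.

```python
def overlap_shifts(pattern: str) -> list:
    """Return the shifts s (1 <= s < len(pattern)) at which a second copy of
    `pattern`, starting s bits after the first, agrees with the first copy
    on every bit where the two copies overlap."""
    m = len(pattern)
    return [s for s in range(1, m) if pattern[s:] == pattern[:m - s]]


print(overlap_shifts("10101010"))  # [2, 4, 6]
print(overlap_shifts("10001011"))  # [7]
```

A shift of 1 is missing from the first list, which is the "probability 0" case mentioned above: right after a match of 10101010, a second match one bit later is impossible.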
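Let $X_i$ be 1 if the pattern matches the string starting at position $i$ and 0 otherwise, for $i = 1, \dots, N-M+1$. The probability of a match at position $i$ is $2^{-M}$, so $E(X_i) = 2^{-M}$.
So the expected number of pattern matches is $E\left(\sum_{i=1}^{N-M+1} X_i\right) = \sum_{i=1}^{N-M+1} E(X_i) = (N-M+1)/2^{M}$.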
Here we have used the theorem that E(X+Y) = E(X) + E(Y). It's important to note that this theorem is true even if X and Y are not independent, which is good for us because the match indicators $X_i$ (one per position) are not independent. (That's why the sum is not binomially distributed, as you noted previously.)
This is all assuming that the string and pattern consist of independent random variables which are equally likely to be 0 or 1. If you have additional information, e.g. if you know that the pattern is 10101010 (as in your reply to Mr. Fantastic), that's a different problem.
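As a sanity check on the linearity argument, here is a minimal sketch that compares the closed form $(N-M+1)/2^{M}$ with a brute-force average over every string of length N and every pattern of length M, which matches the assumption that both are uniformly random. The function names and the small values of N and M are assumptions chosen for illustration, and the enumeration is only feasible for small sizes.

```python
from itertools import product


def expected_matches(n: int, m: int) -> float:
    """Closed form (N - M + 1) / 2**M obtained by linearity of expectation."""
    return max(n - m + 1, 0) / 2 ** m


def count_matches(s: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `s`, overlapping ones included."""
    m = len(pattern)
    return sum(s[i:i + m] == pattern for i in range(len(s) - m + 1))


def brute_force_mean(n: int, m: int) -> float:
    """Average match count over all strings of length n and patterns of length m."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    patterns = ["".join(bits) for bits in product("01", repeat=m)]
    total = sum(count_matches(s, p) for s in strings for p in patterns)
    return total / (len(strings) * len(patterns))


n, m = 8, 3
print(expected_matches(n, m))  # 0.75
print(brute_force_mean(n, m))  # 0.75
```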
Thanks for your helpful comment.
Actually yes, the patterns I was looking at were 10001011 and 01110100 (its inverse). That is to say, if either of these patterns were found, it would count as a match.
After your clarifications I deduce that, even when the conditional probabilities are taken into account, as long as no pattern is more probable than another, the computation of the expected number of matches still gives $(N-M+1)/2^{M}$.
But with a specific case this might change, am I right?
If that is it, I know how to compute it now.
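For the specific pair of patterns mentioned above, the expectation can be checked by exhaustive enumeration. This is only a sketch under the assumption of a uniformly random string; the helper names and the choice N = 12 are illustrative. Since the two patterns differ, they cannot both match at the same position, so each position contributes $2 \cdot 2^{-M}$ to the expectation.

```python
from itertools import product

PATTERN = "10001011"
# Bitwise inverse of PATTERN, i.e. "01110100".
INVERSE = "".join("1" if b == "0" else "0" for b in PATTERN)


def count_either(s: str, patterns: tuple) -> int:
    """Count positions where the window of s equals any of the given patterns."""
    m = len(patterns[0])
    return sum(s[i:i + m] in patterns for i in range(len(s) - m + 1))


def exact_mean(n: int, patterns: tuple) -> float:
    """Average of count_either over all 2**n bit strings of length n."""
    total = sum(
        count_either("".join(bits), patterns) for bits in product("01", repeat=n)
    )
    return total / 2 ** n


n = 12
m = len(PATTERN)
print(exact_mean(n, (PATTERN, INVERSE)))  # 0.0390625
print(2 * (n - m + 1) / 2 ** m)           # 0.0390625
```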