
Sequence Probability
Hi,
I would like to compute a probability for the following problem...
Variables:
+ a sequence defined by an alphabet of four characters
+ a specific short sequence s1
+ a specific short sequence s2
+ a gap of nonspecific characters between s1 and s2 of length g
For example:
...2301210[3203]2013220[3123]00132010...
+ s1 = 3203 (the first bracketed sequence)
+ s2 = 3123 (the second bracketed sequence)
+ g = 7 (the length of the sequence between the brackets, 2013220 = gap)
What is the probability that s1 and s2 occur with a gap that does not exceed 100?
Someone suggested using the Poisson distribution, but I'm not sure...

As in the example, if s1 or s2 has length n, then the probability that n randomly generated characters spell out s1 is (1/4)^n, i.e. 1 in 4^n, correct? I'm just not sure how to take into account the gap of characters between s1 and s2...
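
As a quick sanity check on the single-pattern probability, here is a minimal sketch. It assumes every character is drawn independently and uniformly from the four-character alphabet, which the question does not state explicitly. The helper names and the sequence length of 1000 are made up for illustration.

```python
from fractions import Fraction

ALPHABET_SIZE = 4  # assumption: four equally likely, independent characters


def pattern_probability(pattern: str) -> Fraction:
    """Probability that a fixed window of len(pattern) random characters equals pattern."""
    return Fraction(1, ALPHABET_SIZE) ** len(pattern)


def expected_occurrences(pattern: str, seq_len: int) -> Fraction:
    """Expected number of (possibly overlapping) occurrences in seq_len random characters."""
    # Number of start positions where the pattern still fits in the sequence.
    windows = max(seq_len - len(pattern) + 1, 0)
    # Linearity of expectation: each window matches with the same probability.
    return windows * pattern_probability(pattern)


print(pattern_probability("3203"))  # 1/256
print(float(expected_occurrences("3203", 1000)))  # 1000 is an arbitrary example length
```

Note that the second function gives an expected count, not the probability of at least one occurrence; the two are close only when the count is much smaller than 1.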

Something's unclear in your problem: what sequence do you consider? Is it finite or infinite? What length? Do you consider only the first occurrence of s1 (followed by a gap, and then s2), or just any one, or all of them?
For instance, if there is s1, a large gap, then s2, then s1, a small gap and finally s2, does that count?

The random sequence of characters is finite and of a given length. For example, the sequence may be 3,000,000 characters long.
Let an occurrence be defined as ...s1 then gap then s2... where 0 <= gap length <= 100. So you are quite correct in your definition.
Alternatively:
+ Let the string of characters S = s1 + gap + s2.
Given:
+ s1 = 3203
+ s2 = 3123
+ gap = any sequence of characters (0 <= length of gap <= 100)
+ then S = s1 + gap + s2
+ therefore the following are possible S sequences:
 ...3203031221013120101231323133210123103123...
 ...32032012133101001102330101203123123...
 ...320303123...
 ...32033123...
 ...320302233121013202103123...
Then, what is the probability of an S occurring in a random sequence of n characters?
Thanks for pointing out anything that is unclear in the problem... I'm having a tough time trying to define the problem in the first place...
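
One way to get a feel for the answer is to compare a rough Poisson-style estimate with a simulation. This is a minimal sketch, assuming the characters are independent and uniform over 0-3 (the thread does not say so); the function names and default parameters are illustrative. The estimate treats every (start position, gap length) placement of S as a rare, roughly independent event. Placements that share the same s1 are in fact correlated, so it is only an approximation, and the simulation is there to show how far off it is.

```python
import math
import random
import re


def poisson_estimate(n, len1=4, len2=4, max_gap=100, alphabet=4):
    """Approximate P(at least one S in n characters) as 1 - exp(-lambda)."""
    # Chance that one fixed placement matches both s1 and s2.
    p = alphabet ** -(len1 + len2)
    # For each gap length g, count the start positions where s1 + gap + s2 fits.
    placements = sum(max(n - (len1 + len2 + g) + 1, 0) for g in range(max_gap + 1))
    lam = placements * p  # expected number of matching placements
    return 1.0 - math.exp(-lam)


def simulate(n, s1="3203", s2="3123", max_gap=100, trials=2000, seed=0):
    """Monte Carlo estimate of P(at least one S) in a random sequence of n characters."""
    rng = random.Random(seed)
    # s1, then 0..max_gap arbitrary characters, then s2.
    pattern = re.compile(re.escape(s1) + "[0-3]{0,%d}" % max_gap + re.escape(s2))
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choices("0123", k=n))
        if pattern.search(seq):
            hits += 1
    return hits / trials


print(poisson_estimate(500))
print(simulate(500, trials=200))
```

Under these assumptions, a sequence of 3,000,000 characters has roughly 4,600 expected matching placements (about 3,000,000 x 101 / 4^8), so at least one S is practically certain at that length; the question only becomes interesting for much shorter sequences or longer s1 and s2.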