As in the example, if s1 or s2 has length n, then the probability that a randomly generated string of length n equals s1 is 4^-n, correct? I'm just not sure how to take the gap of characters between s1 and s2 into account...
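A minimal sketch, assuming every character is drawn independently and uniformly from the four symbols 0-3. Under that assumption a fixed string of length n matches at a given position with probability 4^-n (for s1 alone, 4^-4 = 1/256). The gap characters are unconstrained, so one particular placement of s1, a gap of exactly g characters, and then s2 has probability 4^-(len(s1) + len(s2)) whatever g is. The helper names are hypothetical; the brute-force version simply enumerates every window to confirm the formula for small g.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "0123"  # assumed four-character alphabet, uniform and independent


def p_placement(s1: str, s2: str) -> Fraction:
    """Probability that s1 starts at a fixed position and s2 starts exactly
    len(s1) + g positions later, for one fixed gap length g.

    The g gap characters may be anything, so they contribute a factor of 1;
    only the len(s1) + len(s2) fixed characters matter.
    """
    return Fraction(1, len(ALPHABET) ** (len(s1) + len(s2)))


def p_placement_brute(s1: str, s2: str, g: int) -> Fraction:
    """Same probability, found by enumerating every window of the right width."""
    width = len(s1) + g + len(s2)
    hits = 0
    for chars in product(ALPHABET, repeat=width):
        window = "".join(chars)
        # s1 and s2 cannot overlap: the window is len(s1) + g + len(s2) wide.
        if window.startswith(s1) and window.endswith(s2):
            hits += 1
    return Fraction(hits, len(ALPHABET) ** width)


print(p_placement("3203", "3123"))           # 1/65536
print(p_placement_brute("3203", "3123", 1))  # 1/65536
```

Placements with different gap lengths at the same start are not mutually exclusive (s2 can appear at more than one offset), so these probabilities cannot simply be added over g = 0..100 to get the probability of "some gap up to 100".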
Hi,
I would like to compute a probability for the following problem...
Variables:
+ a random sequence over an alphabet of four characters
+ a specific short sequence s1
+ a specific short sequence s2
+ a gap of non-specific characters between s1 and s2 of length g
For example:
...2301210[3203]2013220[3123]00132010...
+ s1 = 3203 (the first bracketed sequence)
+ s2 = 3123 (the second bracketed sequence)
+ g = 7 (the length of the gap between them, 2013220)
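The example can be checked mechanically. This small sketch (the function name `first_gap` is hypothetical) takes the example sequence as a plain string, finds the first s1, then the first s2 that starts after it, and returns the characters in between. It only reproduces this one example; it does not answer the general "any occurrence" question.

```python
def first_gap(seq: str, s1: str, s2: str):
    """Return the characters between the first s1 and the next s2 after it,
    or None if either is missing."""
    i = seq.find(s1)
    if i < 0:
        return None
    gap_start = i + len(s1)
    j = seq.find(s2, gap_start)  # s2 must start after s1 ends (gap >= 0)
    if j < 0:
        return None
    return seq[gap_start:j]


seq = "230121032032013220312300132010"
gap = first_gap(seq, "3203", "3123")
print(gap, len(gap))  # 2013220 7
```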
What is the probability that s1 and s2 occur, in that order, with a gap that does not exceed 100 characters?
Someone suggested using the Poisson distribution, but I'm not sure...
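One possible reading of the Poisson suggestion, sketched under the assumption of independent, uniform characters: count every placement (start position of s1, gap length g from 0 to 100) that fits in a sequence of N characters. Each placement fixes 8 characters, so by linearity of expectation the expected number of placements is exactly lambda = sum over g = 0..100 of (N - 8 - g + 1) / 4^8. Treating that count as Poisson then gives P(at least one) ~ 1 - exp(-lambda). That last step is only a heuristic: placements overlap and cluster (one s1 can pair with several s2's), so it can be noticeably off. The function names and parameters are illustrative.

```python
import math


def expected_placements(n: int, len1: int = 4, len2: int = 4,
                        max_gap: int = 100, alphabet_size: int = 4) -> float:
    """Expected number of (start, gap) placements of s1 + gap + s2 in a
    uniformly random sequence of length n (exact, by linearity of expectation)."""
    count = 0
    for g in range(max_gap + 1):
        starts = n - (len1 + g + len2) + 1  # start positions where this g fits
        if starts > 0:
            count += starts
    return count / alphabet_size ** (len1 + len2)


def poisson_at_least_one(n: int, **kwargs) -> float:
    """Heuristic P(at least one placement) = 1 - exp(-lambda)."""
    lam = expected_placements(n, **kwargs)
    return 1.0 - math.exp(-lam)


for n in (1_000, 3_000_000):
    print(n, expected_placements(n), poisson_at_least_one(n))
```

For N = 3,000,000 the expected count comes out at roughly 4,600, so under this model at least one S is practically certain; the question is more delicate for short sequences, where lambda is close to 1 and the clustering matters.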
Something is unclear in your problem: which sequence are you considering? Is it finite or infinite, and what length? Do you consider only the first occurrence of s1 (followed by a gap, then s2), any occurrence, or all of them?
For example, if the sequence contains s1, a large gap, s2, then s1, a small gap, and finally s2, does that count?
The random sequence of characters is finite and of a given length. For example, the sequence may be 3,000,000 characters long.
Let an occurrence be defined as ...s1, then a gap, then s2..., where 0 <= gap length <= 100. Any such occurrence counts, so your reading of the problem is correct.
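With this definition (any s1, followed by 0 to 100 arbitrary characters, followed by s2), "at least one occurrence" can be expressed as a regular expression: Python's `re.search` tries every start position and backtracks over every allowed gap length. A sketch with a hypothetical helper `has_occurrence`, assuming the sequence is a string over the digits 0-3; it also covers the "large gap, then small gap" case raised above.

```python
import re


def has_occurrence(seq: str, s1: str, s2: str, max_gap: int = 100) -> bool:
    """True if s1, then 0..max_gap characters from 0-3, then s2 appears anywhere."""
    pattern = re.escape(s1) + "[0-3]{0,%d}" % max_gap + re.escape(s2)
    return re.search(pattern, seq) is not None


s1, s2 = "3203", "3123"
large_gap_only = s1 + "0" * 150 + s2     # the only pair has a gap of 150
mixed = large_gap_only + s1 + "00" + s2  # a later pair has a gap of 2

print(has_occurrence(large_gap_only, s1, s2))  # False
print(has_occurrence(mixed, s1, s2))           # True
```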
Alternatively:
+ Let the string of characters S = s1 + gap + s2.
Given:
+ s1 = 3203
+ s2 = 3123
+ gap = any sequence of characters (0 <= length of gap <= 100)
+ then S = s1 + gap + s2
+ therefore the following are possible S sequences:
- ...3203031221013120101231323133210123103123...
- ...32032012133101001102330101203123123...
- ...320303123...
- ...32033123...
- ...320302233121013202103123...
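As a quick consistency check (a sketch, assuming s1 = 3203, which is how every listed example begins), each example can be tested with `re.fullmatch` against the pattern s1, then 0 to 100 characters from 0-3, then s2, using the text between the "..." markers.

```python
import re

S1, S2, MAX_GAP = "3203", "3123", 100
PATTERN = re.compile(re.escape(S1) + "[0-3]{0,%d}" % MAX_GAP + re.escape(S2))

examples = [
    "3203031221013120101231323133210123103123",
    "32032012133101001102330101203123123",
    "320303123",
    "32033123",
    "320302233121013202103123",
]

for ex in examples:
    matched = PATTERN.fullmatch(ex) is not None
    gap_len = len(ex) - len(S1) - len(S2)  # characters strictly between s1 and s2
    print(ex, matched, "gap length:", gap_len)
```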
Then, what is the probability that at least one S occurs in a random sequence of n characters?
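Whatever exact formula ends up being used, a Monte Carlo estimate gives a baseline to check it against. A sketch, assuming characters are independent and uniform over 0-3; `estimate_probability` and its parameters are hypothetical names, and the result carries sampling error of roughly 1/sqrt(trials).

```python
import random
import re


def estimate_probability(n: int, s1: str = "3203", s2: str = "3123",
                         max_gap: int = 100, trials: int = 1000,
                         seed: int = 0) -> float:
    """Fraction of random length-n sequences containing s1 + (0..max_gap chars) + s2."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    pattern = re.compile(re.escape(s1) + "[0-3]{0,%d}" % max_gap + re.escape(s2))
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choices("0123", k=n))
        if pattern.search(seq):
            hits += 1
    return hits / trials


print(estimate_probability(1_000))
```

Simulation is most informative for shorter sequences, where the answer is not already close to 1.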
Thanks for pointing out what is unclear in the problem... I'm having a tough time defining it in the first place...