Hello,
So I've been trying to figure this out for a few days now and can't make any headway. I want to figure out the probability of a substring X = <x_1, x_2, ... , x_n> occurring k times in a given string Y of length M, given that the probability of a given n-length string being X is p_x. At first I thought this would just be a set of M-k+1 independent Bernoulli trials, which is obviously the case for k=1:
P(X occurs once) = (M-k+1) * (P_x)^1 * P_x ^ (M-k)
But this doesn't generalize to k > 1 because its not valid to assume that the trials can overlap in Y. That is to say, if <y_1 ... y_n> =/= X, and there is no repetition in X, then none of the subsections of Y which begin in y_1 to y_n can possibly be "true". The situation is different again if there IS repetition in X!
So whats the answer? I hope I have explained the problem adequately. Thank you very much!