I am writing a code-breaking program and need a decision rule that lets it distinguish real text from gibberish.
This led me to the problem of calculating the probability of encountering a given k-letter word in a random n-letter text.
For example, what is the probability of finding the word "THE" in a 1000-character random text? I'd also like to generalize this and calculate the probability of finding a k-letter word m times, and so on.
After researching online, my understanding is that I need to use Markov chains, which I have never studied (even though I do study math).
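To make the question concrete, here is a minimal Python sketch of the kind of computation I think is involved. It assumes each letter is drawn independently and uniformly from A-Z (that is my assumption about what "random text" means) and that overlapping occurrences are counted. The state it tracks is how many leading letters of the word are currently matched, which, as far as I understand, is the Markov-chain idea. The names `build_automaton` and `prob_at_least` are my own, not from any library.

```python
import string


def build_automaton(word, alphabet):
    """Transition table: trans[state][ch] is the length of the longest
    prefix of `word` that is a suffix of word[:state] + ch."""
    k = len(word)
    trans = []
    for state in range(k + 1):
        row = {}
        for ch in alphabet:
            s = word[:state] + ch
            j = min(k, len(s))
            # Shrink j until the first j letters of word match the end of s.
            while j > 0 and word[:j] != s[-j:]:
                j -= 1
            row[ch] = j
        trans.append(row)
    return trans


def prob_at_least(word, n, m=1, alphabet=string.ascii_uppercase):
    """Probability that `word` occurs at least m times (overlaps counted)
    in n letters drawn independently and uniformly from `alphabet`.
    Assumes `word` is non-empty and uses only letters from `alphabet`."""
    k = len(word)
    trans = build_automaton(word, alphabet)
    p = 1.0 / len(alphabet)
    # dist[c][s]: probability of c occurrences so far (capped at m)
    # with s leading letters of the word currently matched.
    dist = [[0.0] * (k + 1) for _ in range(m + 1)]
    dist[0][0] = 1.0
    for _ in range(n):
        new = [[0.0] * (k + 1) for _ in range(m + 1)]
        for c in range(m + 1):
            for s in range(k + 1):
                mass = dist[c][s]
                if mass == 0.0:
                    continue
                for ch in alphabet:
                    t = trans[s][ch]
                    # Reaching state k means one more full occurrence.
                    c2 = min(m, c + 1) if t == k else c
                    new[c2][t] += mass * p
        dist = new
    return sum(dist[m])


if __name__ == "__main__":
    print(prob_at_least("THE", 1000))       # at least one "THE"
    print(prob_at_least("THE", 1000, m=2))  # at least two
```

Under these assumptions, `prob_at_least("THE", 1000)` cannot exceed the expected number of occurrences, 998/26^3 (about 0.0568), which gives a quick sanity check on the result.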
Any help would be great. Thank you in advance.