Recency Rank Encoding Probability

I'm working on a problem involving recency rank encoding in which I must find some probabilities that I'm having issues calculating.

Given a source alphabet $\displaystyle S = \{s_{1}, ..., s_{m}\}$ with zeroth order relative frequencies $\displaystyle f = \{f_{1}, ..., f_{m}\}$, I need to find the probabilities $\displaystyle p_{0}, ..., p_{m-1}$ where $\displaystyle p_{i}$ is the probability that between the occurence of a particular source letter and the previous occurence of that same source letter, there are i different source letters (i.e. the last $\displaystyle s_{3}$ in the source texts $\displaystyle s_{3}s_{1}s_{2}s_{3}$ and $\displaystyle s_{3}s_{2}s_{1}s_{2}s_{3}$ would be encoded with a codeword $\displaystyle u_{2}$ in both cases since there are only two *different* source letters between the occurences of $\displaystyle s_{3}$). I am trying to find formulas for each of these $\displaystyle p_{i}$, especially $\displaystyle p_{0},\ p_{1},\ p_{2}$. Could anyone give me some tips on how to find these probabilities?

Re: Recency Rank Encoding Probability

Hey schreckenstat.

If you can convert your model to a Markov model (which it looks like you could) then what you can do is setup the matrix and use that to get the stationary distribution.

Have you come across Markovian modelling before?

Re: Recency Rank Encoding Probability

No, I've heard of it, but never done any myself. However, there's no huge rush for this problem, so if you think that's the best way to solve this I'll give it a shot. Could you please recommend some good references for Markov models? Thanks

Re: Recency Rank Encoding Probability

Re: Recency Rank Encoding Probability

Ok, thanks a lot for the advice. I'll try using your recommended method and see how it goes.

Re: Recency Rank Encoding Probability

Well I've read up on Markov models and I'm having trouble figuring out how to set up the matrix for this sort of generalized problem. Could someone point me in the right direction for setting this up?

Re: Recency Rank Encoding Probability

List all your first order conditional probabilities.

If you have instances where they need to be aggregated (i.e. P(Xn=s1|Xn-1=s1 and Xn-2=s2 and Xn-3 = s4) then you basically exhaust the state space until you get everything that is first order conditional.

Once you have listed all the first order conditional probabilities, then populate the matrix and you will have your Markovian system.

Without any further assumptions, we can't help you fill in values for the probabilities.

Re: Recency Rank Encoding Probability

Well these are "pure zeroth order" frequencies I am working with, so we're basically assuming (for this problem) that the probability of one letter occurring does not depend on the context.

With that in mind, could I still set up a Markovian system?

Re: Recency Rank Encoding Probability

Markovian systems are for first order conditional probabilities. If everything is independent then there is no point using a Markovian system.

I figured that because you are using a linguistic type analysis where order is important (and affect probabilities) that a Markovian approach would be appropriate. For example is s1s2s1s3 is different from say s1s1s2s4 then this is where Markovian analyses are useful (and often used).