# Recency Rank Encoding Probability

• Aug 22nd 2013, 08:40 PM
schreckenstat
Recency Rank Encoding Probability
I'm working on a problem involving recency rank encoding in which I must find some probabilities that I'm having issues calculating.

Given a source alphabet \$\displaystyle S = \{s_{1}, ..., s_{m}\}\$ with zeroth order relative frequencies \$\displaystyle f = \{f_{1}, ..., f_{m}\}\$, I need to find the probabilities \$\displaystyle p_{0}, ..., p_{m-1}\$ where \$\displaystyle p_{i}\$ is the probability that between the occurence of a particular source letter and the previous occurence of that same source letter, there are i different source letters (i.e. the last \$\displaystyle s_{3}\$ in the source texts \$\displaystyle s_{3}s_{1}s_{2}s_{3}\$ and \$\displaystyle s_{3}s_{2}s_{1}s_{2}s_{3}\$ would be encoded with a codeword \$\displaystyle u_{2}\$ in both cases since there are only two *different* source letters between the occurences of \$\displaystyle s_{3}\$). I am trying to find formulas for each of these \$\displaystyle p_{i}\$, especially \$\displaystyle p_{0},\ p_{1},\ p_{2}\$. Could anyone give me some tips on how to find these probabilities?
• Aug 23rd 2013, 04:15 AM
chiro
Re: Recency Rank Encoding Probability
Hey schreckenstat.

If you can convert your model to a Markov model (which it looks like you could) then what you can do is setup the matrix and use that to get the stationary distribution.

Have you come across Markovian modelling before?
• Aug 23rd 2013, 03:18 PM
schreckenstat
Re: Recency Rank Encoding Probability
No, I've heard of it, but never done any myself. However, there's no huge rush for this problem, so if you think that's the best way to solve this I'll give it a shot. Could you please recommend some good references for Markov models? Thanks
• Aug 23rd 2013, 06:14 PM
chiro
Re: Recency Rank Encoding Probability
• Aug 23rd 2013, 07:59 PM
schreckenstat
Re: Recency Rank Encoding Probability
Ok, thanks a lot for the advice. I'll try using your recommended method and see how it goes.
• Aug 27th 2013, 05:21 PM
schreckenstat
Re: Recency Rank Encoding Probability
Well I've read up on Markov models and I'm having trouble figuring out how to set up the matrix for this sort of generalized problem. Could someone point me in the right direction for setting this up?
• Aug 27th 2013, 05:47 PM
chiro
Re: Recency Rank Encoding Probability
List all your first order conditional probabilities.

If you have instances where they need to be aggregated (i.e. P(Xn=s1|Xn-1=s1 and Xn-2=s2 and Xn-3 = s4) then you basically exhaust the state space until you get everything that is first order conditional.

Once you have listed all the first order conditional probabilities, then populate the matrix and you will have your Markovian system.

Without any further assumptions, we can't help you fill in values for the probabilities.
• Aug 27th 2013, 06:29 PM
schreckenstat
Re: Recency Rank Encoding Probability
Well these are "pure zeroth order" frequencies I am working with, so we're basically assuming (for this problem) that the probability of one letter occurring does not depend on the context.

With that in mind, could I still set up a Markovian system?
• Aug 27th 2013, 06:52 PM
chiro
Re: Recency Rank Encoding Probability
Markovian systems are for first order conditional probabilities. If everything is independent then there is no point using a Markovian system.

I figured that because you are using a linguistic type analysis where order is important (and affect probabilities) that a Markovian approach would be appropriate. For example is s1s2s1s3 is different from say s1s1s2s4 then this is where Markovian analyses are useful (and often used).