Limit of Markov Chain

  1. #1 akbar (Junior Member)

    Consider a homogeneous, ergodic Markov chain (\omega_0, \omega_1,\ldots) with finite state space X=\{1,\ldots,r\}, transition matrix P, and stationary distribution \pi.
    Given that \pi is also the initial distribution, what is the limit:

    \lim_{n\to \infty} \frac{\ln \mathbb{P}(\omega_i \neq 1,\ 0\leq i \leq n)}{n}

    One approach is to use renewal theory and consider the probabilities that \{\omega_i =1\} occurs for the first time. However, my application of this approach gives the same result as if each step of the chain were completely independent of the previous one (and hence of all past ones). Could this be right?

    Thanks for your help.

  2. #2 Laurent (MHF Contributor, Paris, France)
    Hi,
    I would be curious to know how you're applying the renewal theorem here. To me, the renewal theorem says that the "excursions" outside a given state are i.i.d. (this is really just the Markov property), so if H^n_x is the time of the n-th visit to state x (with H^1_x\geq 1), we have P_x(\omega_i\neq 1,\ 0\leq i\leq H^n_x)=P_x(\omega_i\neq 1,\ 0\leq i\leq H_x)^n. Since E[H^n_x]=nE[H_x]=\frac{n}{\pi(x)}, I would conjecture that the limit is \max_x \pi(x)\log P_x(\omega_i\neq 1,\ 0\leq i\leq H_x), but I have no formal proof. The max comes from the usual (\lambda_1 a_1^n+\cdots+\lambda_k a_k^n)^{1/n}\to_n \max(a_1,\ldots,a_k) (if \lambda_i>0), together with P_\pi(\cdot)=\sum_x \pi(x)P_x(\cdot). The step from conjecture to formal proof would go through replacing n by the renewal time H^k_x that is closest to n.
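    Spelled out, the heuristic behind the conjecture reads as follows (still a hand-waving sketch rather than a proof; p_x is just shorthand I introduce for the excursion probability):

    \begin{aligned}
    P_x(\omega_i\neq 1,\ 0\leq i\leq H^n_x) &= p_x^n, \qquad p_x := P_x(\omega_i\neq 1,\ 0\leq i\leq H_x),\\
    H^n_x &\approx n\,E[H_x] = \frac{n}{\pi(x)} \quad \text{(law of large numbers)},\\
    \text{hence}\quad P_x(\omega_i\neq 1,\ 0\leq i\leq m) &\approx p_x^{m\pi(x)}, \qquad \tfrac{1}{m}\log P_x(\omega_i\neq 1,\ 0\leq i\leq m) \to \pi(x)\log p_x,\\
    \tfrac{1}{m}\log P_\pi(\omega_i\neq 1,\ 0\leq i\leq m) &= \tfrac{1}{m}\log \sum_x \pi(x)\,P_x(\omega_i\neq 1,\ 0\leq i\leq m) \to \max_x\, \pi(x)\log p_x,
    \end{aligned}

    where the last step uses the max-of-powers fact above.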

    Another approach would be to use matrices. Indeed, one can read the quantity P_\pi(\omega_i\neq 1,\ 0\leq i\leq n) off a matrix product... Denote by \widetilde{P} the matrix obtained from P by putting 0's in the first row and first column: \widetilde{P}_{1x}=\widetilde{P}_{x1}=0 for any x. Then P_\pi(\omega_i\neq 1,\ 0\leq i\leq n)=\sum_x (\pi\widetilde{P}^n)_x. Indeed, the probability of the paths that don't go through state 1 is computed by \widetilde{P} without change, while paths that do go through state 1 contribute a factor 0 to the matrix product. It would also be possible to put 0's in the first row only and consider \sum_x (\pi\widetilde{P}^{n+1})_x (paths of length n+1 that never make a transition out of state 1 correspond to paths of length n that never visit state 1).
    Then \left(\sum_x (\pi \widetilde{P}^n)_x\right)^{1/n}\sim \left(\max_i \sum_j (\widetilde{P}^n)_{ij}\right)^{1/n}=\|\widetilde{P}^n\|^{1/n}\to_n \rho(\widetilde{P}), the largest eigenvalue of \widetilde{P} in modulus... its logarithm would therefore be the answer to the question. This reduces everything to an algebraic problem... which doesn't seem trivial (but is numerically simple).
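    Since it is numerically simple, here is a quick sanity check in Python (just a sketch; the 3-state matrix is an arbitrary example of mine, not part of the problem):

        import numpy as np

        # Arbitrary ergodic transition matrix on 3 states (example only).
        P = np.array([[0.2, 0.5, 0.3],
                      [0.4, 0.1, 0.5],
                      [0.3, 0.3, 0.4]])

        # Stationary distribution: left eigenvector of P for eigenvalue 1.
        w, v = np.linalg.eig(P.T)
        pi = np.real(v[:, np.argmin(np.abs(w - 1))])
        pi = pi / pi.sum()

        # Taboo matrix: zero out the row and the column of state 1 (index 0).
        Pt = P.copy()
        Pt[0, :] = 0.0
        Pt[:, 0] = 0.0

        # Finite-n rate log P_pi(omega_i != 1, 0 <= i <= n) / n ...
        n = 500
        rate = np.log((pi @ np.linalg.matrix_power(Pt, n)).sum()) / n

        # ... against log rho(Pt), the conjectured limit.
        print(rate, np.log(max(abs(np.linalg.eigvals(Pt)))))

    The two printed numbers should already agree closely at moderate n.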

  3. #3 akbar (Junior Member)
    My initial idea was also to truncate P so as to generate only the admissible strings of states. But I had difficulty evaluating the limit, until I remembered the thread where we discussed stationary distributions and the spectral radius of the transition matrix. This is also the approach I have since used to solve the problem (this post is a few days old).

    But the point of this post was rather why my application of renewal theory is wrong: I used the approach given by Feller in his book (An Introduction to Probability Theory and Its Applications, Volume 1, 3rd edition, John Wiley & Sons, 1968), chapter XIII:

    Saying \omega_i \neq 1,\ 0\leq i \leq n is the same as saying that \omega_i = 1 for the first time at some i > n. Define the variable \tau = \inf\{ i > 0 : \omega_i = 1\} (the first hitting time of 1, similar to your notation H^1_1) and the probabilities \mathbb{P}(\tau = p)=f_p.

    Now, knowing that u_p = \mathbb{P}(\omega_p = 1) = \pi_1, we have the convolution relation u_n=f_1u_{n-1}+f_2u_{n-2}+\cdots+f_nu_0. Using generating functions one gets the link between f_p and u_p:

    U(s)=\sum_{p\geq 0} u_ps^p=\frac{1-(1-\pi_1)s}{1-s} and F(s)=1-\frac{1}{U(s)}=\frac{\pi_1 s}{1-(1-\pi_1)s}=\pi_1 s(1+(1-\pi_1)s+(1-\pi_1)^2s^2+\cdots), which gives a geometric distribution: f_0=0, f_p=\pi_1(1-\pi_1)^{p-1} for p\geq 1.
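    As a quick check of this expansion (pure algebra, nothing specific to the problem; sympy is just an arbitrary tool choice of mine):

        import sympy as sp

        s, p1 = sp.symbols('s pi1', positive=True)
        U = (1 - (1 - p1) * s) / (1 - s)   # U(s) as above
        F = sp.simplify(1 - 1 / U)         # F(s) = 1 - 1/U(s)
        print(sp.series(F, s, 0, 4))       # pi1*s + pi1*(1-pi1)*s**2 + ...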

    This gives us our probability: \mathbb{P}(\omega_i \neq 1,\ 0\leq i \leq n)=\sum_{p>n}f_p=1-\sum_{p=1}^n f_p = 1-\pi_1(1+\cdots+(1-\pi_1)^{n-1})=\cdots= (1-\pi_1)^n

    That is, the final result, which doesn't even require taking the limit and is obviously wrong, is \ln(1-\pi_1).
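    For what it's worth, a quick simulation (with an arbitrary 3-state chain, purely for illustration) confirms the mismatch:

        import numpy as np

        rng = np.random.default_rng(0)

        # Arbitrary ergodic 3-state chain; index 0 plays the role of state 1.
        P = np.array([[0.2, 0.5, 0.3],
                      [0.4, 0.1, 0.5],
                      [0.3, 0.3, 0.4]])
        w, v = np.linalg.eig(P.T)
        pi = np.real(v[:, np.argmin(np.abs(w - 1))])
        pi = pi / pi.sum()

        # Estimate P(omega_i != 1, 0 <= i <= n) with omega_0 ~ pi by simulation.
        n, trials, hits = 10, 100_000, 0
        for _ in range(trials):
            x = rng.choice(3, p=pi)
            avoided = (x != 0)
            for _ in range(n):
                if not avoided:
                    break
                x = rng.choice(3, p=P[x])
                avoided = (x != 0)
            hits += avoided
        print(hits / trials)       # simulated probability
        print((1 - pi[0]) ** n)    # the renewal-based value derived above

    The two values differ well beyond the Monte Carlo error, so the dependence between steps really does matter.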

    I've been trying to figure out why this is wrong. First, Feller's convention u_0=1 and f_0=0 is at best dubious in this case; at the very least, \omega_0 should not be understood here as it is in Feller's book. Second, this application mimics the example given in the same book for Bernoulli trials, i.e. an independent sequence of random variables. Here we have a Markov chain, and although it is ergodic, it looks like the wrong transition matrix ends up being used (\lim_{n\rightarrow \infty}P^n instead of P).

    Hopefully you understand my confusion at this stage. There is obviously something fundamental that I missed here, which is worrying to say the least. Although I gave up on this approach, I would be extremely grateful if anyone could tell me where the flaw is, and whether there is a way to apply the method properly.
    I can email the reference if needed.

  4. #4 Laurent (MHF Contributor, Paris, France)
    You almost convinced me...

    Quote Originally Posted by akbar:
    \mathbb{P}(\tau = p)=f_p.
    Now, knowing that u_p = \mathbb{P}(\omega_p = 1) = \pi_1, we have the convolution relation u_n=f_1u_{n-1}+f_2u_{n-2}+\cdots+f_nu_0.
    The mistake is here: where does this convolution come from? It comes from splitting the probability space according to the time when the Markov chain first hits 1, and applying the Markov property at that time. Thus, what we really have is:

    P_\pi(X_n=1)=\sum_{k=1}^n P_\pi(\tau=k)P_1(X_{n-k}=1).

    You see the problem?...
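    (If it helps, the decomposition is easy to check numerically. A sketch with an arbitrary 3-state chain, where P_\pi(\tau=k) is read off a matrix product in which transitions into state 1 are forbidden:)

        import numpy as np

        # Arbitrary 3-state example; state 1 is index 0.
        P = np.array([[0.2, 0.5, 0.3],
                      [0.4, 0.1, 0.5],
                      [0.3, 0.3, 0.4]])
        w, v = np.linalg.eig(P.T)
        pi = np.real(v[:, np.argmin(np.abs(w - 1))])
        pi = pi / pi.sum()

        B = P.copy()
        B[:, 0] = 0.0    # forbid transitions into state 1
        c = P[:, 0]      # one-step probabilities of moving into state 1

        # P_pi(tau = k) = pi B^(k-1) c : first visit to state 1 exactly at time k.
        n = 15
        total = sum((pi @ np.linalg.matrix_power(B, k - 1) @ c)
                    * np.linalg.matrix_power(P, n - k)[0, 0]
                    for k in range(1, n + 1))
        print(total, pi[0])   # both equal P_pi(X_n = 1) = pi(1)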

  5. #5 akbar (Junior Member)
    It took me a little while to get used to your (unorthodox?) notation for conditional probabilities, but it makes sense now.

    My understanding is that P_\pi\equiv\mathbb{P}, that
    P_\pi(\cdot)=\sum_x \pi(x)P_x(\cdot) should be understood as \mathbb{P}(\cdot)=\sum_x \pi(x)\mathbb{P}(\cdot|X_0=x),
    and that P_1(X_{n}=x) means \mathbb{P}(X_{n}=x|X_{n-1}=1). Please let me know if I didn't decipher it properly.

    So the expression
    P_\pi(X_n=1)=\sum_{k=1}^n P_\pi(\tau=k)P_1(X_{n-k}=1)
    means, in my notation,
    \mathbb{P}(\omega_{n}=1)=\sum_{k=1}^n\mathbb{P}(\tau=k)\mathbb{P}(\omega_{n-k}=1|\omega_{k}=1) (1)
    where, using the hypothesis on the stationary distribution,
    \mathbb{P}(\omega_{n}=1)=\sum_{x=1}^r\pi(x)\mathbb{P}(\omega_{n}=1|\omega_0=x)=\sum_{x=1}^r\pi(x)\mathbb{P}(\omega_{n}=1|\omega_{n-1}=x)=\pi(1)

    If my understanding of equation (1) is correct, I now see how my reading of the convolution relation was wrong, but it still makes me wonder how to understand the approach in Feller's book, which is supposed to apply to Markov chains: in short, what are the equivalents of u_p and f_p in this problem?

    Thanks for your insights.

  6. #6 Laurent (MHF Contributor, Paris, France)
    Quote Originally Posted by akbar:
    It took me a little while to get used to your (unorthodox?) notation for conditional probabilities, but it makes sense now.

    My understanding is that P_\pi\equiv\mathbb{P}, that
    P_\pi(\cdot)=\sum_x \pi(x)P_x(\cdot) should be understood as \mathbb{P}(\cdot)=\sum_x \pi(x)\mathbb{P}(\cdot|X_0=x),
    and that P_1(X_{n}=x) means \mathbb{P}(X_{n}=x|X_{n-1}=1). Please let me know if I didn't decipher it properly.
    Sorry, I didn't define my notation; but I really don't think it's unorthodox, since I've been using it over and over and have seen it used in many places. It is extremely convenient, by the way; I couldn't even think of studying Markov chains without it...

    First of all, you can consider the X as a typo for \omega. (Markov chains are "always" called X...)

    And the index of the probability in my notation is always the initial state or distribution (well, it is always the initial distribution, but we write P_x:=P_{\delta_x} for short when x is a state). In the present situation you can indeed think of it as a conditional distribution: P_x(\cdot)=\mathbb{P}(\cdot|\omega_0=x) (this makes sense since \pi(x)>0). For intensive use, one usually defines P_x to be the distribution of a Markov chain starting at x (with the currently considered transition probabilities), and (X_n)_{n\geq 0} to be the "identity process" on E^{\mathbb{N}} (state space E): (X_n)_{n\geq 0}(\omega):=(\omega_n)_{n\geq 0} (the idea is to have one canonical process and several probability measures). This makes the Markov property easier to state and use.

    Thus the convolution formula is, with your notation (i.e. without further notation):

    \mathbb{P}(\omega_n=1)=\sum_{k=1}^n \mathbb{P}(\tau=k)\mathbb{P}(\omega_{n-k}=1|\omega_0=1).

    (Note that I would have been very embarrassed to write this equation if you had \omega_0=2 a.s., for instance, since the conditioning would then be undefined; whereas the version with the identity process and the P_x notation would still make sense and preserve the meaning.)

    Quote Originally Posted by akbar:
    If my understanding of equation (1) is correct, I now see how my reading of the convolution relation was wrong, but it still makes me wonder how to understand the approach in Feller's book, which is supposed to apply to Markov chains: in short, what are the equivalents of u_p and f_p in this problem?
    I've just had a look at Feller. If you're considering equation (3.1) and Theorem XIII.3.1 thereafter, the problem you're experiencing is typical of Feller... Even though Feller is a great reference (I own both volumes myself), it is not really easy to use, partly because it dates back to the sixties and introduces lots of notation no one else uses (any more?), and partly because you constantly have to look several pages back to find the currently active hypotheses. In this case, remember that the "attribute" \mathcal{E} is what he calls a "recurrent event", and if you look up the definition, you can see that "\omega_n=1" is not a recurrent event because condition (b) fails (more precisely: the attribute \mathcal{E} such that a finite sequence of states (E_{j_1},\ldots,E_{j_n}) has the attribute \mathcal{E} iff E_{j_n}=1 is not a recurrent event...). It is a recurrent event under the probability P_1, but not under P_\pi.

    I bet you'll never ever find this definition of "recurrent event" or "attribute" in any other place...

    Have a nice New Year's Eve; it's time for me to go cooking now!

  7. #7 akbar (Junior Member)
    Thanks a lot for the clarification. My difficulty came from not knowing whether the index distribution meant the initial one or the distribution of the previous state in the chain. I agree the notation simplifies things, once you understand it.

    Quote Originally Posted by Laurent:
    I've just had a look at Feller. If you're considering equation (3.1) and Theorem XIII.3.1 thereafter, the problem you're experiencing is typical of Feller... Even though Feller is a great reference (I own both volumes myself), it is not really easy to use, partly because it dates back to the sixties and introduces lots of notation no one else uses (any more?), and partly because you constantly have to look several pages back to find the currently active hypotheses. In this case, remember that the "attribute" \mathcal{E} is what he calls a "recurrent event", and if you look up the definition, you can see that "\omega_n=1" is not a recurrent event because condition (b) fails (more precisely: the attribute \mathcal{E} such that a finite sequence of states (E_{j_1},\ldots,E_{j_n}) has the attribute \mathcal{E} iff E_{j_n}=1 is not a recurrent event...). It is a recurrent event under the probability P_1, but not under P_\pi.
    I guess that settles it. It confirms my uneasiness about the initial probabilities: the theory as described in Feller's book simply cannot be applied as it stands. The way the book is written (its style, not its structure) makes it an "easier" read than most modern references, but it is probably time to update my library...

    Have a great New Year's Eve, and talk to you next year!
