# Math Help - Limit of Markov Chain

1. ## Limit of Markov Chain

Consider an homogeneous, ergodic Markov Chain $(\omega_0, \omega_1,\ldots)$ with a finite state space $X=\{1,\ldots,r\}$, the transition matrix $P$ and the stationnary distribution $\pi$.
Given that $\pi$ is also the initial distribution, what is the limit:

$\lim_{n\to \infty} \frac{\ln \mathbb{P}(\omega_i \neq 1,\ 0\leq i \leq n)}{n}$

One approach is to use Renewal Theory and consider probabilities that $\{\omega_i =1\}$ for the first time. However, my application of this approach gives the same result as if each step of the chain was completely independent from the previous one, hence any past one. Could this be right?

2. Hi,
I would be curious to know how you're applying the renewal theorem here. For me, the renewal theorem tells that the "excursions" outside a given state are i.i.d. (this is just Markov property actually), thus if $H^n_x$ is the time of $n$-th visit of state $x$ (with $H^1_x\geq 1$) we have $P_x(\omega_i\neq 1, 0\leq i\leq H^n_x)=P_x(\omega_i\neq 1, 0\leq i\leq H_x)^n$. Since $E[H^n_x]=nE[H_x]=\frac{n}{\pi(x)}$, I would conjecture that the limit is $\max_x \pi(x)\log P_x(\omega_i\neq 1, 0\leq i\leq H_x)$, but I have no formal proof. The max comes from the usual $(\lambda_1(a_1)^n+\cdots+\lambda_n(a_k)^n)^{1/n}\to_n \max (a_1,\ldots,a_n)$ (if $\lambda_i>0$), and $P_\pi(\cdot)=\sum_x \pi(x)P_x(\cdot)$. The step from conjecture to formal proof would go through replacing $n$ by the renewal time $H^k_x$ that is closest to $n$.

Another approach would be using matrices. Indeed, one can read the quantity $P_\pi(\omega\neq 1,0\leq i\leq n)$ on a matrix product... Denote by $\widetilde{P}$ the matrix obtained from $P$ by putting 0's on the first line and first column: $\widetilde{P}_{1x}=\widetilde{P}_{x1}=0$ for any $x$. Then $P_\pi(\omega_i\neq 1,0\leq i\leq n)=\sum_x (\pi\widetilde{P}^n)_x$. Indeed, the probability of paths that don't go through state 1 can be computed with $\widetilde{P}$ with no change, and paths that go through state 1 lead to a term equal to 0 in the matrix product. It would also be possible to put 0's only on the first line and consider $\sum_x (\pi\widetilde{P}^{n+1})_x$ (paths of length n+1 that don't exit from x at any time correspond to paths of length n that don't visit x at any time).
Then $\left(\sum_x (\pi \widetilde{P}^n)_x\right)^{1/n}\sim$ $\left(\max_i \sum_j (\widetilde{P}^n)_{ij}\right)^{1/n}=\|\widetilde{P}^n\|^{1/n}=\rho(\widetilde{P})$ is the largest eigenvalue of $\widetilde{P}$ (in modulus)... its logarithm would therefore be the answer to the question. This resumes to an algebraic problem... that doesn't seem trivial? (but is numerically simple)

3. My initial idea was also to truncate $P$ in order to generate only the right strings of states. But I had difficulty evaluating the limit, until I remembered the thread where we discussed stationary distributions and the spectral radius of the transition matrix. This is also the approach I used since to solve the problem (this post is a few days old.)

But this post had more to do with the reason why my application of renewal theory is wrong : I used the approach given by Feller in his book (An Introduction to Probability Theory and Its Applications, Volume 1, 3rd Edition - John Wiley & Sons - 1968), chapter XIII:

Saying $\omega_i \neq 1,\ 0\leq i \leq n$ is the same as saying that $\omega_i = 1$ for the first time for $i > n$. Define the variable $\tau = \inf\{ i > 0 : \omega_i = 1\}$ (first hit to 1, similar to your notation $H^1_1$) and the probabilities $\mathbb{P}(\tau = p)=f_p$.

Now, knowing that $u_p = \mathbb{P}(\omega_p = 1) = \pi_1$, we have the convolution relation: $u_n=f_1u_{n-1}+f_2u_{n-2}+...+f_nu_0$. Using generating polynomials one gets the link between $f_p$ and $u_p$:

$U(s)=\Sigma_{p\geq 0} u_ps^p=\frac{1-(1-\pi_1)s}{1-s}$ and $F(s)=1-\frac{1}{U(s)}=\frac{\pi_1s}{1-(1-\pi_1)s}=\pi_1s(1+(1-\pi_1)s+(1-\pi_1)^2s^2...)$ which gives a geometric distribution : $f_0=0$, $f_p=\pi_1(1-\pi_1)^{n-1}, p>1$.

This gives us our probability: $\mathbb{P}(\omega_i \neq 1,\ 0\leq i \leq n)=\Sigma_{p>n}f_p=1-\Sigma_1^n f_p = 1-\pi_1(1+...+(1-\pi_1)^{n-1})=...= (1-\pi_1)^n$

E.g. the final, without limit and obviously wrong result is $ln(1-\pi_1)$.

I've been trying to figure out why this is wrong : First, the convention set by Feller $u_0 =1$ and $f_0=0$ is at best dubious in this case. At least the " $\omega_0$" should not be understood here as it is in Feller's book. Second, the application of this approach is similar to the example given in the same book in the case of Bernoulli trials, e.g. an independent sequence of random variables. Here we have a Markov chain, and although it is ergodic, it is likely the wrong transition matrix is used here ( $\lim_{n\rightarrow \infty}P^n$ instead of $P$).

Hopefully you understand my confusion at this stage. There is obviously something fundamental that I missed here, which is worrying to say the least. Although I gave up using this approach, I would be extremely grateful if anyone could tell me where is the flaw, and if there is a way to apply it properly.
I can email the reference in case of need.

4. You almost convinced me...

Originally Posted by akbar
$\mathbb{P}(\tau = p)=f_p$.
Now, knowing that $u_p = \mathbb{P}(\omega_p = 1) = \pi_1$, we have the convolution relation: $u_n=f_1u_{n-1}+f_2u_{n-2}+...+f_nu_0$.
The mistake is here: where does this convolution come from? It consists in splitting the probability space according to the time when the Markov chain hits 1 for the first time and applying Markov property at this time. Thus, what we really have is:

$P_\pi(X_n=1)=\sum_{k=1}^n P_\pi(\tau=k)P_1(X_{n-k}=1)$.

You see the problem?...

5. Took me a little while to get used to your (unorthodox?) notation of conditional probabilities. But it kind of makes sense now.

My understanding is that $P_\pi\equiv\mathbb{P}$ and
$P_\pi(\cdot)=\sum_x \pi(x)P_x(\cdot)$ should be understood as $\mathbb{P}(.)=\sum_x \pi(x)\mathbb{P}(\cdot|X_0=x)$,
$P_1(X_{n}=x)$ as $\mathbb{P}(X_{n}=x|X_{n-1}=1)$. Please let me know if I didn't decipher it properly.

So the expression:
$P_\pi(X_n=1)=\sum_{k=1}^n P_\pi(\tau=k)P_1(X_{n-k}=1)$.
Means (using my notation):
$\mathbb{P}(\omega_{n}=1)=\sum_{k=1}^n\mathbb{P}(\t au=k)\mathbb{P}(\omega_{n-k}=1|\omega_{k}=1)$ $(1)$
Where (using the hypotesis on the stationary distribution):
$\mathbb{P}(\omega_{n}=1)=\sum_{x=1}^n\pi(x)\mathbb {P}(\omega_{n}=1|\omega_0=x)=\sum_{x=1}^n\pi(x)\ma thbb{P}(\omega_{n}=1|\omega_{n-1}=x)=\pi(1)$

If my understanding of equation $(1)$ is correct, I understand now how my interpretation of the convolution relation was wrong, but makes me still wonder how to understand the approach of Feller's book, which is supposed to be applicable to the case of Markov chains: in short, what is the equivalent of $u_p$ and $f_p$ in this problem?

6. Originally Posted by akbar
Took me a little while to get used to your (unorthodox?) notation of conditional probabilities. But it kind of makes sense now.

My understanding is that $P_\pi\equiv\mathbb{P}$ and
$P_\pi(\cdot)=\sum_x \pi(x)P_x(\cdot)$ should be understood as $\mathbb{P}(.)=\sum_x \pi(x)\mathbb{P}(\cdot|X_0=x)$,
$P_1(X_{n}=x)$ as $\mathbb{P}(X_{n}=x|X_{n-1}=1)$. Please let me know if I didn't decipher it properly.
Sorry I didn't define my notation; but I really don't think it's unorthodox since I've been using it over and over and I've seen it used in many places. By the way it is extremely convenient and I couldn't even think of studying Markov chains without it ...

First of all you can consider the X as a typo in place of $\omega$. (Markov chains are "always" called X...)

And the index of the probability in my notation is always the initial state or distribution (well, it is always the initial distribution but we write $P_x:=P_{\delta_x}$ for short when $x$ is a state). In the present situation you can think of it as a conditioned distribution indeed: $P_x(\cdot)=\mathbb{P}(\cdot|\omega_0=x)$ (makes sense since $\pi(x)>0$). For an intensive use one usually defines $P_x$ to be the distribution of a Markov chain starting at $x$ (with the currently considered transition probabilities), and $(X_n)_{n\geq 0}$ to be the "identity process" on $E^{\mathbb{N}}$ (state space $E$): $(X_n)_{n\geq 0}(\omega):=(\omega_n)_{n\geq 0}$ (the idea is to have one canonical process and several probability measures). It makes it easier to state and use Markov property.

Thus the convolution formula is, with your notation (i.e. without further notation):

$\mathbb{P}(\omega_n=1)=\sum_{k=1}^n \mathbb{P}(\tau=k)\mathbb{P}(\omega_{n-k}=1|\omega_0=1)$.

(note that I would have been very embarassed to write this equation if you had $\omega_0=2$ a.s., for instance ; while the meaning is preserved and the version with the identity process and $P_x$ notation would still make sense)

If my understanding of equation $(1)$ is correct, I understand now how my interpretation of the convolution relation was wrong, but makes me still wonder how to understand the approach of Feller's book, which is supposed to be applicable to the case of Markov chains: in short, what is the equivalent of $u_p$ and $f_p$ in this problem?
I've just had a look at Feller. If you're considering equation (3.1) and Theorem XIII.3.1 thereafter, the problem you're experiencing is typical with Feller... Even though Feller is a great reference (I own both volumes myself), it is not really easy to use, partly because it dates back to the sixties and introduces lots of notation no one else uses (any more?), partly because you constantly have to look several pages back in order to find the currently active hypotheses. In this case, remember that the "attribute" $\mathcal{E}$ is what he calls a "recurrent event", and if you look up the definition, you can see that " $\omega_n=1$" is not a recurrent event because condition (b) fails (I should say: you can see that the attribute $\mathcal{E}$ such that the finite sequence of states $(E_{j_1},\ldots,E_{j_n})$ has the attibute $\mathcal{E}$ iff $E_{j_n}=1$, is not a recurrent event...). It is a recurrent event only under the probability $P_1$, not under $P_\pi$.

I bet you'll never ever find this definition of "recurrent event" or "attribute" in any other place...

Have a nice new year's eve; it's time for me to go cooking now!

7. Thanks a lot for the clarification. My difficulty was coming from knowing if the index distribution meant the initial one of the one of the previous state in the chain. I agree the notation simplify things, once you understand it.

Originally Posted by Laurent
I've just had a look at Feller. If you're considering equation (3.1) and Theorem XIII.3.1 thereafter, the problem you're experiencing is typical with Feller... Even though Feller is a great reference (I own both volumes myself), it is not really easy to use, partly because it dates back to the sixties and introduces lots of notation no one else uses (any more?), partly because you constantly have to look several pages back in order to find the currently active hypotheses. In this case, remember that the "attribute" $\mathcal{E}$ is what he calls a "recurrent event", and if you look up the definition, you can see that " $\omega_n=1$" is not a recurrent event because condition (b) fails (I should say: you can see that the attribute $\mathcal{E}$ such that the finite sequence of states $(E_{j_1},\ldots,E_{j_n})$ has the attibute $\mathcal{E}$ iff $E_{j_n}=1$, is not a recurrent event...). It is a recurrent event only under the probability $P_1$, not under $P_\pi$.
I guess that settles it. It confirms my feeling uneasy about the initial probabilities, the theory as described in Feller's book can simply not be applied as it is. The way (style, not structure) the book is written makes it an "easier" read than most modern references. But it is probably time to update my library...

Have a great new year's evening and talk to you next year!