# Thread: Law of an application

1. ## Law of an application

Hi,

I'm currently reading a paper, and I'm having difficulty grasping the concept of a tree, and more precisely this:

Let $\mathbb{T}~:~ \Omega \to\Omega$ be the identity map.
Let $(p_k,k\geq 0)$ be a probability distribution.
There exists a probability $\mathbf{P}$ on $\Omega$ such that the law of $\mathbb{T}$ under $\mathbf{P}$ is the law of the tree (with reproduction distribution $p_k$).

There are three things I don't understand:
- why would one be interested in the identity map?
- how would one explain in words what the law of $\mathbb{T}$ is?
- what does "the law of T under P" mean? Is it related to... I don't know, what we call in French "mesure image" (image measure), which is $\mu_X$ such that $\mu_X(A)=\mu(X^{-1}(A))=\mu(X\in A)$?

Thanks for any input... And if anything is unclear, I'll try to clarify it.

2. Hi, Moo!

Here are a few pieces of an answer...

Originally Posted by Moo
- what does "the law of T under P" mean? Is it related to... I don't know, what we call in French "mesure image" (image measure), which is $\mu_X$ such that $\mu_X(A)=\mu(X^{-1}(A))=\mu(X\in A)$?
The law of T under P is just the law of T when the probability space (here, $\Omega$) is endowed with the probability measure P. In other words, this is indeed the image measure of P by T. Since T is the identity map here, $T^{-1}(A)=A$ for every event $A$, so the law of T under P is simply P itself.
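
On a finite $\Omega$ the image measure can be computed by brute force, which may make the definition more concrete. The following is only a sketch: the fair-die space and the name `image_measure` are illustrative assumptions, not something taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def image_measure(P, T):
    """Law of T under P on a finite space: A -> P(T^{-1}(A)).

    P maps each outcome omega to its probability; T is any function on Omega.
    """
    law = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for omega, weight in P.items():
        law[T(omega)] += weight
    return dict(law)


# Hypothetical example: Omega = {1, ..., 6} with the uniform measure.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

# Law of the parity map under P: mass 1/2 on 0 and 1/2 on 1.
parity_law = image_measure(P, lambda omega: omega % 2)

# Law of the identity map under P: this is P itself.
identity_law = image_measure(P, lambda omega: omega)
```

With the identity map, choosing the measure on $\Omega$ and choosing the law of the random variable are the same thing, which is why the statement in the paper is phrased as the existence of a suitable $\mathbf{P}$.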

- why would one be interested in the identity map?
There are always two possible viewpoints in a probabilistic setting: either a fixed, unspecified, very large probability space $(\Omega,P)$ on which we define several random variables (this is the usual case in introductory courses), or a space $\Omega$ carrying several probability measures, with the identity map as the only random variable.
The appeal of the first viewpoint is simplicity and universality (it doesn't matter which base space we use, so why specify it); however, it relies on existence theorems that are sometimes not obvious.
The second viewpoint is useful when one wants to study the same random variable under various distributions. For instance, if we want to vary a parameter (like the parameter of a Bernoulli distribution), we simply introduce a family of probabilities indexed by this parameter, and we say "under $P_p$, ..." to mean that we consider the value $p$ of the parameter. This is a very convenient setting for Markov chains, where the parameter is the starting point. We have one random variable $(X_n)_{n\geq 0}$, which is the identity map on $E^{\mathbb{N}}$, and one measure $P_x$ for every site $x\in E$, which is the law of the Markov chain (with some given transition matrix) starting at $x$. This allows one to give meaning to expressions like $P_x(X_2=y)=\sum_z P_x(X_1=z)P_z(X_1=y)$ (i.e. the Markov property).
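
To see the family $(P_x)_{x\in E}$ at work, here is a small numerical check of the identity above on a three-state chain. It is a sketch only: the transition matrix `Q` is made up for the illustration, and $P_x$ is represented by the probability it gives to finite paths.

```python
from fractions import Fraction

# Hypothetical transition matrix on E = {0, 1, 2}: Q[x][y] = P_x(X_1 = y).
Q = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
]
E = range(len(Q))


def path_probability(x, path):
    """P_x(X_1 = path[0], ..., X_n = path[n-1]) for the chain started at x."""
    prob, current = Fraction(1), x
    for nxt in path:
        prob *= Q[current][nxt]
        current = nxt
    return prob


def two_step_direct(x, y):
    """P_x(X_2 = y), summing P_x over all paths (z, y) of length two."""
    return sum(path_probability(x, (z, y)) for z in E)


def two_step_markov(x, y):
    """Right-hand side of the identity: sum_z P_x(X_1 = z) * P_z(X_1 = y)."""
    return sum(path_probability(x, (z,)) * path_probability(z, (y,)) for z in E)
```

Note that the second function evaluates the same coordinate $X_1$ under two different measures, $P_x$ and $P_z$; this is exactly what the "one random variable, many measures" viewpoint allows.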

- how would one explain in words what the law of $\mathbb{T}$ is?
This is called a Galton-Watson tree. From one ancestor (the root of the tree), we have a random number $Z_1$ of children, each of whom has a random number of children, etc., where the numbers of children are all independent of each other. This is a genealogical tree where the number of children is random: $k$ children (possibly 0) with probability $p_k$. A basic question is: does the family eventually go extinct, i.e. is the tree finite?

In this case, $\Omega$ would be the set of trees (or a larger set), or an equivalent representation of trees. A convenient choice is to let $U=\cup_{n\geq 0} \mathbb{N}^n$, the set of all finite sequences of integers (the root being the empty sequence), and $\Omega=\mathbb{N}^U$, the set of families of nonnegative integers indexed by the elements of $U$. Intuitively, an individual corresponds to a sequence $u=a_1a_2\cdots a_n\in U$ if it is obtained from the root as the $a_n$-th child of the $a_{n-1}$-th child of ... of the $a_1$-th child of the root. The number of children of this individual is encoded in the coordinate $n_u$ of the tree $T=(n_u)_{u\in U}\in \mathbb{N}^U$ (with $n_u$ arbitrary if $u$ is not connected to the root, so there are more codings than trees).
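
The remark that there are more codings than trees can be checked on a toy example. In the sketch below, words $u$ are Python tuples of child ranks (with `()` for the root), and a finite dict stands in for an element of $\mathbb{N}^U$ by treating missing words as having 0 children; both conventions, and the name `tree_from_coding`, are assumptions made for the illustration.

```python
def tree_from_coding(n, max_depth=50):
    """Return the set of individuals connected to the root in the coding n.

    n maps a word u (tuple of positive integers, () for the root) to n_u,
    its number of children. Words absent from n count as having 0 children.
    Exploration stops after max_depth generations as a safety cap.
    """
    individuals = {()}
    generation = [()]
    for _ in range(max_depth):
        next_generation = []
        for u in generation:
            # The children of u are u1, u2, ..., u n_u.
            for rank in range(1, n.get(u, 0) + 1):
                next_generation.append(u + (rank,))
        if not next_generation:
            break
        individuals.update(next_generation)
        generation = next_generation
    return individuals


# The root has two children; the first child has one child of its own.
coding_a = {(): 2, (1,): 1, (2,): 0, (1, 1): 0}

# Same coding, plus a value at the word (3,), which is not in the tree
# because the root only has two children.
coding_b = {**coding_a, (3,): 5}
```

Both codings decode to the same tree `{(), (1,), (2,), (1, 1)}`: the value stored at `(3,)` is never read.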

With these notations, we can define the law of $\mathbb{T}$ as $\mu^{\otimes U}$ where $\mu$ is the probability distribution $\mu(\{k\})=p_k$.
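
On a computer one cannot sample all of $\mu^{\otimes U}$, but only the coordinates $n_u$ of individuals connected to the root matter for the tree, and they are independent with law $\mu$. A minimal sketch of a sampler built on that observation, truncated after a fixed number of generations (the function name, the list representation of $(p_k)$ and the truncation are assumptions of this sketch, not the paper's construction):

```python
import random


def sample_gw_tree(p, max_generations, rng):
    """Sample a Galton-Watson tree, truncated after max_generations levels.

    p is a list with p[k] = probability of having k children.
    Returns a dict u -> n_u for the individuals u of the tree whose
    generation is smaller than max_generations (the root is ()).
    """
    child_counts = list(range(len(p)))
    coding = {}
    generation = [()]
    for _ in range(max_generations):
        next_generation = []
        for u in generation:
            # Each individual draws its number of children independently from mu.
            n_u = rng.choices(child_counts, weights=p)[0]
            coding[u] = n_u
            next_generation.extend(u + (rank,) for rank in range(1, n_u + 1))
        generation = next_generation
    return coding


rng = random.Random(0)  # seeded only to make runs reproducible
# Hypothetical offspring distribution: 0, 1 or 2 children.
tree = sample_gw_tree([0.25, 0.5, 0.25], max_generations=10, rng=rng)
```

If the dict returned contains no individual of generation `max_generations - 1` with children, the sampled family has died out within the window; otherwise the truncation does not say whether the tree is finite.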

Actually, what I just did is "prove" your statement, assuming that the existence of an infinite-product measure is trivial, which it is not... This is probably even the reason why this statement is outlined by the author. NB: if the author starts with such a statement, you can expect sharp rigor in what follows!

3. Originally Posted by Laurent
Here are a few pieces of an answer...
These don't look like pieces; they explain exactly what I had been looking for...

This is called a Galton-Watson tree. From one ancestor (the root of the tree), we have a random number $Z_1$ of children, each of whom has a random number of children, etc., where the numbers of children are all independent of each other. This is a genealogical tree where the number of children is random: $k$ children (possibly 0) with probability $p_k$. A basic question is: does the family eventually go extinct, i.e. is the tree finite?

In this case, $\Omega$ would be the set of trees (or a larger set), or an equivalent representation of trees. A convenient choice is to let $U=\cup_{n\geq 0} \mathbb{N}^n$, the set of all finite sequences of integers (the root being the empty sequence), and $\Omega=\mathbb{N}^U$, the set of families of nonnegative integers indexed by the elements of $U$. Intuitively, an individual corresponds to a sequence $u=a_1a_2\cdots a_n\in U$ if it is obtained from the root as the $a_n$-th child of the $a_{n-1}$-th child of ... of the $a_1$-th child of the root. The number of children of this individual is encoded in the coordinate $n_u$ of the tree $T=(n_u)_{u\in U}\in \mathbb{N}^U$ (with $n_u$ arbitrary if $u$ is not connected to the root, so there are more codings than trees).
Well, you got it absolutely right... I should have mentioned it, it would have spared you a lot of writing, sorry.
It's a paper about GW trees, and they are defined almost the same way as you did... (a tree is defined to always contain the root element)

Actually, what I just did is "prove" your statement, assuming that the existence of an infinite-product measure is trivial, which it is not... This is probably even the reason why this statement is outlined by the author. NB: if the author starts with such a statement, you can expect sharp rigor in what follows!
He doesn't start with this statement, but I think it's quite a rigorous paper...
It wasn't nice of him to have outlined the statement, it bugged me a lot.

The law of T under P is just the law of T when the probability space (here, $\Omega$) is endowed with the probability measure P. In other words, this is indeed the image measure of P by T.
Okay, I feel better now that I know that

There are always two possible viewpoints in a probabilistic setting: either a fixed, unspecified, very large probability space $(\Omega,P)$ on which we define several random variables (this is the usual case in introductory courses), or a space $\Omega$ carrying several probability measures, with the identity map as the only random variable.
The appeal of the first viewpoint is simplicity and universality (it doesn't matter which base space we use, so why specify it); however, it relies on existence theorems that are sometimes not obvious.
The second viewpoint is useful when one wants to study the same random variable under various distributions. For instance, if we want to vary a parameter (like the parameter of a Bernoulli distribution), we simply introduce a family of probabilities indexed by this parameter, and we say "under $P_p$, ..." to mean that we consider the value $p$ of the parameter.
Oh yeah, actually we work with that kind of thing in statistics, defining the family of probability spaces $(\mathbb{R}^k,\mathcal{B}_{\mathbb{R}^k},P_\theta)_{\theta\in\Theta}$

This is a very convenient setting for Markov chains, where the parameter is the starting point. We have one random variable $(X_n)_{n\geq 0}$, which is the identity map on $E^{\mathbb{N}}$, and one measure $P_x$ for every site $x\in E$, which is the law of the Markov chain (with some given transition matrix) starting at $x$. This allows one to give meaning to expressions like $P_x(X_2=y)=\sum_z P_x(X_1=z)P_z(X_1=y)$ (i.e. the Markov property).
But in this example of the Markov property, there isn't a unique random variable, is there?

Hmm, I think it's likely that there will be more questions, not because your excellent explanations weren't sufficient, but because I may need to read this and have it explained to me several times, in different ways =)

Thanks, as always...

4. Originally Posted by Moo
But in this example of the Markov property, there isn't a unique random variable, is there?
There is, if we consider $(X_n)_{n\geq 0}$ as a single random variable (with values in $E^{\mathbb{N}}$).

Another common way to write the Markov property, without using the trick of the identity map (i.e. in the "first viewpoint" fashion, cf. previous post), is to take $X$ to be a Markov chain starting at $x$ and write $P(X_2=z)=\sum_y P(X_1=y)P(X_2=z|X_1=y)$. The problem is that you actually would have to restrict the sum to the values $y$ such that $P(X_1=y)>0$ in order to write the conditioning, since $P(X_2=z|X_1=y)$ is not defined otherwise; that complicates the notation for nothing.

Bye,
Laurent.

5. Originally Posted by Laurent
There is, if we consider $(X_n)_{n\geq 0}$ as a single random variable (with values in $E^{\mathbb{N}}$).
Oh yeah, so it's just a matter of definition.

The problem is that you actually would have to restrict the sum to the values $y$ such that $P(X_1=y)>0$ in order to write the conditioning
I recall already having this problem... lol