# Thread: Convergence of random variables

1. ## Convergence of random variables

I was reading some proofs about the convergence of random variables, and here are the little bits that I couldn't figure out...

1) Let X_n be a sequence of random variables, and let X_(n_k) be a subsequence of it. If X_n converges in probability to X, then X_(n_k) also converges in probability to X. WHY?

2) I was looking at a theorem: if E(Y)<∞, then Y<∞ almost surely. Now I am puzzled by the notation. What does it MEAN to say that Y=∞ or Y<∞?
For example, if Y is a Poisson random variable, then the possible values are 0,1,2,..., (there is no upper bound). Is it true to say that Y=∞ in this case?

3) If (X_n)^4 converges to 0 almost surely, then is it true to say that X_n also converges to 0 almost surely? Why or why not?

4) The moment generating function (mgf) determines the distribution uniquely, so we can use the mgf to find the distributions of random variables. If the mgf already does the job, what is the point of introducing the "characteristic function"?

Any help is much appreciated!

[note: also under discussion in talk stats forum]

2. Originally Posted by kingwinner
1) Let X_n be a sequence of random variables, and let X_(n_k) be a subsequence of it. If X_n converges in probability to X, then X_(n_k) also converges in probability to X. WHY?
The same proof works as for any convergent sequence. Write out the definition of convergence and it should be clear. (A quick explanation: every element of the subsequence is an element of the sequence, so if n_k>N, whatever holds for the sequence beyond N also holds for the subsequence.)
2) I was looking at a theorem: if E(Y)<∞, then Y<∞ almost surely. Now I am puzzled by the notation. What does it MEAN to say that Y=∞ or Y<∞?
For example, if Y is a Poisson random variable, then the possible values are 0,1,2,..., (there is no upper bound). Is it true to say that Y=∞ in this case?
$Y<\infty$ almost surely means $\mathbb{P}(\{\omega \in \Omega:Y(\omega)<\infty\})=1$.

No, if N is Poisson then $N<\infty$.
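To spell out the Poisson case as a worked line (the rate $\lambda>0$ is a symbol introduced here for illustration, it is not in the question): the values are unbounded, but each value is a finite integer, and the probabilities of the finite values already add up to 1.

$$P(Y<\infty)=\sum_{k=0}^{\infty}P(Y=k)=\sum_{k=0}^{\infty}e^{-\lambda}\frac{\lambda^k}{k!}=e^{-\lambda}e^{\lambda}=1, \qquad\text{while } P(Y>m)>0 \text{ for every finite } m.$$

So "no upper bound" and "$Y=\infty$" are different statements: the first is about the set of possible values, the second would need an outcome $\omega$ with $Y(\omega)=\infty$.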
3) If (X_n)^4 converges to 0 almost surely, then is it true to say that X_n also converges to 0 almost surely? Why or why not?
The reason is the same as for a convergent sequence of numbers. Fix an outcome $\omega$ with $X_n(\omega)^4\to 0$ (almost every $\omega$ qualifies) and pick $\epsilon>0$. Then there exists an N (depending on $\omega$) such that $X_n(\omega)^4<\epsilon^4$, and hence $|X_n(\omega)|<\epsilon$, for all n>N.
4) The moment generating function (mgf) determines the distribution uniquely, so we can use the mgf to find the distributions of random variables. If the mgf already does the job, what is the point of introducing the "characteristic function"?
The point is that the mgf does not always exist! The characteristic function is a Fourier transform, which exists for almost every process (finite expectation is sufficient and necessary, I think). Also, Fourier analysis has been studied in depth and offers quite a few tools that you can use.

I hope this was clear.

3. 1) I think the following works:
Xn converges to X in probability (by definition) if for all epsilon > 0,
Pr(|Xn-X|>epsilon) -> 0. Suppose Xn converges to X in probability. Let Xnk be a subsequence. Then for any epsilon>0, Pr(|Xnk-X|>epsilon) is a subsequence of Pr(|Xn-X|>epsilon). Since we know that a subsequence of a convergent sequence of numbers converges to the limit of the original sequence, it follows that Pr(|Xnk-X|>epsilon)-> 0. Thus, Xnk converges in probability to X.
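
A small numerical illustration of the argument above, as a sketch only. It assumes a toy model that is not part of the thread, $X_n = X + Z/n$ with $Z$ standard normal, so that Pr(|Xn-X|>epsilon) can be computed exactly, and it uses the subsequence n_k = k^2 as an arbitrary choice. The names `tail_prob`, `full` and `sub` are made up for this example.

```python
import math

def tail_prob(n: int, eps: float) -> float:
    """Pr(|X_n - X| > eps) in the toy model X_n = X + Z/n, Z standard normal."""
    # |X_n - X| = |Z|/n, and Pr(|Z| > a) = erfc(a / sqrt(2)) for a >= 0.
    return math.erfc(n * eps / math.sqrt(2.0))

eps = 0.1
full = [tail_prob(n, eps) for n in range(1, 101)]    # p_1, ..., p_100
sub_indices = [k * k for k in range(1, 11)]          # n_k = k^2 (assumed subsequence)
sub = [tail_prob(n_k, eps) for n_k in sub_indices]   # p_{n_1}, ..., p_{n_10}

# Every term of the subsequence is literally a term of the original sequence,
# so it inherits the limit 0.
for n_k, p in zip(sub_indices, sub):
    print(f"n_k = {n_k:3d}   Pr(|X_n_k - X| > {eps}) = {p:.3e}")
```

The printed probabilities are just selected entries of `full`, which is the whole point of the proof: the subsequence of probabilities is a subsequence of a sequence of numbers converging to 0.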

2) I don't get it. For a Poisson random variable Y, the possible values are 0,1,2,..., and there is NO upper bound, so Y=∞ is possible? Why did you say that if Y is Poisson, then Y<∞? (same for exponential random variable, there is no upper bound.)
For a binomial random variable X, the possible values are 0,1,2,...,n, so there is an upper bound; does that mean X<∞?
I am really confused. Can someone please explain more on this?

4) So you mean the characteristic function c(t) always exists for ALL real numbers t, is that right?
Also, for example, if we are asked to prove that the sum of 2 independent normal r.v.'s is again normal, then I think the proof using the mgf is perfectly fine, but I see my textbook using the characteristic function for this. Is it absolutely necessary to use the characteristic function in a proof like this?

Thanks a lot!

4. Originally Posted by Focus
The characteristic function is a Fourier transform, which exists for almost every process (finite expectation is sufficient and necessary, I think).
Since $|e^{itX}|=1$, the expectation $E[e^{itX}]$ makes sense for any random variable $X$ and any real $t$ (integration of a bounded measurable function). Thus the characteristic function of a probability measure always exists!
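
To make this concrete, here is a rough sketch using the standard Cauchy distribution, which is an example chosen here and not one raised in the thread. It has no finite mean and its mgf $E[e^{tX}]$ is infinite for every $t\neq 0$, yet its characteristic function is $e^{-|t|}$. The code compares a Monte Carlo estimate with that formula; the helper names, the seed and the sample size are arbitrary assumptions.

```python
import math
import random

def sample_cauchy(rng: random.Random) -> float:
    """Draw one standard Cauchy value via the inverse CDF tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def empirical_cf_real(samples, t: float) -> float:
    """Real part of the empirical characteristic function, mean of cos(t * x).

    The distribution is symmetric, so the imaginary part (mean of sin) is
    close to 0 and is not computed here.
    """
    return sum(math.cos(t * x) for x in samples) / len(samples)

rng = random.Random(0)                       # fixed seed, arbitrary choice
samples = [sample_cauchy(rng) for _ in range(100_000)]

results = {t: empirical_cf_real(samples, t) for t in (0.5, 1.0, 2.0)}
for t, value in results.items():
    print(f"t = {t}: empirical {value:.4f}   exp(-|t|) = {math.exp(-abs(t)):.4f}")
```

The average of `cos(t * x)` is stable because the integrand is bounded by 1, which is exactly the reason given above. The analogous average of `exp(t * x)` would be dominated by a handful of huge samples and would not settle down.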

Originally Posted by kingwinner
1) I think the following works:
Xn converges to X in probability (by definition) if for all epsilon > 0,
Pr(|Xn-X|>epsilon) -> 0. Suppose Xn converges to X in probability. Let Xnk be a subsequence. Then for any epsilon>0, Pr(|Xnk-X|>epsilon) is a subsequence of Pr(|Xn-X|>epsilon). Since we know that a subsequence of a convergent sequence of numbers converges to the limit of the original sequence, it follows that Pr(|Xnk-X|>epsilon)-> 0. Thus, Xnk converges in probability to X.
Yes, that's it.
2) I don't get it. For a Poisson random variable Y, the possible values are 0,1,2,..., and there is NO upper bound, so Y=∞ is possible? Why did you say that if Y is Poisson, then Y<∞? (same for exponential random variable, there is no upper bound.)
For a binomial random variable X, the possible values are 0,1,2,...,n, so there is an upper bound; does that mean X<∞?
I am really confused. Can someone please explain more on this?
In many situations it is very useful to allow random variables to take the value $+\infty$. In those cases we have $Y\in\mathbb{N}\cup\{\infty\}$, for instance. Here $\infty$ is just a symbol (no limit involved). For instance, when one defines random times like $T=\inf\{n\geq 0 \mid X_n<0\}$, it is common to set $T=\infty$ when the inf is not defined, i.e. when $X_n\geq 0$ for all $n$.

In integration/probability courses, it is common to deal with nonnegative functions that take values in $[0,+\infty]$, i.e. either a nonnegative real number or $+\infty$.

We need a convention to integrate functions that may be equal to $\infty$ sometimes. The convention is the following: $E[\infty 1_A]=\infty$ if $P(A)>0$ and $E[\infty 1_A]=0$ otherwise. In other words, as soon as we integrate $\infty$ over an event of positive probability, the expectation is infinite. This implies the theorem you're quoting:

If a random variable Y (a priori taking values in $[0,+\infty]$) satisfies $E[Y]<\infty$, then in fact it takes finite values almost surely: $Y<\infty$ almost surely.

This is obvious because $E[Y]=E[Y 1_{(Y<\infty)}]+E[\infty 1_{(Y=\infty)}]\geq E[\infty 1_{(Y=\infty)}]=\infty P(Y=\infty)$, and by the previous convention this is infinite iff $P(Y=\infty)>0$. Thus we must have $P(Y=\infty)=0$.

Application: Let $(A_k)_{k\geq 0}$ be a family of events. The random variable $S=\sum_{k=0}^\infty 1_{A_k}$ counts how many of the events happen. In some situations it may take the value $\infty$ (for instance if all the $A_k$ are the same non-negligible event). And we have, by Fubini (or monotone convergence), $E[S]=\sum_{k=0}^\infty P(A_k)$.
Then if $\sum_{k=0}^\infty P(A_k)<\infty$, you can conclude that $S<\infty$ almost surely, i.e. almost surely only finitely many of the events happen. This is (one side of) the Borel-Cantelli lemma.
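
A quick simulation of this application, as a sketch under assumptions that are not in the post: the events are taken independent with $P(A_k)=1/(k+1)^2$, and the infinite family is truncated at `K` events so that it fits in a program. The names `probs`, `count_events`, `K` and `trials` are invented for the example.

```python
import random

K = 500                                            # truncation level (assumption)
probs = [1.0 / (k + 1) ** 2 for k in range(K)]     # assumed P(A_k) = 1/(k+1)^2

def count_events(rng: random.Random) -> int:
    """One realisation of S = sum of the indicators 1_{A_k}, for k < K."""
    return sum(1 for p in probs if rng.random() < p)

rng = random.Random(1)                             # fixed seed, arbitrary choice
trials = 2000
counts = [count_events(rng) for _ in range(trials)]

mean_count = sum(counts) / trials                  # Monte Carlo estimate of E[S]
expected = sum(probs)                              # sum of P(A_k), k < K

print(f"average of S over {trials} runs: {mean_count:.3f}")
print(f"sum of P(A_k) for k < {K}:       {expected:.3f}")
print(f"largest S observed:              {max(counts)}")
```

The average of `S` should land close to the sum of the probabilities, and the number of events that occur in any run stays small even though there are hundreds of candidates, which is the behaviour the summability condition predicts.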

4) So you mean the characteristic function c(t) always exists for ALL real numbers t, is that right?
Also, for example, if we are asked to prove that the sum of 2 independent normal r.v.'s is again normal, then I think the proof using the mgf is perfectly fine, but I see my textbook using the characteristic function for this. Is it absolutely necessary to use the characteristic function in a proof like this?
Yes, the mgf would have done the job perfectly well.
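
For reference, the mgf argument can be sketched as follows. The notation $X\sim N(\mu_1,\sigma_1^2)$, $Y\sim N(\mu_2,\sigma_2^2)$ is introduced here for illustration, and the only inputs are independence and the standard normal mgf $M_X(t)=\exp(\mu_1 t+\sigma_1^2t^2/2)$.

$$M_{X+Y}(t)=E\left[e^{tX}e^{tY}\right]=M_X(t)\,M_Y(t)=\exp\left((\mu_1+\mu_2)t+\frac{(\sigma_1^2+\sigma_2^2)t^2}{2}\right).$$

The right-hand side is the mgf of $N(\mu_1+\mu_2,\ \sigma_1^2+\sigma_2^2)$ and it is finite for every real $t$, so the uniqueness property of the mgf identifies the distribution of $X+Y$. The same computation with $e^{itX}$ in place of $e^{tX}$ gives the characteristic-function version, which has the advantage of also working for distributions that have no mgf.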