# Expectation & Conditional Expectation

• Jan 23rd 2009, 12:47 PM
kingwinner
Expectation & Conditional Expectation
1) "Let g be any function of the random variable X.
Then the expected value E[g(X)|X]=g(X)"

I don't understand why. I tried to use the definition of conditional expectation to prove it, but it doesn't seem to work...

2) "If X is non-negative, then E(X) = Integral(0 to infinity) of (1-F(x))dx, where F(x) is the cumulative distribution function of X."

[Aside: the source I quote this from says that the above is true regardless of whether X is discrete or continuous. But if X is discrete, how can E(X) have an integral in it? It doesn't make much intuitive sense to me...]

I tried integration by parts (letting u = 1 - F(x), dv = dx), and I think I am done if I can prove that
$\lim_{x\to\infty} x(1-F(x)) = 0.$
But this gives "infinity times 0", an indeterminate form that would seem to require L'Hopital's Rule. I tried many different approaches but was still unable to work out the limit... how can we prove that it is equal to 0?

Thanks for any help!
• Jan 25th 2009, 03:32 AM
Laurent
Quote:

Originally Posted by kingwinner
1) "Let g be any function of the random variable X.
Then the expected value E[g(X)|X]=g(X)"

I don't understand why. I tried to use the definition of conditional expectation to prove it, but it doesn't seem to work...

What is your definition of conditional expectation? Intuitively, the conditional expectation of $Y$ given $X$ is the function of $X$ which is "closest" to $Y$. In the case of $E[g(X)|X]$, $g(X)$ is itself a function of $X$, and it is of course the "closest" to $g(X)$, so that $E[g(X)|X]=g(X)$. The formal proof shouldn't be difficult, whatever definition you have; give us your definition if you want a more precise solution. (In fact, there is only one definition, but it may well be that you know a version restricted to a particular case, which is why I don't give the general answer yet.)

Quote:

2) "If X is non-negative, then E(X) = Integral(0 to infinity) of (1-F(x))dx, where F(x) is the cumulative distribution function of X."

[Aside: the source I quote this from says that the above is true regardless of whether X is discrete or continuous. But if X is discrete, how can E(X) have an integral in it? It doesn't make much intuitive sense to me...]

I tried integration by parts (letting u = 1 - F(x), dv = dx), and I think I am done if I can prove that
$\lim_{x\to\infty} x(1-F(x)) = 0.$
But this gives "infinity times 0", an indeterminate form that would seem to require L'Hopital's Rule. I tried many different approaches but was still unable to work out the limit... how can we prove that it is equal to 0?
The usual integration by parts works if you can differentiate $F$, which is only possible if the distribution of $X$ is continuous.

In fact there is a more general integration by parts formula for functions of bounded variation, but you probably don't know it.

Then for the general case, you can proceed as follows: write $1-F(x)=P(X>x)=E[{\bf 1}_{\{X>x\}}]$ (this is an indicator function inside the expectation: it equals 1 on the event in the subscript, and 0 otherwise), and then by Fubini $\int_0^\infty (1-F(x))dx=\int_0^\infty E[{\bf 1}_{\{X>x\}}] dx= E\left[\int_0^\infty {\bf 1}_{\{X>x\}} dx\right] = E[X]$ (we integrate 1 when $0<x<X$ and 0 otherwise, so the inner integral equals $X$).

Remark: if $X$ is discrete, then $F$ has "steps", so the integral can be rewritten as a sum.

Other remark: you wondered whether $x(1-F(x))\to 0$ as $x\to\infty$. This is true if $E[X]<\infty$. Indeed, we have $x (1-F(x))=xP(X>x)\leq E[X\, {\bf 1}_{\{X>x\}}]$ (a slightly refined form of Markov's inequality), and by the dominated convergence theorem the right-hand side converges to 0.
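Both claims above, the identity $\int_0^\infty (1-F(x))dx = E[X]$ and the decay of $x(1-F(x))$, can be sanity-checked numerically. A minimal Python sketch, assuming (purely for illustration) an exponential r.v. with rate 2, since then $1-F(x)=e^{-2x}$ and $E[X]=1/2$ are known in closed form:

```python
# Numerical check of E[X] = integral_0^inf (1 - F(x)) dx for an
# Exponential(lam) r.v.: 1 - F(x) = exp(-lam*x) and E[X] = 1/lam.
import math

lam = 2.0  # rate parameter (illustrative choice); E[X] = 1/lam = 0.5

def survival(x):
    """1 - F(x) = P(X > x) for Exponential(lam)."""
    return math.exp(-lam * x)

# Midpoint-rule approximation of the integral, truncated at x = 50
# (the neglected tail is of order exp(-100), i.e. negligible).
dx = 1e-4
integral = sum(survival((i + 0.5) * dx) * dx for i in range(int(50 / dx)))

print(integral)               # close to E[X] = 0.5
print(100 * survival(100.0))  # x * (1 - F(x)) at x = 100: essentially 0
```

The second printed value illustrates the "other remark": here $x(1-F(x)) = x e^{-2x}$ vanishes far faster than any power of $x$.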
• Jan 31st 2009, 12:22 PM
kingwinner
Quote:

Originally Posted by Laurent
What is your definition of conditional expectation? Intuitively, the conditional expectation of $Y$ given $X$ is the function of $X$ which is "closest" to $Y$. In the case of $E[g(X)|X]$, $g(X)$ is itself a function of $X$, and it is of course the "closest" to $g(X)$, so that $E[g(X)|X]=g(X)$. The formal proof shouldn't be difficult, whatever definition you have; give us your definition if you want a more precise solution. (In fact, there is only one definition, but it may well be that you know a version restricted to a particular case, which is why I don't give the general answer yet.)

The usual integration by parts works if you can differentiate $F$, which is only possible if the distribution of $X$ is continuous.

In fact there is a more general integration by parts formula for functions of bounded variation, but you probably don't know it.

Then for the general case, you can proceed as follows: write $1-F(x)=P(X>x)=E[{\bf 1}_{\{X>x\}}]$ (this is an indicator function inside the expectation: it equals 1 on the event in the subscript, and 0 otherwise), and then by Fubini $\int_0^\infty (1-F(x))dx=\int_0^\infty E[{\bf 1}_{\{X>x\}}] dx= E\left[\int_0^\infty {\bf 1}_{\{X>x\}} dx\right] = E[X]$ (we integrate 1 when $0<x<X$ and 0 otherwise, so the inner integral equals $X$).

Remark: if $X$ is discrete, then $F$ has "steps", so that the integral could be rewritten as a sum.

Other remark: you wondered whether $x(1-F(x))\to 0$ as $x\to\infty$. This is true if $E[X]<\infty$. Indeed, we have $x (1-F(x))=xP(X>x)\leq E[X\, {\bf 1}_{\{X>x\}}]$ (a slightly refined form of Markov's inequality), and by the dominated convergence theorem the right-hand side converges to 0.

1) My definition of conditional expectation:
$E(g(Y_1)\mid Y_2=y_2)=\int_{-\infty}^{\infty} g(y_1)\, f(y_1|y_2)\, dy_1$

Now how can we prove it rigorously? (Aside: it isn't intuitively clear to me either; I don't get the "...closest to Y" idea... do you mean that g(X)|X is a constant?)

2) So is the formula only valid when X is a continuous random variable and E(X) exists (i.e. is finite)?

Thank you!
• Feb 1st 2009, 05:47 AM
Laurent
Quote:

Originally Posted by kingwinner
1) My definition of conditional expectation:
$E(g(Y_1)\mid Y_2=y_2)=\int_{-\infty}^{\infty} g(y_1)\, f(y_1|y_2)\, dy_1$

Now how can we prove it rigorously? (Aside: it isn't intuitively clear to me either; I don't get the "...closest to Y" idea... do you mean that g(X)|X is a constant?)

Alright. Except that this definition doesn't apply here.

What is $f(y_1|y_2)$?
Usually, this notation stands for $\frac{f_{(Y_1,Y_2)}(y_1,y_2)}{f_{Y_2}(y_2)}$ where $f_{(Y_1,Y_2)}$ is the density of $(Y_1,Y_2)$ and $f_{Y_2}$ is that of $Y_2$. This assumes that $(Y_1,Y_2)$ indeed has a density.

However, $(g(X),X)$ doesn't have a density (with respect to Lebesgue measure), so this definition doesn't allow us to make sense of $E[g(X)|X]$...

And what I was saying is that $E[g(X)|X]$ is the function of $X$ that approximates $g(X)$ best. Since $g(X)$ is already a function of $X$, it has to be $E[g(X)|X]$. (What could be a better approximation of a function than itself?)
Quote:

2) So is the formula only valid when X is a continuous random variable and E(X) exists (i.e. finite)?

Thank you!
The proof I gave of the formula holds for any non-negative r.v. $X$ (and $E[X]$ may in fact even be infinite).
• Feb 1st 2009, 10:07 AM
kingwinner
Quote:

Originally Posted by Laurent
Alright. Except that this definition doesn't apply here.

What is $f(y_1|y_2)$?
Usually, this notation stands for $\frac{f_{(Y_1,Y_2)}(y_1,y_2)}{f_{Y_2}(y_2)}$ where $f_{(Y_1,Y_2)}$ is the density of $(Y_1,Y_2)$ and $f_{Y_2}$ is that of $Y_2$. This assumes that $(Y_1,Y_2)$ indeed has a density.

However, $(g(X),X)$ doesn't have a density (with respect to Lebesgue measure), so this definition doesn't allow us to make sense of $E[g(X)|X]$...

And what I was saying is that $E[g(X)|X]$ is the function of $X$ that approximates $g(X)$ best. Since $g(X)$ is already a function of $X$, it has to be $E[g(X)|X]$. (What could be a better approximation of a function than itself?)

The proof I gave of the formula holds for any non-negative r.v. $X$ (and $E[X]$ may in fact even be infinite).

1) " $E[g(X)|X]$ is the function of $X$ approximating $g(X)$ the best"
My textbook never talked about this, so I am quite unfamiliar with this idea. Is there any other intuitive way to understand why E[g(X)|X]=g(X) ? (e.g. E(2X|X)=2X, or E(2X|X=x)=2x, why?)

2) Does it hold for discrete r.v., too? It has an integral in the formula, but for discrete r.v., shouldn't it be a sigma sum?

Thanks!
• Feb 1st 2009, 10:42 AM
Laurent
Quote:

Originally Posted by kingwinner
1) " $E[g(X)|X]$ is the function of $X$ approximating $g(X)$ the best"
My textbook never talked about this, so I am quite unfamiliar with this idea. Is there any other intuitive way to understand why E[g(X)|X]=g(X) ? (e.g. E(2X|X)=2X, or E(2X|X=x)=2x, why?)

You can also think of $E[g(X)|X=x]$ as the average value of $g(X)$ "when $X$ assumes the value $x$", which is what the notation says; and of course if $X=x$ then $g(X)=g(x)$, so $E[g(X)|X=x]=g(x)$: no averaging is needed.
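This "no averaging needed" point can be seen in a quick Monte Carlo sketch (the die-roll distribution and $g(x)=2x$ are arbitrary choices for illustration): among the samples where $X$ happened to equal 3, $g(X)$ never varies, so its average is exactly $g(3)=6$.

```python
# Monte Carlo illustration that E[g(X) | X = x] = g(x): on the event
# {X = x}, g(X) is constant equal to g(x), so its conditional average
# is exactly g(x).  Die roll and g(x) = 2x chosen purely as an example.
import random

random.seed(0)
g = lambda x: 2 * x  # g(X) = 2X

samples = [random.randint(1, 6) for _ in range(100_000)]
conditioned = [g(x) for x in samples if x == 3]  # restrict to the event {X = 3}

print(sum(conditioned) / len(conditioned))  # exactly g(3) = 6.0
```

Every entry of `conditioned` is the same number, which is why the "average" comes out exact rather than merely approximate.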

Quote:

2) Does it hold for discrete r.v., too? It has an integral in the formula, but for discrete r.v., shouldn't it be a sigma sum?

Thanks!
As I wrote, it holds for any non-negative r.v. And as I also wrote, for a discrete r.v. the distribution function has steps, so you can rewrite the integral as a sum. Explicitly, if $X$ takes values in the non-negative integers: $\int_0^\infty (1-F(x))dx=\sum_{n=0}^\infty \int_n^{n+1} P(X> x)\, dx = \sum_{n=0}^\infty P(X>n)$, since $P(X>x)=P(X>n)$ for $n\leq x<n+1$.
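The discrete tail-sum identity $E[X]=\sum_{n\geq 0} P(X>n)$ can be checked on a concrete example. The sketch below assumes (for illustration) a geometric r.v. on $\{1,2,\dots\}$ with success probability $p=0.25$, where $P(X>n)=(1-p)^n$ and $E[X]=1/p=4$; it compares the truncated tail sum with a simulated mean.

```python
# Check E[X] = sum_{n>=0} P(X > n) for a geometric r.v. on {1, 2, ...}
# with success probability p: P(X > n) = (1 - p)^n and E[X] = 1/p.
import random

p = 0.25
tail_sum = sum((1 - p) ** n for n in range(1000))  # truncated tail sum
print(tail_sum)  # close to 1/p = 4.0

# Empirical mean from simulation, for comparison.
random.seed(1)

def geom_sample():
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while random.random() > p:
        n += 1
    return n

empirical = sum(geom_sample() for _ in range(200_000)) / 200_000
print(empirical)  # also close to 4.0
```

Here the integral genuinely collapses to a sum, as Laurent's step-function argument says: $P(X>x)$ is constant on each interval $[n, n+1)$.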