# Thread: Law of Total Expectation

1. ## Law of Total Expectation

The law of total expectation states that:
E(X) = E[E(X|Y)] and E[g(X)] = E[E(g(X)|Y)]

1) Now, is it correct to say that E(XY)=E[E(XY|Y)]? I don't think the above law applies here, because in the law of total expectation the quantity inside the outer expectation (X or g(X)) has to be a function of X alone (in particular, it cannot depend on Y), but here we have XY, which is NOT a function of X alone. It is a function of both X and Y. Is that OK?

2) How about E[X h(Y)]=E[E(X h(Y)|Y)]? Is this a correct statement?

So I am really confused... and I would appreciate it if anyone could help.

[note: also under discussion in talk stats forum]

2. Hello,

Sorry if I explain it in my own words, I don't know if you've studied it the same way... And maybe there are small typos, but not very important.

1) We know that if Z is $\sigma(Y)$-measurable, then for any rv X (in L^2 I think), we have $E(XZ|Y)=ZE(X|Y)$

So since Y is obviously $\sigma(Y)$-measurable, E[E[XY|Y]]=E[YE[X|Y]] (*)
But there's something that says :
Let $\mathcal{B}$ be a $\sigma$-algebra. For any positive $\mathcal{B}$-measurable Z and any positive X, $E[ZX]=E[Z\,E[X|\mathcal{B}]]$. Here we take $\mathcal{B}=\sigma(Y)$, so that $E[X|\mathcal{B}]=E[X|Y]$. (This comes from the fact that E[X|Y] is the orthogonal projection of X onto $L^2(\Omega,\sigma(Y),P)$, but you don't really need to know it if you haven't learnt this...)

So (*)=E[YX]

2) How about E[X h(Y)]=E[E(X h(Y)|Y)]? Is this a correct statement?
Exact same reasoning, under the condition that h(Y) is $\sigma(Y)$-measurable (which holds as soon as h is a measurable function).

I hope this is clear enough

* $\sigma(B)$ is the smallest sigma-algebra that makes B measurable
** Note: a rv A is $\sigma(B)$-measurable iff there exists a measurable function $\varphi$ such that $A=\varphi\circ B$

3. Originally Posted by kingwinner
The law of total expectation states that:
E(X) = E[E(X|Y)] and E[g(X)] = E[E(g(X)|Y)]

1) Now, is it correct to say that E(XY)=E[E(XY|Y)]? I don't think the above law applies here, because in the law of total expectation the quantity inside the outer expectation (X or g(X)) has to be a function of X alone (in particular, it cannot depend on Y), but here we have XY, which is NOT a function of X alone. It is a function of both X and Y. Is that OK?
I think what you're missing here is that you're dealing with dependent random variables, i.e. when we write E(X) = E[E(X|Y)], this holds for any (integrable) random variable X, even if X depends on Y in any way. For instance, X=Y, or X=YZ where Z is any other r.v. (provided X is integrable).

Therefore, E[XY]=E[E(XY|Y)] is a direct application where we used the random variable XY as X in the previous formula.

Now it should be obvious that E[X h(Y)]=E[E(X h(Y)|Y)] holds in the exact same way : this time, it is X h(Y) which plays the role of X. Any integrable variable does the trick, whether it depends on Y or not (and I would say, especially if it depends on Y, otherwise it is usual expectation).
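
As a minimal sketch of this point, one can check the identities exactly on a small discrete example. The joint pmf below is made up for illustration (X and Y are deliberately dependent), and the helper names `expect`, `cond_expect` and `tower` are hypothetical, not from any library.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y); X and Y are dependent on purpose.
joint = {
    (0, 1): F(1, 8),
    (1, 1): F(3, 8),
    (0, 2): F(1, 4),
    (2, 2): F(1, 4),
}

def expect(g):
    """E[g(X, Y)] computed directly from the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def cond_expect(g, y0):
    """E[g(X, Y) | Y = y0], for a value y0 with P(Y = y0) > 0."""
    p_y0 = sum(p for (_, y), p in joint.items() if y == y0)
    return sum(p * g(x, y) for (x, y), p in joint.items() if y == y0) / p_y0

def tower(g):
    """E[ E[g(X, Y) | Y] ]: weight each conditional expectation by P(Y = y0)."""
    total = F(0)
    for y0 in {y for (_, y) in joint}:
        p_y0 = sum(p for (_, y), p in joint.items() if y == y0)
        total += p_y0 * cond_expect(g, y0)
    return total

xy = lambda x, y: x * y          # the variable XY plays the role of "X" in the law
x_plus_y = lambda x, y: x + y    # so does X + Y

direct = expect(xy)              # E[XY]
via_tower = tower(xy)            # E[E(XY | Y)]
# "Pull-out" form: E[Y * E(X | Y)]
pull_out = expect(lambda x, y: y * cond_expect(lambda a, b: a, y))

print(direct, via_tower, pull_out)
print(expect(x_plus_y), tower(x_plus_y))
```

All three quantities on the first printed line agree, and so do the two on the second line, which is what the law of total expectation predicts when XY or X+Y is used in place of X.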

4. Originally Posted by Moo
Hello,

Sorry if I explain it in my own words, I don't know if you've studied it the same way... And maybe there are small typos, but not very important.

1) We know that if Z is $\sigma(Y)$-measurable, then for any rv X (in L^2 I think), we have $E(XZ|Y)=ZE(X|Y)$

So since Y is obviously $\sigma(Y)$-measurable, E[E[XY|Y]]=E[YE[X|Y]] (*)
But there's something that says :
Let $\mathcal{B}$ be a $\sigma$-algebra. For any positive $\mathcal{B}$-measurable Z and any positive X, $E[ZX]=E[Z\,E[X|\mathcal{B}]]$. Here we take $\mathcal{B}=\sigma(Y)$, so that $E[X|\mathcal{B}]=E[X|Y]$. (This comes from the fact that E[X|Y] is the orthogonal projection of X onto $L^2(\Omega,\sigma(Y),P)$, but you don't really need to know it if you haven't learnt this...)

So (*)=E[YX]

Exact same reasoning, under the condition that h(Y) is $\sigma(Y)$-measurable (which holds as soon as h is a measurable function).

I hope this is clear enough

* $\sigma(B)$ is the smallest sigma-algebra that makes B measurable
** Note: a rv A is $\sigma(B)$-measurable iff there exists a measurable function $\varphi$ such that $A=\varphi\circ B$
Thanks for the response, but I am sorry to say that the level is so advanced that right now I don't have enough background to understand this.

5. Originally Posted by Laurent
I think what you're missing here is that you're dealing with dependent random variables, i.e. when we write E(X) = E[E(X|Y)], this holds for any (integrable) random variable X, even if X depends on Y in any way. For instance, X=Y, or X=YZ where Z is any other r.v. (provided X is integrable).

Therefore, E[XY]=E[E(XY|Y)] is a direct application where we used the random variable XY as X in the previous formula.

Now it should be obvious that E[X h(Y)]=E[E(X h(Y)|Y)] holds in the exact same way : this time, it is X h(Y) which plays the role of X. Any integrable variable does the trick, whether it depends on Y or not (and I would say, especially if it depends on Y, otherwise it is usual expectation).
So I suppose E(X+Y)=E{E[(X+Y)|Y]} would also be correct? (simply by the law of total expectation given above and nothing more?)

For the law of total expectation: E(X) = E[E(X|Y)], I think your point is that the law is true in general for absolutely ANY random variables X and Y, right? And in particular, even if X is a function of Y, i.e. X=g(Y), or even if we replace X by h(X,Y), the law of total expectation still applies, right?

[When I first looked at the statement of the law of total expectation in the following form: E(X) = E[E(X|Y)] and E[g(X)] = E[E(g(X)|Y)], it really SEEMED to me that it requires X and g(X) to NOT depend on Y, i.e. they must be functions of X ALONE, so for example we CANNOT replace X by g(Y) or h(X,Y). But it looks like I may be wrong??]

Thanks for clarifying!

6. Originally Posted by kingwinner
So I suppose E(X+Y)=E{E[(X+Y)|Y]} would also be correct? (simply by the law of total expectation given above and nothing more?)

For the law of total expectation: E(X) = E[E(X|Y)], I think your point is that the law is true in general for absolutely ANY random variables X and Y, right? And in particular, even if X is a function of Y, i.e. X=g(Y), or even if we replace X by h(X,Y), the law of total expectation still applies, right?
All I can do is confirm: yes, ANY random variables X,Y (even X+Y and Y) can be used, provided X is integrable (this is a conditional "expectation"...).

If X did not depend on Y, i.e. if X was independent of Y, then it would be useless to deal with conditional expectation: E[X|Y]=E[X] in this case. There is a simple intuitive reason, namely that E[X|Y] can be understood as the average value of X when you "know" the value of Y. If the value of Y tells you nothing about X, then the average value is the usual one; it is not affected by the knowledge of Y. On the other hand, if X is entirely determined by Y, i.e. X=f(Y) for some function f, then if you know Y, you know X, hence averaging is trivial (only one value): E[f(Y)|Y]=f(Y). The interest in conditional expectation comes from more complicated cases where variables are more subtly correlated. It gives you a kind of "approximation" of X in terms of Y. I'd say E[X|Y] is kind of the best guess you can make about the value of X when you know Y.
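
The two extreme cases described above can be sketched on a toy discrete example. The marginals `p_x` and `p_y`, the function `f`, and the helper `cond_expect` are all made up for illustration; the joint pmf is taken to be the product of the marginals, which is exactly the assumption that X and Y are independent.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical marginals; the product pmf makes X and Y independent.
p_x = {0: F(1, 4), 1: F(1, 2), 4: F(1, 4)}
p_y = {-1: F(1, 3), 2: F(2, 3)}
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

def cond_expect(g, y0):
    """E[g(X, Y) | Y = y0], for a value y0 with P(Y = y0) > 0."""
    p_y0 = sum(q for (_, y), q in joint.items() if y == y0)
    return sum(q * g(x, y) for (x, y), q in joint.items() if y == y0) / p_y0

# Unconditional mean of X.
e_x = sum(x * q for x, q in p_x.items())

# Case 1: X independent of Y, so E[X | Y = y0] should equal E[X] for every y0.
cond_x = {y0: cond_expect(lambda x, y: x, y0) for y0 in p_y}

# Case 2: a variable fully determined by Y, so E[f(Y) | Y = y0] should equal f(y0).
f = lambda y: y**2 + 1
cond_f = {y0: cond_expect(lambda x, y: f(y), y0) for y0 in p_y}

print(e_x, cond_x)
print({y0: f(y0) for y0 in p_y}, cond_f)
```

In the first case the conditional expectation is the same constant for every value of Y; in the second it simply reproduces f at the observed value of Y.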

7. Originally Posted by Laurent
All I can do is confirm: yes, ANY random variables X,Y (even X+Y and Y) can be used, provided X is integrable (this is a conditional "expectation"...).
OK! But then looking at "properties 1 and 2" on page 2 of the following webpage,
http://www.stat.wisc.edu/courses/st312-rich/condexp.pdf
they stated properties 1 and 2 separately, and then provided a proof for each of them. But I think property 1 is completely general (because it applies for ANY random variables X and Y) and property 1 implies property 2, so we actually don't have to prove property 2 separately, right?

On the other hand, if X is entirely determined by Y, i.e. X=f(Y) for some function f, then if you know Y, you know X, hence averaging is trivial (only one value) : E[f(Y)|Y]=f(Y).
How can we prove E[f(Y)|Y]=f(Y) in the discrete or continuous case? I came across this property quite a few times, but I was never able to understand how to prove it.
Here is my attempt:
E[f(Y)|Y=y]
=E[f(y)|Y=y]
=E[f(y)] <---but how can we justify this step? f(Y) and Y are NOT independent random variables, then how can we drop the condition Y=y?
=f(y) [since f(y) is non-random (constant); E(c)=c]
Thus, E[f(Y)|Y]=f(Y)

Thank you for explaining!
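
For the discrete case, the step in question can be sketched directly from the definition of conditional expectation, assuming $P(Y=y)>0$. The point is that no independence is needed: the condition is never "dropped"; rather, the conditional distribution of Y given Y=y puts all of its mass on the single value y.

$$
E[f(Y)\mid Y=y] \;=\; \sum_{y'} f(y')\,P(Y=y'\mid Y=y),
\qquad
P(Y=y'\mid Y=y) \;=\; \frac{P(Y=y',\,Y=y)}{P(Y=y)} \;=\;
\begin{cases}
1 & \text{if } y'=y,\\
0 & \text{if } y'\neq y,
\end{cases}
$$

$$
\text{so}\qquad E[f(Y)\mid Y=y] \;=\; f(y)\cdot 1 \;=\; f(y),
\qquad\text{and hence}\qquad E[f(Y)\mid Y] \;=\; f(Y).
$$

The continuous case needs more care, because the event Y=y has probability zero there, so this sketch should not be read as covering it.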