Hi,

Suppose X and Y are independent discrete random variables with joint distribution P(X, Y). My regression model is of the form

$\displaystyle Z = E[Z | X, Y] + \epsilon = f(X, Y) + \epsilon,$

where the random variable $\displaystyle \epsilon$ is the noise term. Next, assume that we ignore our knowledge about $\displaystyle Y$. Then we have another regression model

$\displaystyle Z' = E[Z' | X] + \eta = g(X) + \eta,$

where $\displaystyle \eta$ is the noise term. We can express $\displaystyle g(X)$ as

$\displaystyle g(X) = \sum_y P(Y = y) f(X, y).$
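To make this concrete, here is a small numerical sketch. The marginal distribution of Y and the function f below are purely hypothetical, chosen just to show the marginalization:

```python
# Hypothetical marginal P(Y = y) over a two-point support (numbers made up)
p_y = {0: 0.3, 1: 0.7}

def f(x, y):
    # Some arbitrary regression function f(x, y) for illustration
    return x + 2 * y

def g(x):
    # g(x) = sum_y P(Y = y) * f(x, y): average f(x, .) over the marginal of Y
    return sum(p * f(x, y) for y, p in p_y.items())

print(g(1))  # 0.3 * f(1, 0) + 0.7 * f(1, 1) = 0.3 * 1 + 0.7 * 3 = 2.4
```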

My first question is: How can we interpret the right hand side of the last equation? Since X and Y are independent, we have for each x

$\displaystyle g(x) = \sum_y P(y) f(x, y) = \sum_y P_{Y | X}(y | x) f(x, y) = E[f(X, Y) | X = x].$

So the function g, which has no knowledge of Y, can be regarded at each x as the conditional expectation of f given X = x. Is this argument correct? If independence does not hold, what do you call the expression

$\displaystyle \sum_y P(y) f(x, y),$

which looks a bit like an expectation?
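To see how this expression behaves without independence, here is a sketch with a hypothetical joint distribution under which X and Y are dependent. It compares $\sum_y P(y) f(x, y)$, which weights by the marginal of Y, with the true conditional expectation $\sum_y P(y \mid x) f(x, y)$; under independence the two would coincide, but here they differ:

```python
# Hypothetical joint distribution P(X = x, Y = y); X and Y are dependent
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def f(x, y):
    # Same arbitrary illustrative function as above
    return x + 2 * y

# Marginal P(Y = y) obtained by summing the joint over x
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

def marginal_avg(x):
    # sum_y P(y) * f(x, y): weights f(x, .) by the *marginal* of Y
    return sum(p * f(x, y) for y, p in p_y.items())

def cond_exp(x):
    # E[f(X, Y) | X = x] = sum_y P(y | x) * f(x, y): weights by the conditional
    p_x = sum(p for (xx, _), p in p_xy.items() if xx == x)
    return sum(p / p_x * f(x, y) for (xx, y), p in p_xy.items() if xx == x)

print(marginal_avg(1))  # uses P(Y = y)
print(cond_exp(1))      # uses P(Y = y | X = 1); differs under dependence
```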

The last question is concerned with the error term $\displaystyle \eta$. From the above, we have

$\displaystyle Z' = \sum_y P(y) Z - \sum_y P(y)\epsilon + \eta.$

If I assume that $\displaystyle Z' = \sum_y P(y) Z$, then I may conclude

$\displaystyle \eta = \sum_y P(y)\epsilon.$

But under which conditions is my assumption about Z' valid?

Thanks and best wishes,

samosa