ERM Induction consistency problem

Suppose that $\displaystyle X_1, \ldots, X_n $ are i.i.d. from $\displaystyle F $, an unknown distribution. We wish to estimate the true mean $\displaystyle \mu $ of $\displaystyle F $ under the loss function $\displaystyle L(X, f(X)) = (X - f(X))^2 $

Prove that $\displaystyle \bar{X} $ is consistent for ERM induction.

Proof so far.

Now, I know that the Bayes learner is $\displaystyle f_{Bayes} = \mu $ and the ERM learner is $\displaystyle f_{ERM} = \bar{X} $

I need to show that both the risk function $\displaystyle R( \bar{X} ) = E[(X - \bar{X})^2 ] $ and the empirical risk function $\displaystyle \hat{R} ( \bar{X} ) = \frac {1}{n} \sum ^n_{k=1} (X_k - \bar{X})^2 $ converge to the Bayes risk $\displaystyle R_{Bayes} = R( \mu ) = E\,L( X , \mu ) = E[ (X - \mu )^2 ] $

That is, I need to show that $\displaystyle \forall \epsilon > 0, \ Pr( | R ( \mu ) - R( \bar{X} )| > \epsilon ) \to 0 $ and $\displaystyle Pr( | R ( \mu ) - \hat{R} ( \bar{X} )| > \epsilon ) \to 0 $ as $\displaystyle n \to \infty $
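Not part of the proof, but a quick Monte Carlo sanity check makes the claim plausible before trying to prove it. This is my own illustration, assuming for concreteness that $\displaystyle F = N(0, 1) $, so the Bayes risk is $\displaystyle \sigma^2 = 1 $:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0  # assumed F = N(0, 1); Bayes risk = sigma^2 = 1

for n in (10, 100, 10_000):
    X = rng.normal(mu, sigma, size=n)
    xbar = X.mean()                            # the ERM learner f_ERM = X_bar
    emp_risk = np.mean((X - xbar) ** 2)        # empirical risk R_hat(X_bar)
    true_risk = sigma**2 + (xbar - mu) ** 2    # risk R(X_bar), with X_bar held fixed
    print(n, emp_risk, true_risk)
```

Both printed columns should settle near the Bayes risk $\displaystyle \sigma^2 = 1 $ as $n$ grows, which is exactly the two convergences to be proved.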

For the first part, I have:

$\displaystyle Pr( | R ( \mu ) - R( \bar{X} )| > \epsilon ) $

$\displaystyle = Pr( | E[(X - \mu )^2 - (X - \bar{X} )^2] | > \epsilon )$

$\displaystyle = Pr( | E[-2X \mu + \mu ^2 + 2 X \bar{X} - \bar{X}^2 ] | > \epsilon )$

$\displaystyle = Pr( | {-2} \mu ^2 + \mu ^2 + 2 E[ X \bar{X} ] - E[ \bar{X}^2 ] | > \epsilon ) \ldots$ Looks like I'm lost...

I know by the Law of Large Numbers that $\displaystyle Pr( | \mu - \bar{X} | > \epsilon ) \to 0 $, so I'm basically trying to mold my expression into that form, but I'm stuck...

I'm guessing that one mistake I made is that the loss function should be $\displaystyle L( \mu , f ) = ( \mu - f )^2 $, but if I try it that way, I can't get $\displaystyle \bar{X} = f_{ERM} $, and I still can't get the ERM consistency going...

Any help, please? Thank you!!!

Re: ERM Induction consistency problem

Hey tttcomrader.

Since you have an expectation within the probability, you can use the identity E[X_bar] = mu, which means you will get mu^2 and X_bar^2 left inside the expression.

From there you can either prove directly that X_bar^2 converges to mu^2 in probability, or use the factorization mu^2 - X_bar^2 = (mu - X_bar)(mu + X_bar) and show that if z = mu + X_bar, then P(|z|*|mu - X_bar| > epsilon) -> 0. (You may have to handle the absolute values by considering the different branches.)
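Spelling that hint out for the first part (a sketch of my own, under the reading that in $\displaystyle R(\bar{X}) $ the expectation is over a fresh $X$ only, with $\bar{X}$ held fixed):

```latex
R(\bar{X}) - R(\mu)
  = E\left[(X - \bar{X})^2 - (X - \mu)^2\right]
  = E\left[-2X\bar{X} + \bar{X}^2 + 2X\mu - \mu^2\right]
  = \bar{X}^2 - 2\mu\bar{X} + \mu^2
  = (\bar{X} - \mu)^2 .
```

So $\displaystyle Pr( | R(\mu) - R(\bar{X}) | > \epsilon ) = Pr( (\bar{X} - \mu)^2 > \epsilon ) = Pr( | \bar{X} - \mu | > \sqrt{\epsilon} ) \to 0 $ by the Law of Large Numbers, which is exactly the form you were trying to reach.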

Re: ERM Induction consistency problem

So is my loss function correct?

Re: ERM Induction consistency problem

Do you mean is the proof for the loss function correct?

The proof will be correct if the bound |mu^2 - X_bar^2| <= |mu + X_bar|*|mu - X_bar| holds (you can look at Banach spaces for more information); this is equivalent to saying that the norm is continuous (which it should be, but you may have to state a few things).
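A quick numeric check of that factorization (my own illustration, not from the thread); the difference of squares gives equality, so the bound above holds with room to spare:

```python
# Check |mu^2 - xbar^2| == |mu + xbar| * |mu - xbar| across positive,
# negative, and near-mu values of xbar.
mu = 2.0
for xbar in (-3.0, -0.5, 0.0, 1.9, 2.0, 2.1, 5.0):
    lhs = abs(mu**2 - xbar**2)
    rhs = abs(mu + xbar) * abs(mu - xbar)
    assert abs(lhs - rhs) < 1e-12  # equality, so the <= bound is tight
```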

Re: ERM Induction consistency problem

Hey! Thank you for your post and I have to tell you I deeply appreciate you taking your time to help me here. I'm actually just trying to do some problems from previous courses that my professor taught and hope to learn something new lol.

What I meant before was: should the loss function be $\displaystyle ( \mu - f ) ^2 $ or $\displaystyle ( X - f ) ^2 $?

Since we are estimating for the true mean, I think the loss function should be the first one.

If it is the first one, then I can't prove that $\displaystyle \bar{X} $ is the ERM learner, since:

Attempted proof: I want to show that $\displaystyle \hat{R} (f) \geq \hat{R} ( \bar{X} ) \ \ \ \ \ \forall f \in \mathcal{F} = \mathbb{R} $

Now, $\displaystyle \hat{R} (f) = \frac {1}{n} \sum _{k=1}^n ( \mu - f )^2 $

$\displaystyle = \frac {1}{n} \sum _{k=1}^n ( \mu - \bar{X} + \bar{X} - f )^2 $

$\displaystyle = \frac {1}{n} \sum _{k=1}^n [ ( \mu - \bar{X} )^2 + 2( \mu - \bar{X} )( \bar{X} - f) + ( \bar{X} - f)^2 ] $

$\displaystyle = \frac {1}{n} \sum _{k=1}^n ( \mu - \bar{X} )^2 + \frac {1}{n} \sum _{k=1}^n 2( \mu - \bar{X} )( \bar{X} - f) + \frac {1}{n} \sum _{k=1}^n ( \bar{X} - f)^2 $

$\displaystyle = \hat{R} ( \bar{X} ) + \ldots $ I'm stuck.

Basically I want the two terms to the right of $\displaystyle \hat{R} ( \bar{X} )$ to be nonnegative. Of course the third term is nonnegative; the problem is the second one...
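For contrast, here is a sketch (my own, not from the thread) of why the same add-and-subtract step does close under the other loss. With $\displaystyle L(X, f) = (X - f)^2 $, so that $\displaystyle \hat{R}(f) = \frac{1}{n} \sum_{k=1}^n (X_k - f)^2 $, the cross term vanishes:

```latex
\hat{R}(f)
  = \frac{1}{n}\sum_{k=1}^{n}(X_k - \bar{X} + \bar{X} - f)^2
  = \frac{1}{n}\sum_{k=1}^{n}(X_k - \bar{X})^2
    + \underbrace{\frac{2(\bar{X} - f)}{n}\sum_{k=1}^{n}(X_k - \bar{X})}_{=\,0}
    + (\bar{X} - f)^2
  \ \geq\ \hat{R}(\bar{X}),
```

since $\displaystyle \sum_{k=1}^n (X_k - \bar{X}) = 0 $. Under the loss $\displaystyle (\mu - f)^2 $, by contrast, the summands don't depend on $k$ at all and the empirical risk is minimized at $\displaystyle f = \mu $ itself, which is why $\displaystyle \bar{X} $ only comes out as the ERM learner under the $\displaystyle (X - f)^2 $ loss.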