Hello, I am reading a book on machine learning and struggling through - "Introduction to Machine Learning" by Ethem Alpaydin. I hit an early sub-chapter on PAC and I don't follow the author's math and reasoning.
The first math stumbling block I hit is when the author says "We have the inequality (1 - x) <= exp[-x]". He just drops that, doesn't explain where it comes from. Now, I can plug values into that statement and see that it's true. But where the heck did it come from? There's no context, it's just something he offers up, like saying "You need to nail this board to your house. Now, we know that pigeons eat seed ..." This is what always confuses me in math, when someone just throws out a statement that seemingly has no connection.
He then plugs the problem into that equation, so I need to give some context:
"In PAC learning given a class C and examples from unknown but fixed probability distribution p(x) we want to find the number of examples N such that with probability at least 1 - s the hypothesis h has error at most e for arbitrary s <= 1/2 and e > 0:"
(I have no way to draw the delta sign so I'll use ?)
"P{C?h <= e} >= 1 - s where C?h is the region of difference between C and h".
He then uses a graph to illustrate that the probability that N independent draws from the distribution will 'miss' the error region between C and h will be at most 4(1 - e/4)^N which we would like to be at most s.
I follow him to there, more or less. But then he introduces the inequality, and says:
"If we choose N and s such that we have 4exp[-eN/4] <= s"
Huh? Logically it seems to me that he has replaced 4(1 - e/4)^N with 4exp[-eN/4], since we stated that we want the former to be AT MOST (thus, <=) s. But I don't see how that substitution was made ... it doesn't seem that he used the formula, since he says we 'choose' this (another thing that boggles me).
He then goes on to state that, with the above 'choice':
"We can also write 4(1 - e/4)^N <= s".
Okay, sure, we can use the first thing you pulled out of the blue to replace the second thing you pulled out of the blue ... got it. But ... where did the two things come from? And isn't the last thing just the basic restriction he stated in the first place, making all that substitution stuff unnecessary?
Can anyone explain to me what those middle steps are about and why they are necessary?