# Probably Approximately Correct Learning


• May 10th 2011, 01:06 AM
newagelightbulb
Probably Approximately Correct Learning
Hello, I am reading a book on machine learning and struggling through - "Introduction to Machine Learning" by Ethem Alpaydin. I hit an early sub-chapter on PAC and I don't follow the author's math and reasoning.

The first math stumbling block I hit is when the author says "We have the inequality (1 - x) <= exp[-x]". He just drops that, doesn't explain where it comes from. Now, I can plug values into that statement and see that it's true. But where the heck did it come from? There's no context, it's just something he offers up, like saying "You need to nail this board to your house. Now, we know that pigeons eat seed ..." This is what always confuses me in math, when someone just throws out a statement that seemingly has no connection.
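(For what it's worth, here is the kind of value-plugging I mean; this is just my own sanity check, not anything from the book:)

```python
import math

# Quick sanity check that (1 - x) <= exp(-x) over a range of values.
# Equality should hold only at x = 0.
for i in range(-50, 51):
    x = i / 10.0  # x from -5.0 to 5.0
    assert 1 - x <= math.exp(-x), f"inequality fails at x = {x}"

print("holds for all sampled x")
```

So I believe the inequality; what I am missing is where it comes from and why it appears at this point.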

He then plugs the problem into that equation, so I need to give some context:

"In PAC learning, given a class C and examples drawn from an unknown but fixed probability distribution p(x), we want to find the number of examples N such that with probability at least 1 - s the hypothesis h has error at most e, for arbitrary s <= 1/2 and e > 0:"

(The book uses the symmetric-difference sign, which I'll typeset here:)

"$P\{C \Delta h \le e\} \ge 1 - s$, where $C \Delta h$ is the region of difference between $C$ and $h$."

He then uses a graph to argue that the probability that N independent draws from the distribution all 'miss' the error region between C and h is at most 4(1 - e/4)^N, which we would like to be at most s.

I follow him to there, more or less. But then he introduces the inequality, and says:

"If we choose N and s such that we have 4exp[-eN/4] <= s"

Huh? Logically it seems to me that he has replaced 4(1 - e/4)^N with 4exp[-eN/4], since we stated that we want the former to be AT MOST (thus, <=) s. But I don't see how that substitution was made ... it doesn't seem that he used the formula, since he says we 'choose' this (another thing that boggles me).
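(I did at least convince myself numerically that the exp version is always the bigger of the two, so demanding it be <= s is the stronger condition; again this is just my own check, not from the book:)

```python
import math

# My own check (not from the book): 4*exp(-e*N/4) always sits at or
# above 4*(1 - e/4)^N, so forcing the exp form below s would also
# force the original bound below s.
for e in (0.05, 0.1, 0.2, 0.5):
    for N in (1, 10, 100, 1000):
        poly = 4 * (1 - e / 4) ** N
        expo = 4 * math.exp(-e * N / 4)
        assert poly <= expo, (e, N)

print("exp bound dominates for all sampled (e, N)")
```

But confirming that it is true is not the same as seeing why the author did it.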

He then goes on to state that, with the above 'choice':

"We can also write 4(1 - e/4)^N <= s".

Okay, sure, we can use the first thing you pulled out of the blue to replace the second thing you pulled out of the blue ... got it. But ... where did the two things come from? And isn't the last thing just the basic restriction he stated in the first place, making all that substitution stuff unnecessary?

Can anyone explain to me what those middle steps are about and why they are necessary?
• May 10th 2011, 02:22 AM
pickslides
Quote:

Originally Posted by newagelightbulb
Hello, I am reading a book on machine learning and struggling through - "Introduction to Machine Learning" by Ethem Alpaydin. I hit an early sub-chapter on PAC and I don't follow the author's math and reasoning.

The first math stumbling block I hit is when the author says "We have the inequality (1 - x) <= exp[-x]". He just drops that, doesn't explain where it comes from. Now, I can plug values into that statement and see that it's true. But where the heck did it come from? There's no context,

It seems like you are frustrated, but you are also assuming that everyone on this forum has the book in question. What if no one does? You would then be doing the same thing as the author.
• May 10th 2011, 03:03 AM
CaptainBlack
Quote:

Originally Posted by newagelightbulb
Hello, I am reading a book on machine learning and struggling through - "Introduction to Machine Learning" by Ethem Alpaydin. I hit an early sub-chapter on PAC and I don't follow the author's math and reasoning.

The first math stumbling block I hit is when the author says "We have the inequality (1 - x) <= exp[-x]". He just drops that, doesn't explain where it comes from. Now, I can plug values into that statement and see that it's true. But where the heck did it come from? There's no context, it's just something he offers up, like saying "You need to nail this board to your house. Now, we know that pigeons eat seed ..." This is what always confuses me in math, when someone just throws out a statement that seemingly has no connection.

$1-x$ is the tangent line to $e^{-x}$ at the origin, and $e^{-x}$ is convex (its second derivative, $e^{-x}$, is always positive), so $e^{-x}$ lies on or above that tangent everywhere, touching it only at $x=0$. (Draw a picture and you will see what is going on.)
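If I am reading the book's step correctly, it then just chains this inequality through (writing $\epsilon$ for the error bound and $\delta$ for the confidence parameter); a sketch:

```latex
\begin{aligned}
1 - \frac{\epsilon}{4} &\le e^{-\epsilon/4}
  && \text{(the inequality with } x = \epsilon/4 \text{)}\\
\left(1 - \frac{\epsilon}{4}\right)^{N} &\le e^{-\epsilon N/4}
  && \text{(both sides positive; raise to the $N$th power)}\\
4\left(1 - \frac{\epsilon}{4}\right)^{N} &\le 4\,e^{-\epsilon N/4}.
\end{aligned}
```

So if we *choose* $N$ large enough that $4e^{-\epsilon N/4} \le \delta$, the chain gives $4(1 - \epsilon/4)^{N} \le \delta$ for free, and the exp form is used because it can be solved for $N$ in closed form: $N \ge \frac{4}{\epsilon}\ln\frac{4}{\delta}$.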

CB