# Math Help - Introduction to Calculus Tutorial

1. ## Introduction to Calculus Tutorial

This tutorial will be limited to polynomial functions in one and several variables, but note that these theorems work for the standard functions in general too. Thus, this will help you familiarize yourself with the most important rules in Calculus.

Definition: A polynomial function (real) can be expressed in the form:
$$f(x)=a_nx^n+a_{n-1}x^{n-1}+...+a_1x+a_0$$.
Here are some examples:
$$f(x)=a_0$$ is a constant function (its graph is a horizontal line).
$$f(x)=a_1x+a_0$$ is a linear function (its graph is a slanted line when $$a_1\not = 0$$).
$$f(x)=a_2x^2+a_1x+a_0$$ is a quadratic (its graph is a parabola when $$a_2\not =0$$).
The largest exponent with a non-vanishing coefficient is the degree of the polynomial.

I will not introduce multi-variable functions just yet. I will do that when I reach partial differentiation.

I am sure you understand the basic notion of a limit, that is, "What does the function approach as the input gets closer to some specific number?"

I am also sure you are familiar with the geometric meaning of the derivative. That is, we draw a secant line and bring it closer and closer to a point. (Since I cannot post an animation here, you should find one somewhere on the internet and watch it.)

Here is what the last statement tells us about the derivative. Say we have a point $$(c,f(c))$$ on a polynomial $$f(x)$$.
How do we find the derivative at that point? We choose a point nearby; let us use $$\Delta x$$ to represent a small increase in the input value. Then the nearby point is $$(c+\Delta x, f(c+\Delta x))$$. Now we find the slope through these two points,
$$\frac{f(c+\Delta x)-f(c)}{c+\Delta x-c}=\frac{f(c+\Delta x)-f(c)}{\Delta x}$$. And we take the limit as $$\Delta x\to 0$$.
Thus, the derivative at the point is,
$$\boxed{ \lim_{\Delta x \to 0}\frac{f(c+\Delta x)-f(c)}{\Delta x}}$$.
Note, we could also have done this "backward": instead of $$\Delta x$$ being an increase it could have been a decrease, but that is of no importance because it leads to the same result.
There is another way how to get the derivative at a point. Again let the point on the polynomial curve be $$(c,f(c))$$ then let a nearby point be $$(x,f(x))$$ thus the slope (together with the limit):
$$\boxed{\lim_{ x\to c}\frac{f(x)-f(c)}{x-c}}$$.

That is the derivative of a polynomial at a point. The derivative of a polynomial is itself a function: its input values (domain) are the $$x$$-values of points on the curve, and its output values are the derivatives (the slopes of the tangent lines) at those points.
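To see the limit definition in action, here is a tiny Python sketch (my addition, not part of the original post): it evaluates the difference quotient for $$f(x)=x^2$$ at $$c=1$$ with shrinking $$\Delta x$$.

```python
# Evaluate the difference quotient (f(c + dx) - f(c)) / dx for shrinking dx.
def difference_quotient(f, c, dx):
    return (f(c + dx) - f(c)) / dx

f = lambda x: x ** 2

# The quotients approach the true slope at c = 1, which is 2.
for dx in (0.1, 0.01, 0.001):
    print(difference_quotient(f, 1.0, dx))
```

Each shrinking step lands closer to 2, exactly as the limit predicts.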

Here is a classic example.

Example 1: Consider the curve $$y=x^2$$ what is the derivative at $$(1,1)$$?
We need to find,
$$\lim_{\Delta x\to 0}\frac{(1+\Delta x)^2-1^2}{\Delta x}=\lim_{\Delta x \to 0}\frac{1+2\Delta x+(\Delta x)^2-1}{\Delta x}$$
Combine,
$$\lim_{\Delta x\to 0}\frac{(2\Delta x)+(\Delta x)^2}{\Delta x}=\lim_{\Delta x\to 0}2+\Delta x=2$$
The final limit holds because adding a very small number to 2 gives back almost the same result. Thus, as $$\Delta x$$ approaches zero, the final result is $$2+0=2$$.

Example 2: But what is the derivative of $$y=x^2$$? It means a formula that enables us to calculate the derivative at a point (a function that produces the derivative at a point). If the point is $$x$$ then the derivative at the point is,
$$\lim_{\Delta x\to 0}\frac{(x+\Delta x)^2-x^2}{\Delta x}$$
Skipping some steps (similar as before) we arrive at,
$$\lim_{\Delta x\to 0}2x+\Delta x=2x$$.
Thus, the derivative of $$y=x^2$$ is a new curve $$y=2x$$.
Now if we go back to the problem just before it asks to find the derivative (or slope of tangent line) at $$x=1$$.
Just substitute into the derivative function: $$y'=2(1)=2$$. Thus the derivative is 2 at that point.

We need to know one important thing about derivatives: "The derivative of a function is itself a function".

Another thing about derivatives that we will pay no attention to is the concept of differentiability. It means that the derivative exists (that is, the limit that we take exists, because as you know not every limit exists). The reason we will ignore it is that we will use polynomials, which we can always differentiate. The reason I mention it is that, like division by zero, ignoring it leads to faulty reasoning. It is also another feature that divides the immortals (the mathematicians) from the mortals (the scientists) who do not pay attention to differentiability. Thus, if you ever play around with derivatives and get some strange results, remember: it is probably what I said.

Time to introduce some notation. If $$y=f(x)$$ is our function, then the derivative can be expressed as:
$$\frac{dy}{dx}$$, $$y'$$, $$f'(x)$$.
In the first notation we cannot cancel the d's. They are not numbers; they represent an operation, namely differentiation.
I myself am not in favor of this notation, even though it has a purpose: we can "magically" treat it as a fraction and split the $$dy$$ and $$dx$$, which is a favorite move in physics. Sometimes that leads to faulty conclusions because it is non-mathematical (but it looks cool). To add some history, it is called "Leibniz" notation.

Example 3: If $$y=x^2$$ then $$y'=2x$$.

Example 4: If $$y=2x^3$$ then to find the derivative we need to find,
$$f(x+\Delta x)=2(x+\Delta x)^3$$
Subtract $$f(x)=2x^3$$
Then divide by $$\Delta x$$
And then take the limit.
I will write the limit in the end because it takes less space.
$$2(x+\Delta x)^3=2x^3+6x^2\Delta x+6x(\Delta x)^2+2(\Delta x)^3$$.
Next from this we subtract,
$$2x^3$$
Thus we have,
$$6x^2\Delta x+6x(\Delta x)^2+2(\Delta x)^3$$
Divide through by $$\Delta x$$
Thus,
$$6x^2+6x\Delta x+2(\Delta x)^2$$
Take the limit,
$$6x^2$$.
Thus,
$$y'=6x^2$$.
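A quick numeric sanity check of this answer (my addition; the central difference below is just a more accurate variant of the limit definition):

```python
def numeric_derivative(f, x, dx=1e-6):
    # central difference approximation of the derivative
    return (f(x + dx) - f(x - dx)) / (2 * dx)

y = lambda x: 2 * x ** 3

# Compare against the derived formula y' = 6x^2 at a few points.
for x in (0.0, 1.0, 2.5):
    print(numeric_derivative(y, x), 6 * x ** 2)
```

The two columns agree to several decimal places at every sample point.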

There has got to be a better way! Does this mean we always need to do this long mess? No! There are rules. In fact, I will make you develop them.
~~~
Exercises

1)Find $$y'$$ for $$y=x^2+x$$.

2)Find the equation of the tangent line at $$(1,2)$$ for the curve in the problem above.

3)Use the limit definition of derivative and find,
$$y'$$ for $$y=\frac{1}{x+1}$$.

4)If $$y=f(x)$$ and $$y'=f'(x)$$.
What do you think happens with,
$$y=k\cdot f(x)$$ then $$y'=?$$

5)Find derivative for.....
$$y=1=x^0$$
$$y=x=x^1$$
$$y=x^2$$
$$y=x^3$$
$$y=x^4$$
And guess what the pattern is.

*6)Prove the pattern always holds.
(Hint use the binomial theorem:
$$(x+y)^n=\sum_{k=0}^n {n\choose k}x^{n-k}y^k, n\geq 0$$
Where,
$${n\choose k}=\frac{n!}{k!(n-k)!}$$ are called "binomial coefficients").

2. Okay, since we know what derivatives mean,
we can develop a few rules that will help us.

1)Derivative of a Constant Function: The derivative of a function $f(x)=k$ (some number) is zero, that is, $f'(x)=0$. One way is to think of this as the slope of a horizontal line (which is zero). Another way is through the limit:
$\lim_{\Delta x\to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}=\lim_{\Delta x\to 0} \frac{k-k}{\Delta x}=\lim_{\Delta x\to 0}0=0$

2)Derivative of a Sum: You might have expected
$(y_1+y_2)'=y_1'+y_2'$ where $y_1,y_2$ are some functions. Again, this is easy to show through the limit.
Let $y_1=f(x)$ and $y_2=g(x)$.
That means,
$y_1'=\lim_{\Delta x\to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}$
$y_2'=\lim_{\Delta x\to 0}\frac{g(x+\Delta x)-g(x)}{\Delta x}$
And,
$(y_1+y_2)'=\lim_{\Delta x\to 0}\frac{f(x+\Delta x)+g(x+\Delta x)-f(x)-g(x)}{\Delta x}=$ $\lim_{\Delta x\to 0}\underbrace{\frac{f(x+\Delta x)-f(x)}{\Delta x}}_{y_1'}+ \underbrace{\frac{g(x+\Delta x)-g(x)}{\Delta x}}_{y_2'}=y_1'+y_2'$

3)Derivative of a Difference: The same thing as addition. That is,
$(y_1-y_2)'=y_1'-y_2'$

Note: If you want to sound cool and impress your teachers you can say rules #1,2,3 are true because "Differentiation is a linear transformation on the vector space of differentiable functions over the field of reals". Or you can say "Differentiation is a homomorphism". Basically "derivative" has the property that the derivative of a sum is the sum of the derivatives. You will find many operators during your math studies that have this property; pay attention to them.

4)Derivative of a Product: The rule says,
$(y_1y_2)'=y_1'y_2+y_1y_2'$.
The derivation is a bit strange, but it relies on a trick mathematicians love to use.
Again using the same convention for $y_1,y_2,y_1',y_2'$ as above we can write,
$(y_1y_2)'=\lim_{\Delta x\to 0}\frac{f(x+\Delta x)g(x+\Delta x)-f(x)g(x)}{\Delta x}$
Add and subtract $f(x+\Delta x)g(x)$ (thus no change in the expression).
Thus,
$\lim_{\Delta x\to 0}\frac{f(x+\Delta x)g(x+\Delta x)\overbrace{-f(x+\Delta x)g(x)+f(x+\Delta x)g(x)}^{\mbox{ This is zero }}-f(x)g(x)}{\Delta x}$
Thus, some factoring,
$\lim_{\Delta x\to 0}f(x+\Delta x)\cdot \frac{g(x+\Delta x)-g(x)}{\Delta x}+g(x)\cdot \frac{f(x+\Delta x)-f(x)}{\Delta x}$
Note,
$\lim_{\Delta x\to 0}f(x+\Delta x)=f(x+0)=f(x)$
Thus,
$y_1y_2'+y_2y_1'$

Note the only questionable step was,
$\lim_{\Delta x\to 0}f(x+\Delta x)=f(x)$.
Because, in the other post about limits, I mentioned you can only substitute the value when the function is continuous at the point. That happens to be true here: a function differentiable at a point is continuous at that point, which is extremely useful in theoretical demonstrations.
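The product rule is easy to test numerically. A short Python sketch (my addition; the functions $y_1, y_2$ below are my own arbitrary choices):

```python
def numeric_derivative(f, x, dx=1e-6):
    return (f(x + dx) - f(x - dx)) / (2 * dx)

f = lambda x: x ** 2 + 1       # y1
g = lambda x: 3 * x - 2        # y2
fp = lambda x: 2 * x           # y1'
gp = lambda x: 3.0             # y2'

# (y1*y2)' should match y1'*y2 + y1*y2' at every sample point.
for x in (-1.0, 0.5, 2.0):
    lhs = numeric_derivative(lambda t: f(t) * g(t), x)
    rhs = fp(x) * g(x) + f(x) * gp(x)
    print(lhs, rhs)
```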

5)Derivative of a Quotient: If $y_2\not = 0$ then,
$(y_1/y_2)'=\frac{y_1'y_2-y_1y_2'}{y_2^2}$
This time the trick is to add and subtract $f(x)g(x)$.
$\lim_{\Delta x\to 0}\frac{\frac{f(x+\Delta x)}{g(x+\Delta x)}-\frac{f(x)}{g(x)} }{\Delta x}$
Thus,
$\lim_{\Delta x\to 0}\frac{f(x+\Delta x)g(x)-f(x)g(x+\Delta x)}{\Delta xg(x)g(x+\Delta x)}$
Thus,
$\lim_{\Delta x\to 0}\frac{f(x+\Delta x)g(x)-f(x)g(x)+f(x)g(x)-f(x)g(x+\Delta x)}{\Delta xg(x)g(x+\Delta x)}$
Thus,
$\lim_{\Delta x\to 0}\frac{g(x)\cdot \frac{f(x+\Delta x)-f(x)}{\Delta x } - f(x)\cdot \frac{g(x+\Delta x)-g(x)}{\Delta x}}{g(x)g(x+\Delta x)}$
Thus,
$\frac{y_1'y_2-y_1y_2'}{y_2^2}$

6)The Power Rule: The power rule says that for any positive integer $n$ we have,
$y=x^n$
$y'=nx^{n-1}$.
I am going to present two proofs.
One the standard way.
The second which is my way.
The standard way is by the binomial expansion. The only important thing to know is that,
$(x+y)^n=x^n+nx^{n-1}y+A_{n-2}x^{n-2}y^2+A_{n-3}x^{n-3}y^3+...+A_1xy^{n-1}+A_0y^n$
Where, $A_{n-2},....,A_0$ are some numbers which we really do not care about.
That means,
$(x+\Delta x)^n=x^n+nx^{n-1}\Delta x+....$
From this we subtract the original function $x^n$.
Thus,
$nx^{n-1}\Delta x+A_{n-2}x^{n-2}(\Delta x)^2+...$
Then we divide through by $\Delta x$,
$nx^{n-1}+A_{n-2}x^{n-2}\Delta x+...$
But everything to the right of $nx^{n-1}$ vanishes in the limit because each remaining term contains a factor of $\Delta x$.
Thus,
$nx^{n-1}$.

Here is my way....
Note, by product rule,
$(y_1y_2y_3)'=((y_1y_2)y_3)'=(y_1y_2)'y_3+(y_1y_2)y_3'$
Thus, by product rule again,
$(y_1'y_2+y_1y_2')y_3+y_1y_2y_3'$
Thus,
$y_1'y_2y_3+y_1y_2'y_3+y_1y_2y_3'$.
In fact this pattern holds for any number of functions,
$(y_1y_2...y_n)'=y_1'y_2...y_n+y_1y_2'...y_n+...+y_1y_2...y_n'$
Thus,
$x^n=\underbrace{x\cdot x\cdot ... \cdot x}_n$
Thus, the derivative by Hacker's product rule above is,
$\underbrace{(x)'x\cdot ...\cdot x+x\cdot (x)'\cdot ... \cdot x+...+x\cdot x\cdot ... \cdot (x)'}_n$
But, $x'=1$.
Thus,
$\underbrace{\overbrace{x\cdot ... \cdot x}^{n-1}+...+\overbrace{x\cdot ... \cdot x}^{n-1}}_n$
Thus,
$nx^{n-1}$

The easy way to remember this rule is to bring the exponent down in front of the variable and reduce the exponent by 1.
The nice thing is that though we proved it only for positive integers, it holds for any power.
Thus, we can find,
$y'$ for $y=\sqrt{x}=x^{1/2}$
Time for examples.
But I am sure you do not need them, you are good with algebra.
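Still, here is a numeric check of the power rule (my addition), including the non-integer and negative exponents the rule extends to:

```python
def numeric_derivative(f, x, dx=1e-6):
    return (f(x + dx) - f(x - dx)) / (2 * dx)

# Check y = x^n against y' = n*x^(n-1) for several powers.
x = 2.0  # a point where every power below is defined
for n in (1, 2, 3, 0.5, -1):
    approx = numeric_derivative(lambda t: t ** n, x)
    exact = n * x ** (n - 1)
    print(n, approx, exact)
```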

Example 5: $y=x^3+x^2+x$.
By, the sum and power rule we have,
$y'=3x^2+2x+1$

Example 6: $y=(x+1)(x+2)$
By the product rule,
$y=(x+1)'(x+2)+(x+1)(x+2)'$
But the power and sum rules,
$y=1(x+2)+(x+1)1=2x+3$
We could have also multiplied out,
$y=(x+1)(x+2)=x^2+3x+2$
By, power and sum rules,
$y'=2x+3$.
Same result.

Example 7: $y=\frac{1}{x}$.
We can write,
$y=x^{-1}$.
Though the exponent is not a positive integer, the rule still holds,
$y'=(-1)x^{-2}=-\frac{1}{x^2}$.

Now we reach the most important rule. The "chain rule".
I will state it, and explain why it is called chain rule.

First we need to know what a composite of functions is. If we have a functions $f(x)$ and $g(x)$. The composition of these functions is a new function,
$f(g(x))$ where $g(x)$ is the "inner" and $f(x)$ is the "outer".
Note, $f(g(x))\not = g(f(x))$ in general, meaning composition is not commutative.

Example 8: If $f(x)=2x$ and $g(x)=x+1$.
Then,
$f(g(x))=f(x+1)=2(x+1)=2x+2$.

7)Derivative of a Composition (the Chain Rule): The rule says,
$[f(g(x))]'=g'(x)f'(g(x))$.
Take the derivative of the inside, and multiply it by the derivative of the outer function evaluated at the inner function.
There is an easy way to remember this.
$y=f(g(x))=f(u)$
Where $u=g(x)$.
Then, the composition,
$\frac{dy}{dx}=\frac{dy}{du}\cdot \frac{du}{dx}$.
As if we can cancel the $du$'s.

Example 9: $y=(x+1)^{10}$
We can write,
$y=u^{10}$ and $u=x+1$
Thus,
$\frac{dy}{du}=10u^9$
$\frac{du}{dx}=1$
Thus,
$\frac{dy}{du}\cdot \frac{du}{dx}=\frac{dy}{dx}=10u^9=10(x+1)^9$

Example 10: $y=((2x+1)^9+1)^8$
We can write,
$y=(u^9+1)^8$ and $u=2x+1$
But we can write more,
$y=v^8$ , $v=u^9+1$, $u=2x+1$.
Thus,
$\frac{dy}{dv}=8v^7=8(u^9+1)^7=8((2x+1)^9+1)^7$
$\frac{dv}{du}=9u^8=9(2x+1)^8$
$\frac{du}{dx}=2$
Thus,
$\frac{dy}{dx}=\frac{dy}{dv}\cdot \frac{dv}{du}\cdot \frac{du}{dx} = 8((2x+1)^9+1)^7(9(2x+1)^8)(2)$
Hence the name "chain rule".
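A numeric check of Example 10's answer (my addition), sampled at $x=0$ to keep the numbers manageable:

```python
def numeric_derivative(f, x, dx=1e-6):
    return (f(x + dx) - f(x - dx)) / (2 * dx)

y = lambda x: ((2 * x + 1) ** 9 + 1) ** 8

# The chain-rule answer from Example 10.
dydx = lambda x: 8 * ((2 * x + 1) ** 9 + 1) ** 7 * 9 * (2 * x + 1) ** 8 * 2

print(dydx(0.0))                   # 18432.0
print(numeric_derivative(y, 0.0))  # agrees to several decimal places
```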

I will not prove the chain rule; the proof is difficult. But there is a nice trick, a weaker result, that makes for an easy proof. I do not want to post it because that will be the next problem of the week.

Thus, now you know the most important rule about derivatives!
---
Exercises.
Find $y'$ for the following functions:

1) $y=x^9+x^2+1$

2) $y=x^6-x^2-\sqrt[3]{x}$

3) $y=\frac{1}{1+x^2}$

4) $y=(x+1)(3x-1)^2$

5) $y=\sqrt{0}+1^2-6$

*6)Prove the chain rule for polynomial functions.
That is given,
$f(x)=a_nx^n+...+a_1x+a_0$
$g(x)=b_mx^m+...+b_1x+b_0$
Find their composition and then compute the derivative.
And compute the derivative via chain rule.
And then show the results match.

3. Originally Posted by ThePerfectHacker
Note: If you want to sound cool and impress your teachers you can say rules #1,2,3 are true because "Differentiation is a linear transformation on the vector space of differentiable functions over the field of reals".
If you want to sound really cool, I'd say that the fact that the derivative is a linear operator means that it converts a linear combination as argument into a linear combination of its images. The fact that a constant is mapped to zero can be seen as a result of a more general requirement. Being linear means that D(f(x)+g(x)) = D(f(x))+D(g(x)), but also that D(c.f(x)) = c.D(f(x)), with c a scalar (an element of the field, in your case the reals). Combined: linearity holds iff D(a.f(x)+b.g(x)) = a.D(f(x)) + b.D(g(x)).

4. I was debating what I should lecture on. Finally, after many hours of thought, I decided to show some applications the derivative is used for. The first important thing to know about the derivative is that it represents the instantaneous rate of change. If $f(t)$ is some function based on time, which can represent the distance, the amount, the population, ..., then $f(t+\Delta t)-f(t)$ is the small change in the function. To find the average rate of change we divide through by the time passed: $(f(t+\Delta t)-f(t))/ \Delta t$. Note that for a small increase $\Delta t$ the average rate of change of the function is almost its instantaneous rate of change (the rate of change at that point). The smaller $\Delta t$ is, the more accurate this expression. Thus we need to consider the limit $\Delta t\to 0$. In that case we have the derivative. Thus, the rate of change at some moment in time is the derivative at that point.

Origin of Differential Equations:
I think it is a good time to mention what a differential equation is. To show this we will consider the following problem: "A tank is filled with 10 gallons of pure water. There are two pipes, one taking water in and one taking water out. The flow rate is the same for both, at 3 gallons/min. The mixture flowing in carries 1 gallon of salt per minute. Find a function that represents the amount of salt at any given time."
We need to understand the difficulty in this problem. The difficulty is that the outflow pipe also carries salt out with the mixture of salt and water. Thus, this is really not an easy question to answer. We will not answer the question but rather set up a differential equation. Let $Q(t)$ be the amount of salt after time $t\geq 0$. Then, as mentioned before, the derivative $dQ/dt$ is the rate of change of this amount. One way we can find the rate of change is to note that:
$\frac{dQ}{dt}=\mbox{ rate in }-\mbox{ rate out }$.
Where "rate in, rate out" represent the rate at which salt is entering and leaving the tank.
Every minute 3 gallons of mixture enter the tank carrying 1 gallon of salt. Thus the rate in is constant at 1 gallon/min.
The rate out is a bit trickier, but not so bad. The rate out is the concentration of salt in the mixture multiplied by the volume leaving per minute. The concentration is the amount of salt at that time, which is $Q$, divided by the total volume, which is fixed at 10 gallons.
Then we multiply this result by 3 because 3 gallons are leaving per minute. Thus,
$\frac{dQ}{dt}=1-\frac{3Q}{10}$
$\frac{dQ}{dt}+\frac{3Q}{10}=1$
This is a differential equation.
We need to find a function that makes this statement true.
Unlike an algebraic equation, where we need to find a number that makes a statement true, here we need to find a function. There is a way to solve for it; I am just not going to do that here. The interesting thing is that there are infinitely many solutions to that equation! How do we know which one it is? We use the important fact that at $t=0$ we have pure water, thus $Q(0)=0$. With this condition (called an initial condition) the differential equation has a unique solution, and that solution describes the amount of salt in the tank.
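To see that the equation really does pin down the behavior, here is a crude simulation (my addition; this steps time forward in tiny increments, a technique called Euler's method that is not covered in this tutorial):

```python
def simulate(minutes, dt=0.001):
    q = 0.0  # gallons of salt; the tank starts as pure water, Q(0) = 0
    for _ in range(int(minutes / dt)):
        q += (1 - 3 * q / 10) * dt  # dQ = (rate in - rate out) * dt
    return q

# The salt level climbs toward 10/3 gallons, the steady state where
# rate in (1 gal/min) balances rate out (3Q/10 gal/min).
print(simulate(1))
print(simulate(50))
```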

The Motion Problem
The motion problem deals with the motion of some object. To understand the technique used we need to be familiar with the meanings of distance, speed and acceleration.
Distance we already know: the function $s(t)$ represents the total distance traveled after some time $t$. Speed is distance traveled per unit time. Thus, $(s(t+\Delta t)-s(t))/\Delta t$ is the average speed over some small amount of time. Again taking the limit, we find that $s'(t)=v(t)$, meaning the derivative of distance is speed. Acceleration is the change in speed per unit time. For example, an acceleration of 5 miles per hour per second means that each second the speed increases by 5 miles per hour. By similar reasoning $v'(t)=a(t)$: the derivative of speed is acceleration.

Example 11: The distance a particle travels can be expressed as a function: $s(t)=12t^3$. Find the acceleration of the particle after 1 second. We will actually need the second derivative (meaning we just take the derivative twice), because $a(t)=v'(t)=(s'(t))'=s''(t)$. Taking two derivatives of the distance function we find that:
$s'(t)=36t^2$ and $s''(t)=72t$
At $t=1$ we have $s''(1)=72$ units of distance per second per second.

We can use the motion problem to solve the free falling problem. The free falling problem concerns an object falling under the effect of gravity. Let us assume that $v_0$ is the initial speed at which an object is thrown (positive for up and negative for down), and $s_0$ is the initial height at which we are standing. What function represents the height of the object as a function of time? To solve this famous problem (historically, I think Galileo solved it first, then Newton explained why it works) we need to be familiar with an important property that all falling objects possess: they accelerate downward at a constant rate (why that is, we do not know).
The downward acceleration comes from the gravitational force acting on the object. On Earth this acceleration is 32 feet per second per second.
Thus, we are looking for a function $s(t)$ that represents the height of the object from the ground.
What we do know is that the second derivative is the acceleration,
$s''(t)=-g$
Where $g$ is the acceleration from the force of gravity. It is negative because by our sign convention, down is negative (fallings objects go down) and upwards is positive.
This is actually a differential equation (the most basic type).
To make it easier to follow we can think of the derivative as,
$(s'(t))'=-g$
What function $s'(t)$ has its derivative equal to $-g$? Think about this.... You should come up with $s'(t)=-gt$. But wait, any constant that we attach at the end will disappear when we differentiate, thus,
$s'(t)=-gt+C$
(What we just did is called taking the integral, or antiderivative. It turns out that any two antiderivatives on an interval must differ by a constant.)
If we substitute $t=0$ we have,
$s'(0)=v(0)=v_0=-g(0)+C$
Thus, $C=v_0$ (initial velocity).
Thus, $s'(t)=-gt+v_0$.
What function has its derivative equal to $-gt+v_0$? Some thought should produce,
$s(t)=-(1/2)gt^2+v_0t+C$
Substitute $t=0$.
$s(0)=s_0=-(1/2)g(0)^2+v_0(0)+C$.
Thus, $C=s_0$.
Thus,
$\boxed{ s(t)=-\frac{1}{2}gt^2+v_0t+s_0}$.
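The boxed formula is easy to wrap up as a function (my wrapper, with $g=32$ ft/s$^2$ as in the text):

```python
def height(t, v0, s0, g=32.0):
    """Height after t seconds, given initial velocity v0 and initial height s0."""
    return -0.5 * g * t ** 2 + v0 * t + s0

# Dropped from 64 ft with no initial velocity, an object lands at t = 2,
# since (1/2) * 32 * 2^2 = 64.
print(height(2, v0=0.0, s0=64.0))  # 0.0
```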

Example 12: An angry husband throws his wife up with an initial velocity of 96 feet per second. They live on a cliff at an altitude of 960 feet. Find how much time passes until his wife reaches maximum height, and find the amount of time until she hits the ground.
The function is,
$s(t)=-16t^2+96t+960$.
Maximum height is when velocity is zero.
Thus,
$s'(t)=v(t)=-32t+96=0$
The amount of time until she comes crashing down is found by setting the height to zero:
$s(t)=-16t^2+96t+960=0$.
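The post leaves these two equations unsolved; finishing them off (my addition) gives about 3 seconds up and about 11.3 seconds total:

```python
import math

v = lambda t: -32 * t + 96                    # velocity s'(t)
s = lambda t: -16 * t ** 2 + 96 * t + 960     # height

# Maximum height: v(t) = 0  ->  t = 96/32 = 3 seconds.
t_max = 96 / 32

# Ground: -16t^2 + 96t + 960 = 0  ->  t^2 - 6t - 60 = 0 (divide by -16),
# so by the quadratic formula t = 3 + sqrt(69) (taking the positive root).
t_ground = 3 + math.sqrt(69)

print(t_max)                # 3.0
print(round(t_ground, 2))   # 11.31
```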

The Approximation Problem
The type of approximation that we will use is linear approximation. If $f(x)$ is some curve, draw a tangent line at some point. The tangent line is almost like the curve (for points close to the point of tangency). Thus we can approximate values of the function with the aid of the tangent line.
The important formula we need to know is that if a line has slope (non-vertical) $m$ and contains a point $(x_0,y_0)$ its equation is,
$\boxed{y-y_0=m(x-x_0)}$.

Rather than going through a general approach to this problem let us just do an example.

Example 13: Approximate the value of $\sqrt{63.8}$. Note that it is almost $\sqrt{64}=8$. Thus we will draw the tangent line at $x=64$. But to what curve? What are we trying to approximate? We are trying to approximate the square root function $y=\sqrt{x}=x^{1/2}$. Its derivative (by the power rule) is,
$y'=\frac{1}{2\sqrt{x}}$.
The slope of the tangent line at $(64,8)$ is,
$m=\frac{1}{2\sqrt{64}}=\frac{1}{16}$
Thus, the equation of the tangent line is,
$y-8=\frac{1}{16}(x-64)$
$y=\frac{1}{16}x+4$.
Thus,
$\sqrt{63.8}\approx \frac{63.8}{16}+4=7.9875$
While the actual value is,
$\sqrt{63.8} = 7.98....$
A close value.
The smaller the change the closer the approximation.
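Checking the tangent-line approximation against the true square root (my addition):

```python
import math

# Tangent-line approximation of sqrt near x = 64: slope 1/16 at (64, 8).
def approx_sqrt(x):
    return 8 + (x - 64) / 16

print(approx_sqrt(63.8))   # 7.9875 (up to float rounding)
print(math.sqrt(63.8))     # about 7.98749
```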

Newton's Method
This is a method for finding zeros of a function, that is, solutions to the equation $f(x)=0$.
The general approach is to consider some curve $f(x)$ and find the point(s) where it intersects the x-axis when we graph it. The idea is to find a point close to the intersection and draw a tangent line. The point where the tangent line crosses the x-axis is even closer to our guess. Then using that point we draw another tangent line being even closer now we use that point, again and again. Each time approximating the solution to the equation closer.

Here is the general procedure.
Say $f(x)$ is some curve which crosses the x-axis. Let $x_0$ be our initial guess, meaning a reasonable number that we think the solution is near. Then we draw a tangent line at that point, using the formula I already mentioned above,
$y-y_0=m(x-x_0)$
$(x_0, f(x_0))$ is the point on the tangent line. And $f'(x_0)$ is our slope of tangent line.
Thus,
$y-f(x_0)=f'(x_0)(x-x_0)$
Where does this intersect the x-axis?
When $y=0$.
Thus,
$-f(x_0)=f'(x_0)(x-x_0)$
Solve for $x$,
$x=x_0-\frac{f(x_0)}{f'(x_0)}$
We call this result $x_1$ our next number.
Using this number we draw another tangent line,
$y-f(x_1)=f'(x_1)(x-x_1)$
Solve, for $y=0$
$x=x_1-\frac{f(x_1)}{f'(x_1)}$
We call this $x_2$.
The pattern is clear,
$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}, n\geq 0$.
The sequence that we get,
$x_0,x_1,x_2,...$
will usually approach the zero at a colossal rate.
After a few iterations we have a lot of digits.

This method is not foolproof; there are times when it fails. For example, if the derivative is zero somewhere then we cannot divide by it, and so on. Most of the time it will work. Another thing that might happen is that it converges to a different zero than the one you want. It is recommended to use software to graph the curve so you can use thy eyes to make a reasonable guess.

Example 14: I graphed the curve $y=x^3+x+1$ this is a cubic equation.
Look at the graph below.
The curve is in red.
The dotted blue lines are the tangents.
The black lines show where we draw the tangent on the curve at the point it crosses the x-axis.
I started my guess at $x_0=1$.
Then I drew a tangent line.
It intersected at $x=.25$.
Thus, I drew a tangent at,
$(.25,f(.25))=(.25,1.26)$
It intersected at,
$x=-.815$.
Then drew a tangent line at,
$(-.815,f(-.815))=(-.815,-.35)$
Look how close it intersects the x-axis.
That point of intersection (not shown) is our approximate zero.

With enough talk you should now get the idea. Let us do this problem.
$f(x)=x^3+x+1$
$f'(x)=3x^2+1$
Thus,
$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$
Thus,
$x_{n+1}=x_n-\frac{x_n^3+x_n+1}{3x_n^2+1}$
Thus,
$x_{n+1}=\frac{3x_n^3+x_n-x_n^3-x_n-1}{3x_n^2+1}=\frac{2x_n^3-1}{3x_n^2+1}$

What I like to do now is to program the iteration map $g(x)=\frac{2x^3-1}{3x^2+1}$ into graphing software and tell it to evaluate it repeatedly.
Thus, we are looking for,
$x_0,g(x_0),g(g(x_0)),g(g(g(x_0))),...$
Thus we get,
$1,.25,-.8158,-.6961,-.6825,-.6823,-.6823,...$
A few iterations and we got that,
$x\approx -.6823$
To 4 decimal points!
People complain about the evaluation of the iterations, that it takes a long time. Not true! I got the decimal sequence above less than 30 seconds after I programmed the software to do those evaluations. If you do it by calculator or by hand it takes much longer than giving the iterative sequence to the computer.
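If you prefer a real program to graphing software, here is Newton's method on this cubic in Python (my addition):

```python
def newton(f, fprime, x0, steps=20):
    # x_{n+1} = x_n - f(x_n) / f'(x_n)
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

f = lambda x: x ** 3 + x + 1
fp = lambda x: 3 * x ** 2 + 1

root = newton(f, fp, 1.0)
print(round(root, 4))   # -0.6823
```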

Application of Newton's Method for Evaluating Square Roots.
We can now develop a technique for evaluating square roots by hand. Let us assume we want to find $\sqrt{a}, a>0$ (writing $a$ so as not to clash with the iteration index $n$).
We can see that a zero of
$f(x)=x^2-a$
is $\sqrt{a}$. (It also has the zero $-\sqrt{a}$.)
We will use Newton's method to approximate a zero and hence find the square root.
$f(x)=x^2-a$
$f'(x)=2x$
Thus,
$x_{n+1}=x_n-\frac{x_n^2-a}{2x_n}=\frac{x_n^2+a}{2x_n}$
The nice thing is you can always start at $x_0=1$ (you cannot use $x_0=0$: the derivative there is zero and the method does not work).

Example 15: Let us find $\sqrt{2}$.
$x_0=1$
$x_1=\frac{1^2+2}{2}=1.5$
$x_2=\frac{1.5^2+2}{2(1.5)}=\frac{4.25}{3}\approx 1.4167$
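The same iteration runs nicely as code (my addition; a handful of steps already reaches machine precision):

```python
def sqrt_newton(a, x=1.0, steps=6):
    # Newton's method on f(x) = x^2 - a: next x is (x^2 + a) / (2x)
    for _ in range(steps):
        x = (x * x + a) / (2 * x)
    return x

print(sqrt_newton(2))   # converges to 1.41421356...
```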
~~~
Exercises

1)A tank is filled with 20 gallons of pure urine. An intake tube brings in 10 gallons of pure water every minute. The outflow tube lets the mixture pour out at the same rate.
Set up a differential equation that shows the amount of urine at any time $t\geq 0$ (with initial conditions).
And,
a differential equation that shows the amount of water at any time $t\geq 0$ (with initial conditions).
Do not solve.

2)The distance an electron travels can be represented by the equation
$s(t)=t^4+t^2+1$ for $t\geq 0$,
with $t$ in microseconds and $s$ in Angstroms.
Find the acceleration and speed of the electron after 1 microsecond.

3)A bullet from a .357 Magnum is fired directly up by a 6-foot man with his arms extended 1 foot above his head. The muzzle velocity is $738 ft/sec$. What is the speed of the bullet when it comes down and kills the man (although Mythbusters has shown it is impossible to die like that)?

4)Approximate $\sqrt{16.1}$ using a linear approximation.

5)Use Newton's method to solve $x^4+x^2-2=0$ for the positive root. Find the actual value in terms of radicals, find the approximated value with Newton's method, and compare. (A graph will help.)

6)Find $\sqrt{17}$ by hand.

*7)Modify the technique to find a method for finding $\sqrt[n]{x}$, the n-th root ($n,x>0$).

5. Now we get to the fun stuff. This is one of the features that makes Calculus so powerful. Optimization problems.
These are problems that ask for the minimum or maximum. Things in nature usually come in maximum or in minimum.

I will first show how to maximize/minimize a single variable function, that is $y=f(x)$, and then multivariable functions. Though multivariable functions are a Calculus III topic, the concept of maximization and minimization is one you should understand.

Definition: A multivariable function $z=f(x_1,x_2,...,x_n)$ takes an ordered n-tuple of real numbers $(x_1,...,x_n)$ and transforms it into a new number.

Example 16: This is a multivariable function: $f(x,y)=x^2+y^2$. Thus, $f(1,2)=1^2+2^2=1+4=5$. It has the distinct feature that $f(x,y)=f(y,x)$, but that need not be true in general.

The nice thing with single variable functions $y=f(x)$ is that you can visually represent them as an x-y graph. The same holds for a function of two variables, but it is no longer a 2-dimensional curve; it is a surface in 3 dimensions. In applied math the x-axis is the depth, the z-axis is the height, and the y-axis is the width. In engineering the z-axis is the depth, the y-axis is the height, and the x-axis is the width. The surface is drawn in the same way: you take many points $(x,y,z)$, plot them, and connect them with some surface to get the graph. The problem is that you need to be one talented artist to do that. The other problem is that there is no way to graph a function of 3 variables $w=f(x,y,z)$, because that is a 4-dimensional solid (whatever you would call it).

Example 17: The surface from the previous example, $z=x^2+y^2$, is called a "paraboloid". It is a parabola rotated around its axis.
(Image.) As I said, unless you are really talented at drawing, you might want to use software; I am sure there are some freeware programs that do 3-dimensional plots. You are probably thinking: how does one draw such a monster? The answer is through "traces", meaning we keep one of the variables fixed and graph the 2-dimensional shape that we get. They are called traces because when we keep a variable fixed it is as if we pass a plane through the surface and graph the intersection.

Definition: A multivariable polynomial is just like a single variable polynomial except that its terms may contain products of the variables. The degree of each term is the sum of the exponents in it, and the degree of a non-constant multivariable polynomial is the largest degree among its terms.

Example 18: $f(x,y)=x^3+x^2y^2+y^3$ is a multivariable polynomial in 2 variables. Its term of largest degree is $x^2y^2$, with degree $2+2=4$; thus the degree is 4.

The following theorem is a classic in Calculus. Before we state it we will give a geometric justification. Graph the curve $y=x^3-2x^2+1$. We see that there are 2 turning points. One is at $x=0$; the other is at about $x=1.3$. These two points are of special interest. At $x=0$ we have a relative maximum because on some open interval containing this point it is the maximum value. At $x\approx 1.3$ we have a relative minimum because it is the minimum value on some open interval containing the point. What this means is that you can choose a small enough interval such that the point is a max/min. Another term that appears is a global extremum (the word extremum refers to a max/min point); these are points that are a max/min over the entire function (which is not the case here because the curve keeps going up and down). The important observation is that at these points the slope of the tangent line is zero, and hence the derivative is zero. (There are some exceptions, where the derivative does not exist, but that does not concern us because we will assume differentiability.)

Fermat's Principle: The relative extrema of a function occur at points where the derivative is zero.

Example 19: The above theorem gives a necessary but not a sufficient condition. Meaning, at a relative extremum the derivative is zero, but the derivative being zero does not mean we have an extremum! For example, $y=x^3$: the derivative $y'=3x^2$ is zero at $x=0$, but look at the curve; there is no extremum there.

With that said, we can find the relative extrema of the function $y=x^3-2x^2+1$.
First we take the derivative,
$y'=3x^2-4x$ and solve for zero.
$3x^2-4x=0$
$x(3x-4)=0$
$x=0,x=4/3$
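As a quick numeric check (a plain-Python sketch; the function names are my own), we can verify that the derivative really vanishes at the two points we just found:

```python
# f(x) = x^3 - 2x^2 + 1 and its derivative f'(x) = 3x^2 - 4x.
def f(x):
    return x**3 - 2*x**2 + 1

def f_prime(x):
    return 3*x**2 - 4*x

# Roots of f'(x) = x(3x - 4) = 0:
critical_points = [0.0, 4.0/3.0]

for c in critical_points:
    print(c, f_prime(c))  # the derivative is (numerically) zero at each point
```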

But how do we know if a point is a relative maximum or minimum? There are 2 tests, called the first and second derivative tests.
The first derivative test is foolproof but longer; the second derivative test is not foolproof but much quicker to use.

Look at our function again, $y=x^3-2x^2+1$. It is increasing on $x\leq 0$, then decreasing on $0\leq x\leq 4/3$, and then increasing again on $4/3 \leq x$. Look at the tangent lines on those intervals: when the function is increasing any tangent line you draw has a positive slope, and when the function is decreasing it has a negative slope. Thus we have the following useful theorem.

Theorem: When the derivative is positive the function is increasing and when the derivative is negative the function is decreasing.

We can use this in the following way. If a function is increasing to the left of a point and decreasing to the right of it, the point must be a relative maximum. And if a function is decreasing to the left of a point and increasing to the right of it, the point must be a relative minimum.

First Derivative Test: The above procedure of looking at where the function increases and decreases is the first derivative test.

Example 20: You are probably bored with the function $f(x)=x^3-2x^2+1$, but we found the points where the derivative is zero: $x=0,4/3$. Now we divide the number line into 3 intervals: $x<0$, $0<x<4/3$, $x>4/3$, take any point in each interval, and check the sign of the derivative. Take $x=-1$: then $f'(-1)>0$, thus it is increasing for $x<0$. Take $x=1$: then $f'(1)<0$, thus it is decreasing for $0<x<4/3$. Take $x=2$: then $f'(2)>0$, thus it is increasing for $x>4/3$. (You are probably thinking: if one point works on the interval, does it mean all points there have the same sign? Yes! Because if that were not the case, then between two such points there would have to be a point where the derivative is zero, which cannot be, since we found all the points where the derivative is zero.)
We can make a chart,
$\left\{ \begin{array}{c}x<0 , f'(x)>0, \mbox{ increasing }\\ 0<x<4/3 , f'(x)<0, \mbox{ decreasing }\\ x>4/3 , f'(x)>0, \mbox{ increasing }\end{array} \right\}$.
Thus, $x=0$ is a relative max and $x=4/3$ is a relative min by the first derivative test.
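The first derivative test above can be sketched in a few lines of plain Python (names are mine): sample one test point per interval and read off the sign of $f'$.

```python
# First derivative test for f(x) = x^3 - 2x^2 + 1, where f'(x) = 3x^2 - 4x.
def f_prime(x):
    return 3*x**2 - 4*x

# One sample point per interval: x < 0, 0 < x < 4/3, x > 4/3.
samples = {"x<0": -1.0, "0<x<4/3": 1.0, "x>4/3": 2.0}
signs = {name: ("increasing" if f_prime(x) > 0 else "decreasing")
         for name, x in samples.items()}
print(signs)  # increasing, decreasing, increasing: max at x=0, min at x=4/3
```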

Example 21: Returning to the function $f(x)=x^3$, we found that the derivative is zero at $x=0$, but there is no relative extremum there. If we use the first derivative test, we first divide the number line into the intervals $x<0$ and $x>0$. Take $x=-1$ (any point): the derivative $f'(-1)>0$, thus it is increasing on $x<0$. Take $x=1$: the derivative $f'(1)>0$, thus it is increasing on $x>0$. Thus it is increasing and then increasing again; that point is nothing.
The function needs to increase then decrease (or decrease then increase) to guarantee a relative extremum.

Second Derivative Test: If $x_0$ is a point where the derivative is zero, then to determine whether it is a relative maximum or minimum we can use the second derivative: if $f''(x_0)<0$ it is a maximum, and if $f''(x_0)>0$ it is a minimum.

I am not going to give a rigorous justification, but we can visualize it. The sign of the second derivative tells how the curve is concave. When $f''(x)>0$ it is concave up and has a $\bigcup$ shape, hence a relative minimum. When $f''(x)<0$ it is concave down and has a $\bigcap$ shape, hence a relative maximum. Points where $f''(x)=0$ may be inflection points, i.e. where concave up changes to concave down or vice-versa; there the test cannot be used and we need the first derivative test.

Example 22: For $f(x)=x^3-2x^2+1$ we have $f''(x)=6x-4$. Thus $f''(0)=-4<0$, hence a relative max, which agrees with what we found in our previous examples. And $f''(4/3)=4>0$, hence a relative min, which also agrees.
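The same classification via the second derivative test, as a small plain-Python sketch:

```python
# Second derivative test for f(x) = x^3 - 2x^2 + 1, where f''(x) = 6x - 4.
def f_second(x):
    return 6*x - 4

for c in [0.0, 4.0/3.0]:
    kind = "relative max" if f_second(c) < 0 else "relative min"
    print(c, kind)  # x=0 is a relative max, x=4/3 is a relative min
```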

These concepts of extremum points and concavity are very useful for drawing curves by hand. For example, if you are given $y=x^3+x^2$, the first thing you do is find the turning point(s) (extrema) and then look at the concavity. This quickly produces an accurate hand drawing of the curve. This is just the basic idea; there is much more to it which will not be discussed here.

Now we reach what I believe is one of the most important features of Calculus: finding maximum and minimum values. The previous discussion about relative maxima and minima only provides relative extrema, not absolute (global) extrema. We come to one of the most celebrated theorems in Calculus/Analysis.

Extreme Value Theorem: Any continuous function (no breaks/rips in it) on a closed interval (endpoints included) has extreme values (an absolute max and min).

The proof seems obvious, but as you know from experience, mathematicians need to prove this type of stuff. The important word is "closed": if the interval is open, consider $y=x$ on $(1,2)$, which has no max and no min.
This is an excellent example of an existence theorem: a theorem that guarantees the existence of something but provides no method for finding it. Luckily for us we have a method, the technique used before.

There are 2 possibilities: either the absolute max/min is inside the interval (the open interval) or at the endpoints. If it is inside, then Fermat's principle says it is a relative max/min and the derivative is zero there. The other possibility is that an endpoint gives the absolute max/min. Thus we need to check everything: find the function value at each candidate and see which produces the largest and smallest numbers.
We do not need to prove that an interior candidate is a relative max or min, because we are only interested in the maximum/minimum values. Again, the derivative being zero does not mean there is an absolute max/min; it is necessary but not sufficient.

Example 23: Consider the function $f(x)=\sqrt[3]{x^2+1}$ on the closed interval $[-1,1]$. Find the values of $x$ that produce the absolute max and min of this function. First, the function is continuous (the most basic check is whether it is defined on this interval, and it is; in reality we need to show more than just being defined, but the standard functions work this way, just check whether it is defined or not). Thus we know it has extreme values. The next step is to find the $x$ values inside the interval where the derivative is zero. To find the derivative we use the chain rule and write $y=u^{1/3}$ and $u=x^2+1$.
Thus,
$\frac{dy}{du}=\frac{1}{3}u^{-2/3}$
$\frac{du}{dx}=2x$
$\frac{dy}{dx}=\frac{2}{3}xu^{-2/3}=\frac{2x}{3\sqrt[3]{(x^2+1)^2}}$
If you solve, $y'=0$ you should see that $x=0$ is the solution (and it is inside the interval).
The functional value is $f(0)=\sqrt[3]{0^2+1}=1$.
Now we check the endpoints,
$f(-1)=\sqrt[3]{(-1)^2+1}=\sqrt[3]{2}$
$f(1)=\sqrt[3]{1^2+1}=\sqrt[3]{2}$.
Thus, the maximum of this function occurs at the endpoints, and the minimum occurs at $x=0$.
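The whole closed-interval procedure of Example 23 amounts to comparing a handful of function values; a plain-Python sketch (names mine):

```python
# Absolute extrema of f(x) = (x^2 + 1)^(1/3) on [-1, 1]:
# compare f at the interior critical point x = 0 and at the endpoints.
def f(x):
    return (x**2 + 1) ** (1.0/3.0)

candidates = [-1.0, 0.0, 1.0]   # endpoints plus the root of f'(x) = 0
values = {x: f(x) for x in candidates}
min_x = min(values, key=values.get)
max_x = max(values, key=values.get)
print(min_x, values[min_x])  # minimum at x = 0 with value 1.0
print(max_x, values[max_x])  # maximum at an endpoint, value 2^(1/3)
```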

Now we get to maximizing/minimizing multivariable functions. This is a Calculus III topic, but it is extremely similar to the one above, thus I chose to include it. One thing we will not learn is how to find relative extrema and prove they are relative max/min for multivariable functions; it is not hard, just some more formulas and theorems.

For simplicity's sake the multivariable functions that we will use will be in two variables, that is $z=f(x,y)$. The other thing you should understand: when we deal with single-variable functions we work on a closed interval, and when we deal with two-variable functions we work on a closed region, meaning a region with its boundary curve included. For example, we might deal with the paraboloid $z=x^2+y^2$ on the closed region $x^2+y^2\leq 1$; this is a disk of radius 1 centered at the origin, meaning we work with all points inside and on this circle. If you downloaded a 3-d graphing program, graph this surface along with the region; perhaps it will help if you have difficulty understanding what I said.

The thing about multivariable functions is that there is no such thing as "the" derivative anymore. There is a different type of derivative that shows with respect to which variable we are differentiating; it is called the partial derivative. For example, let $w=f(x,y,z)$. Then,
$\frac{\partial w}{\partial x}$ is the derivative along the $x$ variable; another way to write it is $w_x$. Sometimes we take the partial derivative twice. In that case, $\frac{\partial^2 w}{\partial x^2}$ is the partial derivative with respect to $x$ twice, also written $w_{xx}$. If we take it with respect to $x$ and then $y$ it is $\frac{\partial^2 w}{\partial x \partial y}$. Note, it need not be the same as $\frac{\partial^2 w}{\partial y\partial x}$; in simpler (but not as cool) notation, $w_{xy}$ need not equal $w_{yx}$. But for the standard functions that we use it will be true that $w_{xy}=w_{yx}$. But how do we find it?!? What is the definition of it?

Definition: To find the partial derivative of a multivariable function with respect to some variable, we treat all the other variables as constants (fixed at a number) and differentiate with respect to that variable.

Example 24: Let $z=x^2y+xy^3$. Then to find $z_x$ we consider $y$ to be fixed at a number and take the derivative. Thus, $z_x=2xy+y^3$ and $z_y=x^2+3xy^2$. You should confirm that $z_{xy}=z_{yx}$.
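We can sanity-check Example 24, and the claim that $z_{xy}=z_{yx}$, with central finite differences (a plain-Python sketch; the step size and test point are arbitrary choices of mine):

```python
# z = x^2 y + x y^3; compare numeric partials against 2xy + y^3 and x^2 + 3xy^2.
def z(x, y):
    return x**2 * y + x * y**3

h = 1e-5

def z_x(x, y):  # central difference in x
    return (z(x + h, y) - z(x - h, y)) / (2*h)

def z_y(x, y):  # central difference in y
    return (z(x, y + h) - z(x, y - h)) / (2*h)

x0, y0 = 1.2, -0.7
assert abs(z_x(x0, y0) - (2*x0*y0 + y0**3)) < 1e-6
assert abs(z_y(x0, y0) - (x0**2 + 3*x0*y0**2)) < 1e-6

# Mixed partials agree: differentiate z_x in y and z_y in x.
z_xy = (z_x(x0, y0 + h) - z_x(x0, y0 - h)) / (2*h)
z_yx = (z_y(x0 + h, y0) - z_y(x0 - h, y0)) / (2*h)
assert abs(z_xy - z_yx) < 1e-4
```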

I will pause here and mention what a partial differential equation is. Before, we talked about an ordinary differential equation, an equation in just 1 independent variable. When we have several variables and a differential equation involving partial derivatives arises, we have a partial differential equation, and these are infinitely more complicated than ordinary differential equations. Here is an example: a function $u=u(x,y)$ that solves the equation $u_{xx}+u_{yy}=0$ is said to be harmonic, and that partial differential equation is called "Laplace's Equation", which is very important.

Here is an analogue of the extreme value theorem.

Theorem: If a function of two variables $z=f(x,y)$ is continuous on some closed region then it has extreme values.

Again it seems obvious, but mathematicians cannot sleep without a proof.

Another analogue: Fermat's principle locates extrema where the slope of the tangent line is zero. Here it is where the tangent plane is horizontal, that is, where the partial derivatives are both zero.

Theorem: If a point inside the region is an absolute maximum or minimum, then the partial derivatives there are zero.

Proof: Say you have $z=f(x,y)$. Fix one variable, $z=f(x,y_0)$, so it is just a function of $x$, and a max/min requires $f_x(x,y_0)=0$. Fix the other variable, $z=f(x_0,y)$, so it is just a function of $y$, and a max/min requires $f_y(x_0,y)=0$. Thus we require both, $z_x=z_y=0$.

The difficulty with functions of two variables is this: when the max/min is inside the region, the partials are zero there, but just like in the single-variable case we also need to check the boundary. And what are we going to do with the boundary? It has infinitely many points; we cannot check them all! We express the boundary in terms of one variable, and then the problem reduces to optimizing a function of one variable.

Example 25: In two variables $x^2+y^2=1$ is a full circle; solve for the positive value, $y=\sqrt{1-x^2}$, and we have an upper semi-circle. Similarly, $x^2+y^2+z^2=1$ is a full sphere; solve $z=\sqrt{1-x^2-y^2}$ and we have the upper hemisphere. Let us work on the closed region it forms with the xy-plane, the circular disk $x^2+y^2\leq 1$, meaning we work with all points in this region. Your intuition should tell you the maximum point is at the center of the disk, and the minimum points are all the points on the boundary (infinitely many). But let us use the concepts we just stated.
$z=\sqrt{1-x^2-y^2}=(1-x^2-y^2)^{1/2}$.
Thus, (skipping the work),
$\frac{\partial z}{\partial x}=\frac{-x}{\sqrt{1-x^2-y^2}}$
$\frac{\partial z}{\partial y}=\frac{-y}{\sqrt{1-x^2-y^2}}$.
The only solutions are when,
$(x,y)=(0,0)$ (which is in the region).
Checking the functional value we find that,
$z=\sqrt{1-0^2-0^2}=1$.
Now we work on the boundary curve (just as in the single-variable case we dealt with the endpoints). The boundary curve is the circle $x^2+y^2=1$.
In that case,
$z=\sqrt{1-x^2-y^2}=\sqrt{1-(x^2+y^2)}=\sqrt{1-1}=0$.
Thus, the function is zero at every point on the boundary.
Hence, the maximum value is at $(0,0)$, which gives $z=1$, and the minimum value $z=0$ is attained at every point of the boundary circle.
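A tiny numeric confirmation of Example 25 (plain Python; the `max(0.0, ...)` guard is mine, to absorb floating-point roundoff on the boundary):

```python
import math

# z = sqrt(1 - x^2 - y^2) on the disk x^2 + y^2 <= 1.
def z(x, y):
    return math.sqrt(max(0.0, 1 - x**2 - y**2))

print(z(0, 0))  # 1.0 at the interior critical point: the maximum

# Sample points on the boundary circle x^2 + y^2 = 1: all give z = 0.
for k in range(8):
    t = 2 * math.pi * k / 8
    assert z(math.cos(t), math.sin(t)) < 1e-7  # zero up to roundoff
```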

Here are some exercises that are way more advanced than the example problems I did.
~~~
Exercises.

1) $y=x^3+x^2$.
Find all relative extremum and classify them as max/min using both the first and second derivative test.

2) $y=\frac{1}{1+x^2}$.
Find all relative extremum and classify them as max/min using both the first and second derivative test.

3) $y=x^3+x^2$ on the closed interval $[-5,5]$. Find the maximum and minimum values.

4) $y=1$ on the closed interval $[-1,1]$. Find the maximum and minimum values.

5)Let $u=x+y$, show that this function is harmonic.

6)Show how the partial derivatives were found in the last example (Hint: Use the Chain Rule, and Power Rule after you express square root as exponent).

7) $z=x+y^2$ on the closed region $x^2+y^2\leq 4$. Find the maximum and minimum values of the function, guaranteed by the Extreme Value Theorem.

*8) Consider the triangular region with vertices $(0,0),(1,1),(0,1)$, and the function $z=xy$. Find the maximum and minimum values, guaranteed by the Extreme Value Theorem.

6. Originally Posted by ThePerfectHacker
Fermat's Principle: The relative extremum of a function occur at that points where the derivative is zero.

Example 19: The above theorem gives necessary but not sufficient conditions. Meaning relative extremum exists when the derivative is zero. But it does not mean when the derivative is zero we have an extremum!

(...)

Theorem: If a point inside the region is an absolute maximum or minimum then the partial derivatives are zero.

Proof: Say you have $z=f(x,y)$ fix one variable $z=f(x,y_0)$ so just a function of $x$, and we look for max/min when $z'=f_x(x,y)=0$. And fix the other variable $z=f(x_0,y)$ so just a function of $y$, and we look for max/min when $z'=f_y(x_0,y)=0$. Thus we require both. Thus, $z_x=z_y=0$.
I'd like to add that these are only necessary conditions if the function is differentiable at that point, since extrema can occur where the derivative doesn't exist. Example: y = |x|, no derivative at x = 0 but still an absolute minimum. The same goes for multivariable functions, differentiability is needed for your theorem.

7. Originally Posted by TD!
I'd like to add that these are only necessary conditions if the function is differentiable at that point, since extrema can occur where the derivative doesn't exist. Example: y = |x|, no derivative at x = 0 but still an absolute minimum. The same goes for multivariable functions, differentiability is needed for your theorem.
If you read my full post I make mention of that problem.
(It is an Introduction, not some real analysis course, thus, I chose to ignore it).

8. I missed "There are some exceptions, where derivative does not exists. But that does not concern us because we will assume it is differenciable.", but I think that y = |x| is a classical example, not really "high level".
I understand that you're trying to keep it basic, but when you propose theorems (even with "proofs"), then I think you can agree that they should at least be correct - I was just pointing out that adding "if f is differentiable at a..." would do the trick. Other than that, no comments; didn't mean to nitpick

9. Originally Posted by TD!
I missed "There are some exceptions, where derivative does not exists. But that does not concern us because we will assume it is differenciable.", but I think that y = |x| is a classical example, not really "high level".
I understand that you're trying to keep it basic, but when you propose theorems (even with "proofs"), then I think you can agree that they should at least be correct - I was just pointing out that adding "if f is differentiable at a..." would do the trick. Other than that, no comments; didn't mean to nitpick
I did not want to do that. I rather mention that the function we will use are differenciable, thus we do not need to worry.

10. That's fine as well and perhaps even more accessible, but then I'd effectively mention that somewhere in the beginning (that you're working with differentiable, or "smooth" functions, or to put it more introduction-ish: to work with "nice functions which behave")

11. Here are some applications for which optimization problems can be used. First, many things in the universe come as a max/min. I remember my professor asked the class a question. We were working on a beam problem, and he asked: all of these equations work, so all of these are possible shapes for the curvature of the beam; how do we know it looks like this one? I immediately raised my hand (the only one; for some reason in college nobody ever answers anything) and said it is minimized, and he said correct. My purpose in telling this is that many things in the universe come as max/min problems; it was that experience that told me what to say. For example, a planet is in the shape of a ball (sphere), and the sphere has the maximum volume for a given surface area, or equivalently the minimum surface area for a given volume. Another useful thing about max/min solutions is that if something works in the worst case then it works in all cases; thus we can sometimes just examine one situation (the worst case), show that it works, and conclude that it always works, instead of checking every possible case.

Shortest Distance:
The distance from a point to a line (assuming the point is not on it) is the shortest distance from the point to the line, because distance is measured as a minimum value. To find that distance we draw a perpendicular line (which minimizes the distance) and find its length. The standard way is: 1) given the equation of the line and a point, 2) find the equation of the perpendicular line passing through the point, 3) find the intersection between the line and its perpendicular, 4) find the distance from the intersection point to the given point. This is an algebraic nightmare. I have indeed worked out a full solution (it took me 2 hours) and posted it on the site, but the page is not loading.
We will use the concept of minimums/maximums to solve this problem.
If we are given a horizontal line and a point, the solution is trivial (obvious).
Thus, it is safe to assume we have a non-horizontal line $y=mx+b$ and a point $(x_0,y_0)$ not contained on the line (otherwise the distance is zero). We are looking for the point $(x,y)$ on the line that minimizes the distance to $(x_0,y_0)$. Let the point on the line be $(x,y)=(x,mx+b)$; then the distance to $(x_0,y_0)$ is,
$s=\sqrt{(x-x_0)^2+(mx+b-y_0)^2}$.
The important observation is that if $s$ is minimized then $s^2$ is minimized too (if the distance is as small as it can possibly be, then the square of the distance is as small as it can possibly be).
Thus, we have a function depending on $x$,
$s^2=(x-x_0)^2+(mx+b-y_0)^2$.
Now we take the derivative and set it to zero. (You are probably thinking: how do we know this gives the minimum value? Maybe it is the maximum, as we have seen. Simple: it cannot possibly be the maximum value, because we could move along the line as far as we want and make the distance as large as we please.)
Thus, by the chain rule,
$(s^2)'=2(x-x_0)+2m(mx+b-y_0)$
Setting this equal to zero and expanding,
$2x-2x_0+2m^2x+2mb-2my_0=0$
$x(1+m^2)=x_0+my_0-mb$
Thus, since $(1+m^2)\not = 0$,
$x=\frac{x_0+my_0-mb}{1+m^2}$
That is the intersection point between the line and the perpendicular.
Note, the following involves some algebraic manipulation, but it is not nearly as bad as the derivative-free approach described in the beginning.
Now we find the distance between that point and the given point.
Note that, (details omitted),
$x-x_0=\frac{x_0+my_0-mb}{1+m^2}-x_0=\frac{m(y_0-b-mx_0)}{1+m^2}$
And that, (details omitted),
$mx+b-y_0=\frac{m(x_0+my_0-mb)}{1+m^2}+b-y_0=\frac{mx_0+b-y_0}{1+m^2}$.
Then the distance $l$ between the points satisfies,
$l^2=(x-x_0)^2+(mx+b-y_0)^2$
$l^2= \frac{m^2(y_0-b-mx_0)^2}{(1+m^2)^2}+\frac{(mx_0+b-y_0)^2}{(1+m^2)^2}$
Note,
$(-x)^2=x^2$
Thus,
$(mx_0+b-y_0)^2=(y_0-b-mx_0)^2$
Thus, (factoring the common expression)
$l^2=\frac{(mx_0+b-y_0)^2(1+m^2)}{(1+m^2)^2}$
Thus, (canceling)
$l^2=\frac{(mx_0+b-y_0)^2}{1+m^2}$.
Take square root and note, $\sqrt{x^2}=|x|$,
$l=\frac{|mx_0+b-y_0|}{\sqrt{1+m^2}}$.
But, I am going to make this formula even better!

There are two ways of expressing the equation of a line: slope intercept form and standard form,
$y=mx+b$---> Slope-Intercept Form.
$Ax+By+C=0$---> Standard-Form.
Now, standard form is more general than slope-intercept form: in slope-intercept form we can only express non-vertical lines, while in standard form we can express any line, vertical or non-vertical.
To prove this, take any line and divide the proof into two cases: a vertical line and a non-vertical line. If it is the first case, then $x=k$, thus $1\cdot x+0\cdot y-k=0$.
If it is non-vertical then we can write, $y=mx+b$, thus, $mx-1\cdot y+b=0$.

Returning to the "shortest distance" problem: if we have a non-vertical line in the form $Ax+By+C=0$, then $B\not = 0$ (non-vertical), thus,
$(A/B)x+y+(C/B)=0$
$y=-(A/B)x-(C/B)$
And some point not on the line $(x_0,y_0)$.
The formula says,
$l=\frac{|-(A/B)x_0-(C/B)-y_0|}{\sqrt{1+(-A/B)^2}}$
Thus,
$l=\frac{|(A/B)x_0+(C/B)+y_0|}{\sqrt{1+A^2/B^2}}$
Thus,
$l=\frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}}$.
Look how easy the formula is! All you do is substitute the point into the line in standard form and divide by the Pythagorean length! So easy to remember! The other nice feature is that if the line is vertical it certainly works, because in the vertical case you need the distance between the x-coordinates. And it also works when the point is on the line, because then substituting gives a zero in the numerator. Thus, we can state this result as follows,

Theorem: Given ANY point $(x_0,y_0)$ and ANY line $Ax+By+C=0$ then the distance between them is,
$\boxed{ \frac{|Ax_0+By_0+C|}{\sqrt{A^2+B^2}}}$.

I would like to mention that when you learn 3-D plots, the analogue of a line in 2 dimensions is a plane in 3 dimensions. And there is a result that states that if you have a point $(x_0,y_0,z_0)$ and a plane $Ax+By+Cz+D=0$ then the distance is,
$\frac{|Ax_0+By_0+Cz_0+D|}{\sqrt{A^2+B^2+C^2}}$

Example 26: Given a line $3x+4y=0$ and a point $(1,2)$ then the distance is,
$\frac{|3(1)+4(2)|}{\sqrt{3^2+4^2}}=\frac{11}{5}$
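The boxed theorem translates directly into a one-line function; here is a sketch (function name mine) reproducing Example 26:

```python
import math

# Distance from (x0, y0) to the line Ax + By + C = 0.
def point_line_distance(A, B, C, x0, y0):
    return abs(A*x0 + B*y0 + C) / math.sqrt(A**2 + B**2)

print(point_line_distance(3, 4, 0, 1, 2))   # 2.2, i.e. 11/5 as in Example 26
print(point_line_distance(0, 1, -5, 3, 5))  # 0.0: the point (3,5) lies on y = 5
```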

Method of Least Squares
In statistics, a common problem after an experiment is done: a set of points is collected and visually represented as an x-y plot.
The problem is to approximate this set of points by a curve, in our example a straight line. But the difficulty is that these points do not necessarily lie on a straight line, so we need to find the best possible approximating line. The question you should ask is what "best" means. The following concept and solution was devised by Gauss, and independently by his nemesis, Legendre; thus some texts write that Gauss discovered it while others write that Legendre did. The explanation below is my own, which I have never seen elsewhere; I like it because it is more detailed, suggesting what went through the minds of Gauss/Legendre.
Assume we have a set of points, and we visually guess the best-fitting line and draw it. Some error is created: the error at a point is the difference between the actual value at the point and the approximated value on the line.
Below is an example; the set of points is $\{(1,1.5),(2,1.75),(3,3),(4,5),(5,4)\}$, which I approximated by $y=x$.
The black lines (the vertical distances) represent the error for each point. The total error respectively is:
$.5+.25+0+1+1=2.75$
Note, errors are always measured as positive.
The dotted line is the "line of best fit" that I drew; its equation is $y=.825x+.575$.
Note, though it does not contain any of the points, its total error (sum of errors) is less than that of the full red line.
This is a very reasonable way to define such a line.
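The error totals quoted above are easy to reproduce; a plain-Python sketch (names mine):

```python
# Total error of a candidate line y = slope*x + intercept against the data.
points = [(1, 1.5), (2, 1.75), (3, 3), (4, 5), (5, 4)]

def total_error(slope, intercept):
    return sum(abs(slope*x + intercept - y) for x, y in points)

print(total_error(1, 0))          # 2.75 for the guess y = x
print(total_error(0.825, 0.575))  # smaller than 2.75: the dotted line fits better
```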

Definition: The line of best fit for a set of points is a line such that the total error of the distances is minimized.

Note, the word minimized, this suggests this is going to be a Calculus optimization problem.
Given a set of points $\{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\}$, we are trying to minimize the total error with the line $y=ax+b$.
The total error is,
$|ax_1+b-y_1|+|ax_2+b-y_2|+...+|ax_n+b-y_n|=\sum_{k=1}^n |ax_k+b-y_k|$.
For simplicity's sake we will find the line of best fit that passes through the origin, meaning $b=0$.
Thus, $f(a)=\sum_{k=1}^n |ax_k-y_k|$
To minimize this function we need $f'(a)=0$.
The problem is the absolute value; it is too messy. We never discussed the derivative of $y=|x|$, and it happens not to exist at $x=0$, so we cannot simply take the derivative. What do we do? We reason like this: instead of the errors themselves, we minimize their squares (the vertical distances squared), which removes the absolute value. Thus, we need to minimize,
$f(a)=\sum_{k=1}^n (|ax_k-y_k|)^2$.
But, $(|n|)^2=n^2$.
(Note, the reason we use the square is that it removes the absolute value; we could also have used any even exponent, but why work with higher exponents!)
Thus, we need to minimize,
$f(a)=\sum_{k=1}^n (ax_k-y_k)^2$.
If we find $f'(a)=0$, it gives either the maximum error value (the line of worst fit) or the minimum error value (the line of best fit). It cannot be the line of worst fit, because we can choose a line as far away as we like and make the error as large as we please. Thus the slope with derivative equal to zero gives the line of best fit.
Thus, by the chain rule (differentiating each term with respect to $a$),
$f'(a)=\sum_{k=1}^n 2(ax_k-y_k)x_k=0$
Divide by 2 and expand (note the $x$-coordinates are not all zero, so $\sum_{k=1}^n x_k^2>0$),
$a\sum_{k=1}^n x_k^2-\sum_{k=1}^n x_ky_k=0$
Thus,
$a=\frac{x_1y_1+...+x_ny_n}{x_1^2+...+x_n^2}$.
If you want to make it look more elegant you can divide the numerator and denominator by $n$ and have averages on top and bottom,
$a=\frac{\overline{xy}}{\overline{x^2}}$
The "bar" on top represents the average of the quantity under it.

Funny, I just realized that I could have developed this formula without any Calculus! See if you can figure it out (Possible Problem of the Week).

Example 27: The problem in the diagram shown below. The line of best fit (through the origin) has slope,
$\frac{1(1.5)+2(1.75)+3(3)+4(5)+5(4)}{1^2+2^2+3^2+4^2+5^2}=\frac{54}{55}\approx 0.982$. Thus, it is almost the line I drew but not exactly.
Note, it is still not the line of best fit because we purposely neglected the constant term in the line equation.
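Whatever formula one uses, the best slope through the origin can be cross-checked numerically by scanning candidate slopes and keeping the one with the smallest squared-error sum (a plain-Python sketch; the grid resolution is an arbitrary choice of mine):

```python
# Minimize f(a) = sum (a*x_k - y_k)^2 over a grid of candidate slopes a.
points = [(1, 1.5), (2, 1.75), (3, 3), (4, 5), (5, 4)]

def squared_error(a):
    return sum((a*x - y)**2 for x, y in points)

best = min((k/1000.0 for k in range(500, 1500)), key=squared_error)
print(best)  # about 0.982, the least-squares slope through the origin
```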

It is my hope that I will show you how to approximate a curve with another curve. Here we had a finite set of points; when you approximate a curve with another curve you have an infinite set of points, and a different technique is used.
~~~
Exercises

1)Find the distance between the parabola $y=x^2$ and the point $(3,3)$.

*2) Develop the full method of least squares for a general line $y=ax+b$. (Hint: the exact same procedure; you only need to take the partial derivatives with respect to $a$ and $b$ this time and set them to zero.)

3)In the picture below use #2 and find the line of best fit.

4) A length of straight wire is $L$ units long. You take a pair of rusty scissors and cut the wire somewhere; you turn the first piece into a square and the second piece into a circle. Where should you cut (assuming you want to) to maximize and minimize the total area of the circle and square together?

5) Draw point A on paper. Draw point B directly below it. Draw point C directly to the right of point B. Draw two horizontal lines, through A and through B. The distance between the two horizontal lines is 10 miles, and the distance between B and C is 20 miles. You are a crazy ATV driver and want to get to point C as quickly as possible while driving blindfolded and upside down. You start at point A. The region between the horizontal lines is a desert, where your top speed is 20 miles/hour. The line between B and C is the highway, where you can travel 40 miles/hour. What path (in straight lines) should you take to minimize your driving time?

12. Calculus itself is divided into three parts in universities. The first two parts are: differential calculus and integral calculus. Differential calculus is more commonly called Calculus I, and integral calculus is more commonly called Calculus II. As you expect, Calculus I concentrates on the derivative, which is what we have been doing thus far in the lectures. Since I think you have sufficient understanding of what a derivative is and how it is used in math and applied math, we can talk about the other important part of Calculus, the integral. The integral is related to the "anti-derivative", that is, a function whose derivative gives back the original function. Think of it as an inverse operation on a function, like the square root is the opposite of squaring. But there is one problem. Say we want to solve $y'=2x$, meaning we want a function whose derivative is $2x$. If we think about it, we note that $y=x^2$ works, but wait, any added constant disappears under differentiation (similar to our discussion of free-falling bodies). This is contained in the following definition and theorem.

Definition: The anti-derivative of a function $y=f(x)$ is expressed as $y=\int f(x)\, dx$. It represents all the functions whose derivative is $f(x)$. Thus, the integral is a set of functions with the property that if $g(x)\in \int f(x)\, dx$ (an element of this set) then $g'(x)=f(x)$, and if $g(x)$ is a function such that $g'(x)=f(x)$ then $g(x)\in \int f(x)\, dx$. This symbol (a stretched S) is called the "indefinite integral".

I want to make a comment about the $dx$ appearing at the end. I myself am not in favor of writing it, but since it is the standard notation I do. It is unnecessary except in cases like multiple integrals (which we will not discuss), where these "differentials" show in which order to take the integrals. In the single-variable case I really do not see a purpose (except possibly one).

Theorem: Given a function $y=f(x)$, if $g(x)$ is a function such that $g'(x)=f(x)$, then
$\int f(x)\, dx = g(x)+C$, where $C$ is an arbitrary constant, on the open interval where we are working.

Basically, this says that if we can find one function that is an anti-derivative, then all anti-derivatives (the indefinite integral) are just that function plus some constant. It should seem plausible, but we will not completely prove it.

Proof: Let $g'(x)=f(x)$ and let $h(x)$ be another anti-derivative, $h'(x)=f(x)$. Then $g'(x)-h'(x)=0$, and a property of the derivative says $(g(x)-h(x))'=0$. Now, the only function whose derivative is always zero is a constant function; think of it this way: the tangent line is always horizontal, so the graph is a horizontal line, hence constant. Thus $g(x)-h(x)=C$, so $g(x)=h(x)+C$, meaning any anti-derivative is a constant added to another. (The rigorous proof of that middle step is too advanced for us and relies on the most important theorem in Calculus, the Mean Value Theorem.)

Linearity of Integral: Since $(cy)'=cy'$ and $(y_1+y_2)'=y_1'+y_2'$ (remember I said pay attention to these properties they appear many times in math). The integral also has these properties. I leave that to you to prove.
$\int [f(x) +g(x)] dx=\int f(x) dx+\int g(x) dx$
$\int k f(x) dx= k\int f(x) dx$.

Example 28: Let $f(x)=2x$. Then $\int 2x dx$ is found by finding an anti-derivative, for example $x^2$; then $\int 2x dx=x^2+C$. Note, we could have chosen $x^2+1$ as our anti-derivative. That would mean $\int 2x dx=x^2+1+C$, but in reality the two answers are the same: choosing the constant one less in the second case gives exactly the first. Thus these sets of functions are equal.

The following theorem should seem simple.

Power Rule: If $f(x)=x^n$ and $n\not = -1$ then $\int x^n dx=\frac{x^{n+1}}{n+1}+C$.

Proof: Nothing to it, if $n\not = -1$ then we can define a function $y=\frac{x^{n+1}}{n+1}$.
Thus, $y'=x^n$ by the power rule for derivatives. By our theorem since we found an anti-derivative all anti-derivatives differ by a constant. Thus, $\int x^n dx=\frac{x^{n+1}}{n+1}+C$.
The case $n=-1$ will be covered later.
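The power rule is easy to spot-check with a computer algebra system. Here is a small sketch using sympy (assuming sympy is available; it drops the $+C$ from its answers):

```python
import sympy as sp

x = sp.symbols('x')

# integrate x^n for a few sample exponents and confirm that
# differentiating the result recovers x^n (sympy omits the +C)
for n in [0, 1, 2, 5]:
    antideriv = sp.integrate(x**n, x)      # gives x^(n+1)/(n+1)
    assert sp.simplify(sp.diff(antideriv, x) - x**n) == 0
```

Differentiating the answer to check it is the same sanity test we will use by hand throughout this post.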

I asked around in college what people found more difficult, Calculus II or Calculus III; more said Calculus II, because that is where the rules for integration are developed. Unlike derivatives, where any of the known functions can be differentiated, the integral is much more complicated. Sometimes you cannot even find it! And there are many, many rules for how to deal with each case. Also, unlike basic algebra, where once you understand how it works you do not need to memorize anything, here some memorization is required, because some of these derivations are clever and probably will not be found by just looking at them. I am not going to go through many different types of integrals. Just three very useful rules.

There are two functions that are fundamental in Calculus/Analysis, the exponential and logarithmic functions. I am going to show you an ugly way of deriving the main results of these functions, this is not a standard approach but I think it is important to at least have some idea where they come from.

Definition: Define the number $e=\lim_{n\to \infty} (1+1/n)^n=2.718...$.

Of course, the main problem is that we need to show that this sequence converges to some number, which we then define to be $e$. One way of doing this is by using a famous theorem in Analysis, the monotone convergence theorem (often taught alongside the Bolzano-Weierstrass theorem), which seems obvious: if a strictly increasing sequence is bounded (always below some number) then it converges. Again, this is an existence theorem that assures us the sequence converges but does not tell us to what. Thus, we need to show that $a_{n+1}>a_n$ where $a_n=(1+1/n)^n$ and also that $a_n<3$; then by the theorem such a number exists. But I am not going to do that derivation.

Definition: We can define an exponential function $y=e^x$ for the entire number line because $e>0$. All it is, is an exponential function like $y=2^x$, only with a different base.

If we graph this function, it lies entirely above the x-axis. Hence the range of $e^x$ is all positive numbers.

Definition: An inverse function (if it exists) is a function that undoes the original function. Meaning, if $f(x)$ is a given invertible function, the inverse is denoted by $f^{-1}(x)$ (and it does not mean $1/f(x)$), such that $f(f^{-1}(x))=x$ and $f^{-1}(f(x))=x$.

Example 29: The function $y=x^2$ does not have an inverse, however if we restrict the domain to $x\geq 0$ then the half-parabola does have an inverse, namely the square root function $y=\sqrt{x}$.

A way to determine if the inverse exists is to pass a horizontal line and see whether it intersects the function at most once. This fails for the parabola in the example above, because some horizontal lines pass through it twice. But by restricting the domain to the non-negatives, the half-parabola satisfies the condition. One way to show an inverse exists on an interval is to show the function is continuous and strictly increasing or decreasing (derivative always one sign); that assures a horizontal line passes at most once. The graph of the exponential function $y=e^x$ is increasing, hence any horizontal line drawn intersects it (if at all) exactly once. The inverse function is called the natural logarithmic function $y=\ln x$.

Definition: The natural logarithm function $y=\ln x$ is defined for all positive values and is the inverse of the natural exponential. Its value answers: what does $e$ have to be raised to, to result in $x$? Thus, $\ln e =1$ because $e^1=e$.

If the domain of an invertible function $f$ is $D$ and the range is $R$, then the domain of $f^{-1}$ is $R$ and its range is $D$. Thus the natural logarithm is defined on the range of $e^x$, which is the positives, and its range is the domain of $e^x$, which is all values.

Both the exponential and logarithmic functions have important properties.

Properties:
$e^{x+y}=e^xe^y$
$(e^x)^y=e^{xy}$
$e^{-x}=\frac{1}{e^x}$
$e^0=1$
For $x,y>0$
$\ln (xy)=\ln x+\ln y$
$\ln (x/y)=\ln x-\ln y$
$\ln x^n = n\ln x$
$\ln e=1$
$\ln 1 =0$
$e^{\ln x}=x$
$\ln e^x=x$

Now we get to the derivative of the exponential function and the logarithm.

Theorem: The derivative of $y=e^x$ is $y'=e^x$.

Proof: This is not really a proof but it should give some intuition. We know that $(1+1/n)^n \to e$ as $n\to \infty$, and $1/n\to 0$. We can therefore write $e=\lim_{\Delta x\to 0}(1+\Delta x)^{1/\Delta x}$. Thus, for very small $\Delta x$ we have $e\approx (1+\Delta x)^{1/\Delta x}$, so $e^{\Delta x}\approx 1+\Delta x$, and $e^{\Delta x}-1\approx \Delta x$. Thus,
$\frac{e^{\Delta x}-1}{\Delta x}\approx 1$.
Thus, the smaller the number the closer the value,
$\lim_{\Delta x\to 0} \frac{e^{\Delta x}-1}{\Delta x}=1$.
Now we use the limit definition for derivative on $y=e^x$,
$\lim_{\Delta x\to 0} \frac{e^{x+\Delta x}-e^x}{\Delta x}$
$\lim_{\Delta x\to 0} e^x \cdot \frac{e^{\Delta x}-1}{\Delta x}$
Using the above statement,
$e^x(1)=e^x$.
(The preceding derivation is not mine, I stole it from my Calculus book).
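The key limit $\lim_{\Delta x\to 0} \frac{e^{\Delta x}-1}{\Delta x}=1$ is easy to check numerically; a small sketch (illustrative, not a proof):

```python
import math

# watch (e^h - 1)/h approach 1 as h shrinks
for h in [0.1, 0.01, 0.001]:
    ratio = (math.exp(h) - 1) / h
    print(h, ratio)

# at h = 1e-6 the ratio should be within 1e-5 of 1,
# since (e^h - 1)/h is roughly 1 + h/2 for small h
final = (math.exp(1e-6) - 1) / 1e-6
```

The printed ratios shrink toward 1, which is exactly the statement used in the proof above.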

Corollary: The integral $\int e^x dx=e^x+C$.

Theorem: The derivative of $y=\ln x, x>0$ is $y'=\frac{1}{x}$.

Proof: We can write $e^{\ln x}=x$ and take the derivative of both sides. The right hand side is trivial: the derivative is $1$. For the left hand side we use the chain rule, $y=e^u$ where $u=\ln x$.
Thus,
$\frac{dy}{du}=e^u$
$\frac{du}{dx}=u'$
Thus,
$\frac{dy}{dx}=u'e^u=u'e^{\ln x}=u'x$.
Thus, left hand equal to right hand,
$u'x=1$
$(\ln x)'=u'=\frac{1}{x}$.
Note, the flaw in the proof is that I never showed that $\ln x$ has a derivative; if it does not, the proof fails. But there is a useful theorem that assures us the inverse function has a derivative if the original function does.

Sometimes it is useful to consider the following derivative.

Theorem: The derivative for $y=\ln |x|, x\not = 0$ is $y'=1/x$.

Proof: The difference between this and the derivative just stated above is the domain of the function. In the first case the graph involved only the right branch of the hyperbola $1/x$ (that is, $x>0$), while this derivative covers both branches. This is because the absolute value clears signs, hence we get both parts.

Corollary: The integral $\int \frac{1}{x} dx = \ln |x|+C$

Note the fundamental property of the natural exponential: its derivative is itself (of course the zero function also works, but that is uninteresting). Thus the exponential satisfies the differential equation $y'=y$. The interesting property of the logarithmic function is that it is a "transcendental function", meaning it cannot be expressed with $+,-,\cdot ,/ , \sqrt[n]{\,\,\,}$ (and this has in fact been proven). Algebraic functions, like polynomials and rationals, can be expressed with those operations. And here the derivative of a transcendental function is an algebraic function!

We can extend the power rule for integrals.

Extended Power Rule: The integral $\int x^n dx = \left\{ \begin{array}{c} \frac{x^{n+1}}{n+1}, n\not = -1 \\ \ln |x|, n = -1 \end{array} \right\}+C$

Up to this point I have explained two very important functions. Many anti-derivatives involve them. Right now I will concentrate on three powerful techniques of finding anti-derivatives. Throughout, I will be using my own style of the substitution rule, one I have never seen anybody else use. Mine is formal (mathematical) while the standard technique is not, and hence the standard one makes me want to vomit. And mine is better because I use it.

Example 30: Assume we need to find $\int e^x +x + 1/x \, dx$. Integration is a linear operator (meaning we can do each term separately). Thus by the extended power rule and the exponential function we have,
$e^x+\frac{1}{2}x^2+\ln |x|+C$.
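A quick sympy check of this term-by-term integration (assuming sympy is available; it writes $\log$ for $\ln$ and does not print absolute values or the $+C$):

```python
import sympy as sp

x = sp.symbols('x')

# integrate term by term using linearity
result = sp.integrate(sp.exp(x) + x + 1/x, x)

# expected: e^x + x^2/2 + ln(x)
expected = sp.exp(x) + x**2/2 + sp.log(x)
```

The two expressions agree, confirming the answer above.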

Substitution Rule
In Calculus it is standard to represent the anti-derivative of a function $f(x)$ in capitals $F(x)$. Let us assume we are given a standard function $f(x)$ that has an anti-derivative $F(x)$, that means, $F'(x)=f(x)$. Let $g(x)$ be some other function which we can take a derivative of. Then by the chain rule,
$[F(g(x))]'=g'(x)F'(g(x))=g'(x)f(g(x))$.
That means that $F(g(x))$ is an anti-derivative of $g'(x)f(g(x))$.
Thus, by the results we developed above,
$F(g(x))+C=\int f(g(x))g'(x) dx$.
Basically, this is what the theorem says.
1)We want to find $\int h(x) dx$ for some function $h(x)$.
2)If we can express $h(x)=f(g(x))g'(x)$ for some other functions $f(x),g(x)$.
3)Then we need to find $F(x)$ an anti-derivative of $f(x)$.
4)Then the integral of $h(x)$ is the composition $F(g(x))+C$.
This theorem is the reverse of the chain rule.

I will do an example using the official way above and then do it using my way, because it will be easier for you to follow.

Example 31: We need to find $\int (1+x)^5 dx$. If instead we had $\int x^5 dx$ then everything would be easy. Thus, the inner function is $g(x)=x+1$ and the outer function is $f(x)=x^5$; we also have $g'(x)=1$, which surely appears in the integral because,
$\int (1+x)^5 dx=\int (1+x)^5 (1)dx$. Hence it has the form mentioned above: $f(g(x))g'(x)=(x+1)^5$. The next step is to find an anti-derivative of the outer function $f(x)=x^5$, which is $F(x)=\frac{1}{6}x^6$. Thus the answer is,
$F(g(x))+C=\frac{1}{6}(x+1)^6+C$.
Because it is an anti-derivative. If you take the derivative you will get back the original function. A useful way to check.
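That derivative check is easy to automate; a small sympy sketch (assuming sympy is available):

```python
import sympy as sp

x = sp.symbols('x')

# the substitution-rule answer from Example 31
answer = sp.Rational(1, 6) * (x + 1)**6

# differentiating must give back the integrand (1+x)^5
check = sp.expand(sp.diff(answer, x) - (1 + x)**5)
```

`check` comes out to zero, so the anti-derivative is correct.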

Hacker's Substitution Rule
We know that,
$\int f(g(x))g'(x)dx$.
For simplicity call $u=g(x)$ then we have,
$\int f(u) u' dx$.
Where $f(u)$ is some expression of $u$.
Then we need to find the anti-derivative of the outer function.
That means,
$\int f(u) du$.
Thus, stated another way,
$\int f(u) \frac{du}{dx} dx=\int f(u) du=F(u)+C=F(g(x))+C$
(The mnemonic is that it is as if we can cancel the $dx$'s. But we cannot, that is not a fraction.)

Example 32: Now we do it my simplified way. The idea is as follows: we call the inside function $u$, then we immediately find its derivative $u'$ and make it appear in the product. Thus we have $\int (1+x)^5 dx$. We see that if we call $u=x+1$ we reduce the problem to a basic exponent, which is what we want. But as I said, we immediately find the derivative $u'=1$. Thus,
$\int u^5 (1) dx = \int u^5 u' dx =\int u^5 du = \frac{1}{6}u^6 +C=\frac{1}{6}(1+x)^6 +C$.

The substitution rule is most important. You need to get good at it. We need many, many more examples.

Example 33: We will find $\int (2x+1)^3 dx$. We see it is reasonable to define the inner function as $u=2x+1$. Now we immediately find its derivative, $u'=2$. But there is no factor of 2 in the integrand! Does it mean the rule fails? No. Remember, we can factor a constant in and out of the integral. Thus we introduce the factor of 2.
Thus,
$\int (2x+1)^3 dx= \frac{1}{2} \int (2x+1)^3 (2) dx= \frac{1}{2} \int u^3 u' dx=\frac{1}{2} \int u^3 du$.
The integral is simple to find through power rule,
$\frac{1}{2}\cdot \frac{1}{4} u^4+C=\frac{1}{8} (2x+1)^4+C$.
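Differentiating verifies the answer $\frac{1}{8}(2x+1)^4+C$; a quick sympy sketch (assuming sympy is available):

```python
import sympy as sp

x = sp.symbols('x')

# answer obtained by the substitution u = 2x + 1
answer = sp.Rational(1, 8) * (2*x + 1)**4

# its derivative must reproduce the integrand (2x+1)^3:
# d/dx (1/8)(2x+1)^4 = (1/8) * 4 * (2x+1)^3 * 2 = (2x+1)^3
diff_check = sp.expand(sp.diff(answer, x) - (2*x + 1)**3)
```

`diff_check` is zero, so the chain-rule factor of 2 was handled correctly.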

Example 34: We will find $\int xe^{x^2} dx$. This is trickier, but let us call $u=x^2$. Now we immediately find its derivative, $u'=2x$. Look! It is almost in the special form except for the factor of 2. Thus,
$\int xe^{x^2}dx=\frac{1}{2}\int e^{x^2}(2x)dx=\frac{1}{2} \int e^u u' dx=\frac{1}{2} \int e^u du$.
The integral is trivial.
$\frac{1}{2}e^u+C=\frac{1}{2}e^{x^2}+C$.
I would like to mention, if you were paying attention you would have said,
$\frac{1}{2} \left( e^u+C \right) =\frac{1}{2}e^u+\frac{1}{2}C$.
But that does not matter because $C$ is a constant thus $1/2C$ is also, thus we renamed the constant.
Skipping this step is typical; get used to it.

Example 35: We will find $\int \frac{2x}{1+x^2} dx$. Knowing what function to call the inside function takes a trained eye. But we can do, $u=1+x^2$. Now we immediately find the derivative $u'=2x$. Thus,
$\int \frac{2x}{1+x^2}dx=\int \frac{1}{1+x^2}(2x)dx=\int \frac{1}{u} u' dx=\int \frac{1}{u} du$
The integral is simple as standard one,
$\ln |u|+C=\ln |1+x^2|+C=\ln (1+x^2)+C$.
(Because $1+x^2 > 0$.)

Example 36: Here is a much tougher integral $\int \frac{1}{1+e^x} dx$. Let us call $u=1+e^x$. Then we immediately find the derivative $u'=e^x$. Thus we need to make this derivative appear as a multiplier in the integrand. Thus we will multiply the numerator and denominator by the exponential,
$\int \frac{1}{1+e^x} dx=\int \frac{e^x}{e^x(1+e^x)} dx=\int \frac{1}{e^x(1+e^x)}\cdot e^x dx$.
Since $u=1+e^x$ thus $u-1=e^x$.
Thus,
$\int \frac{1}{(u-1)u} u' dx=\int \frac{1}{u(u-1)} du$
We have not talked about this yet, but the technique is to express the fraction as a sum,
$\int \frac{1}{u-1}-\frac{1}{u}\, du=\ln |u-1|-\ln|u| +C$ (Exercise: check that $\frac{1}{u(u-1)}=\frac{1}{u-1}-\frac{1}{u}$).
$\ln |e^x|-\ln |1+e^x|+C = \ln e^x-\ln (1+e^x)+C=x-\ln(1+e^x)+C$
Thus,
$x-\ln (1+e^{x})+C$, equivalently $-\ln (1+e^{-x})+C$,
Is the integral. You can confirm by taking the derivative and getting back the original function.
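Here is that confirmation carried out in sympy (assuming sympy is available); the answer works out to $x-\ln(1+e^x)+C$, and antiderivatives may differ by a constant, so we compare derivatives:

```python
import sympy as sp

x = sp.symbols('x')

# the hand-derived antiderivative of 1/(1+e^x)
hand = x - sp.log(1 + sp.exp(x))

# its derivative is 1 - e^x/(1+e^x) = 1/(1+e^x),
# which must match the original integrand
diff_check = sp.simplify(sp.diff(hand, x) - 1 / (1 + sp.exp(x)))
```

`diff_check` simplifies to zero, confirming the partial-fractions work above.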

I hope you got some idea of integration by substitution; it is the most powerful technique.

Integration by Parts
When we have a product of two functions, the derivative is easy to find; the integral, not so. The technique is to turn a product into a simpler product whenever possible.

Theorem: Let $u,v$ be functions then, $\int u'v dx+\int uv' dx= uv+C$.

Proof: Just as the substitution rule is the inverse of chain rule, this is the inverse of the product rule. We know that $(uv)'=u'v+uv'$. Integrate both sides and note that $\int (uv)'dx =uv+C$ because the integral is the opposite operation of derivative. Thus, $\int u'v dx+\int uv' dx=uv+C$.

Corollary: If we move the integral to the other side we have,
$\int uv' dx = uv - \int u'v dx$.
Note the constant does not matter; it will come from the remaining integral, thus nothing is lost.

Example 37: Assume we need to integrate $\int xe^x dx$. We will use the formula directly above. Let $u=x$ and $v'=e^x$. Thus, $u'=1$ and $v=e^x$ (constant does not matter it will still appear after the integral). Thus, by parts, we have,
$uv-\int u' vdx = xe^x-\int e^x dx=xe^x-e^x+C$.

In general when you have $xh(x)$ you may want to call $u=x$ and $v'=h(x)$. Because when you take the derivative the function $u'=1$. Simply a constant. In fact if you have $P(x)h(x)$ where $P(x)$ is a polynomial by calling this function $u=P(x)$ you will reduce the degree by 1 and apply the theorem again and again.

Example 38: Let us have $\int x^2e^x dx$. Call $u=x^2$ and $v'=e^x$ thus $u'=2x$ and $v=e^x$. Thus,
$x^2e^x-\int 2x e^x dx=x^2e^x-2\int xe^x dx$. Now you can do integration by parts again. In the previous example we did exactly that, giving $x^2e^x -2 (xe^x - e^x)+C=x^2e^x-2x e^x+2e^x+C$. Again, you can postpone the constant until the end.
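The repeated-parts answer $x^2e^x-2xe^x+2e^x+C$ can be verified by differentiation; a sympy sketch (assuming sympy is available):

```python
import sympy as sp

x = sp.symbols('x')

# answer from applying integration by parts twice
answer = x**2 * sp.exp(x) - 2*x*sp.exp(x) + 2*sp.exp(x)

# the product-rule terms cancel in pairs, leaving x^2 e^x
diff_check = sp.simplify(sp.diff(answer, x) - x**2 * sp.exp(x))
```

`diff_check` is zero, so the derivative is the original integrand $x^2e^x$.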

Example 39: Let us find $\int \ln x dx$. We do the following, we can write,
$\int \ln x dx= \int (1) \ln x dx$
And we call,
$u=\ln x$ and $v'=1$.
Thus,
$u'=1/x$ and $v=x$
Thus, integration by parts,
$x\ln x - \int (1/x)x dx=x\ln x-\int 1 dx=x\ln x-x+C$.

Partial Fractions Decomposition
The idea, I believe, first appeared with Bernoulli. It is a technique for expressing a rational function as a simpler sum. This is useful not only in integration but in other parts of math as well. For example, a very useful way to solve certain types of differential equations is with the Laplace Transform, but to use it you need to know partial fractions.

General Theory: There are several cases, they should seem clear.
1)A rational function is a quotient of two polynomial functions. If the numerator's degree is at least the denominator's, then you first need to use long division to obtain a fraction whose numerator has smaller degree than the denominator.
2)A rational function which satisfies #1 (smaller degree on top), with distinct linear factors, of the form:
$\frac{P(x)}{(x-x_1)(x-x_2)...(x-x_n)}$
Can be expressed as,
$\frac{A_1}{x-x_1}+\frac{A_2}{x-x_2}+...+\frac{A_n}{x-x_n}$
Where $A_1,...,A_n$ are constants that need to be determined.
3)A rational function which satisfies #1 of the form:
$\frac{P(x)}{(x-x_0)^n}$
Can be expressed as,
$\frac{A_1}{x-x_0}+\frac{A_2}{(x-x_0)^2}+...+\frac{A_n}{(x-x_0)^n}$
Where $A_1,...,A_n$ are constants to be determined.
4)A rational function which satisfies #1, with distinct irreducible quadratic factors, of the form:
$\frac{P(x)}{(x^2+A_1x+B_1)...(x^2+A_nx+B_n)}$
Can be expressed as,
$\frac{C_1x+D_1}{x^2+A_1x+B_1}+...+\frac{C_nx+D_n}{ x^2+A_nx+B_n}$
Where $C_1,...,C_n,D_1,...,D_n$ are constants to be determined.
5)Any function satisfying #1 is a sum of all of those.
(Note the denominator never needs, say, an irreducible cubic factor, because by the Fundamental Theorem of Algebra any non-constant real polynomial can be factored, uniquely up to a multiplicative constant, into linear and quadratic factors. Thus any denominator can be expressed in terms of quadratics and linear polynomials.)

The only difficulty now is to find the constants.

Example 40: Consider $\frac{2x^2+1}{(x^2+1)(x-1)^2(x-2)}$. We will only set up the form. The theory assures us that we can find constants such that (note the smaller degree on top),
$\frac{2x^2+1}{(x^2+1)(x-1)^2(x-2)}=\frac{Ax+B}{x^2+1}+\frac{C}{x-1}+\frac{D}{(x-1)^2}+\frac{E}{x-2}$
Note, by #5 we have 3 distinct cases: $(x^2+1)$ case # 4, $(x-1)^2$ case # 3, $(x-2)$ case #2.
Thus, we have the sum of those.

Example 41: We will decompose $\frac{x+1}{(x-2)^2}$. It fits #1 thus the theory says:
$\frac{x+1}{(x-2)^2}=\frac{A}{x-2}+\frac{B}{(x-2)^2}$
Now you multiply by the denominator of the original function,
$x+1=A(x-2)+B$
$x+1=Ax-2A+B$
$x+1=Ax+(-2A+B)$
Thus, matching the $x$ coefficients, $A=1$.
And matching the constants, $-2A+B=1$ means $-2(1)+B=1$ thus $B=3$.
The decomposition is,
$\frac{x+1}{(x-2)^2}=\frac{1}{x-2}+\frac{3}{(x-2)^2}$
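Sympy can produce this decomposition directly with `apart` (assuming sympy is available):

```python
import sympy as sp

x = sp.symbols('x')

# partial fraction decomposition of (x+1)/(x-2)^2,
# expected: 1/(x-2) + 3/(x-2)^2
decomp = sp.apart((x + 1) / (x - 2)**2, x)

# recombining the pieces must give back the original fraction
check = sp.simplify(decomp - (1/(x - 2) + 3/(x - 2)**2))
```

`check` is zero, agreeing with the hand computation $A=1$, $B=3$.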

Example 42: We will decompose $\frac{1}{x^2-1}$. It fits #1 thus the theory says:
$\frac{1}{(x-1)(x+1)}=\frac{A}{x-1}+\frac{B}{x+1}$.
(Note we could treat $x^2-1$ with form #4, but why, when it factors?)
Multiply by original denominator to clear fractions,
$1=A(x+1)+B(x-1)$
$1=Ax+A+Bx-B$
$1=x(A+B)+(A-B)$
We have a linear system,
$A+B=0$
$A-B=1$
Thus,
$A=1/2, B=-1/2$
The decomposition is thus,
$\frac{1}{x^2-1}=\frac{1/2}{x-1}+\frac{-1/2}{x+1}$
Check and see.

Example 43: Thus, the integral $\int \frac{1}{x^2-1} dx$ can be simplified,
$\int \frac{1/2}{x-1} dx+\int \frac{-1/2}{x+1} dx$
I leave it to you to verify, using the substitution rule, that this gives
$\frac{1}{2}\ln |x-1|-\frac{1}{2}\ln |x+1|+C=\frac{1}{2}\ln \left| \frac{x-1}{x+1} \right|+C$
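A differentiation check of this answer in sympy (assuming sympy is available; sympy writes `log` without absolute values):

```python
import sympy as sp

x = sp.symbols('x')

# the hand-computed antiderivative (1/2)(ln|x-1| - ln|x+1|)
hand = sp.Rational(1, 2) * (sp.log(x - 1) - sp.log(x + 1))

# its derivative is (1/2)(1/(x-1) - 1/(x+1)) = 1/(x^2 - 1),
# which must match the original integrand
diff_check = sp.simplify(sp.diff(hand, x) - 1 / (x**2 - 1))
```

`diff_check` is zero, confirming the minus sign in the decomposition matters.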

I would like to mention that some functions have no anti-derivative among the standard (elementary) functions. The branch of math that works on these problems is called differential algebra; it is reminiscent of the fact that no root formula exists for quintics. It can in fact be shown that no such elementary anti-derivative exists. The most classic example is the function so useful in probability theory,
$\int e^{-x^2}dx$.
It cannot be done.

This completes my post on methods of integration. There are many more, but these are the most important techniques. It is extremely important that you try to do the exercises and get good at integration.
~~~
Exercises.

Find the integrals.

1) $\int \sqrt{x+1}dx$ Easy

2) $\int x \ln x dx$ Medium

3) $\int e^{2x} dx$ Easy

4*) $\int e^{\sqrt{x}} dx$ It has a star, what do you think!?

5) $\int \ln (2x) dx$ Easy

6) $\int \ln e^x dx$ Easy

7) $\int e dx$ Easy

8) $\int 0 dx$ Easy

9) $\int \frac{1}{x^2-16} dx$ Medium

10) $\int \frac{1}{1-e^{2x}}dx$ Hard

11) $\int \frac{1}{(x-2)^2(x+1)(x+3)} dx$ Hard

12) $\int \frac{x+\sqrt{x}}{x^2} dx$ Easy

13) $\int \frac{1}{(x-1)^2} dx$ Easy

13. The last lecture discussed how to find anti-derivatives and integrals for functions. Of course, the question is what does that accomplish? We know that derivatives are extremely useful in several problems, but where are integrals used? We will concentrate on several applications of the anti-derivative.

First we need to introduce a definition. The integral $\int f(x) dx$ is called indefinite and it represents all the functions that are anti-derivatives. The integral $\int_a^b f(x) dx$ is not a function, it is a real number. It represents something else which we are about to discuss. Since the indefinite and definite integral are very closely related we use almost the same notation.

Definition: The area below (or above) a curve is bounded by the x-axis, by the curve, and by the vertical lines from the endpoints.

Definition: When the curve is above the x-axis it has positive area; when it is below, it has negative area.

Example 44: The curve $y=\sqrt{1-x^2}$ on $[-1,1]$ is a semi-circle, the area is positive for it is above and its value is $(1/2)\pi (1)^2=.5\pi$.

Example 45: If we consider the lower semi-circle $y=-\sqrt{1-x^2}$ on $[-1,1]$ it is the same value only negative.

When calculus was first being developed it was not yet on a solid foundation (mathematical abstraction was almost non-existent at the time). The following definition is closer to the original Calculus.

Definition: The definite integral $\int_a^b f(x) dx$ for a function defined on $[a,b]$ is defined as the area below (or above) the curve (note positive or negative as stated in the definitions above).

Please forgive me: I am going to use the term "integral" mostly for the definite integral, and maybe sometimes for the indefinite integral, since it is a waste of keystrokes to type the full name; you should be able to deduce from context which is meant.

The problem with this definition is that mathematically we never defined what area means. Yes, it is true we understand what we are trying to say, but mathematicians cannot accept that. Thus, in the 19th Century, one of my favorites, Bernhard Riemann, defined the integral in acceptable terms. This is what Riemann did:
1)Took some function $f(x)$ defined on $[a,b]$.
2)And broke the region into $n$ rectangles.
3)Then he summed them up in area.
4)Took limit as $n\to \infty$.
5)The more and more rectangles you use, the closer it approaches the area of the curve.
6)This is well-defined, meaning does not matter how you break the region into rectangles the limit is the same.
Thus, Riemann defined area below a curve to be the limiting value of the rectangles.
In fact, the greatest, Archimedes, was able to obtain formulas for extremely complicated shapes by using this technique; it is an early form of integral Calculus.
Thus, you can think of the definite integral as the area below a curve or the limit of a sum (limit of the summing rectangles).

Example 46: The integral $\int_{-1}^1 \sqrt{1-x^2} dx=.5\pi$ from the above discussions. Because that is the area of the semi-circle.

The following should seem simple if you think of integration as area.
Scalar-Multiple: A constant can be factored out, $\int_a^b kf(x)dx=k\int_a^bf(x)dx$.
Sum: The definite integral of sum is sum of integrals, $\int_a^b f(x) \pm g(x) dx=\int_a^b f(x) dx \pm \int_a^b g(x) dx$.
Subdivision: For a point $c$ inside the interval, $\int_a^b f(x) dx=\int_a^c f(x) dx+\int_c^b f(x) dx$

Again, look at the sum rule. The integral of a sum is the sum of the integrals, again the linearity property (or homomorphism as I called it, see it appears again and again).

I would like to mention that thus far integration (definite) was only defined on an interval $[a,b]$, meaning $a<b$. Thus, we extend the definition as follows. If $a=b$ then there is no interval and hence the integral (the area of a line) is zero, meaning,
$\int_a^a f(x)dx=0$
When $a>b$ the interval is not defined (empty) but we can still define,
$\int_a^b f(x) dx= -\int_b^a f(x)dx$.
These definitions are sometimes useful.

One more note. Mathematicians use the term integrable. A function $f(x)$ is called integrable on a closed interval $[a,b]$ when the Riemann sum (the limit of the rectangles, i.e. the area below the curve) exists. Here is an example: consider $y=x^{-1}$ on $[-1,1]$ and define $f(0)=0$. Thus, we have a function defined to be the reciprocal at all non-zero points and 0 at the zero point. This function is certainly defined on the interval but it is not integrable, because as we approach the zero point the area grows without bound. A useful theoretical theorem says that any continuous function is always integrable; this should seem "obvious" again if you think in terms of area.

The Area Problem
The following approach to this problem and other related applied problems is my own; I think it is the smoothest there is. Of course, the problem is: given a curve on some closed interval, how can we find the area? Say we are given a continuous function on the interval $[a,b]$; is there a way to find the area below (or above) it? One way is to approximate the area, but is there an exact way? Yes!
I am not sure who first discovered this, but I believe it was Isaac Barrow, Newton's teacher.
Let $f(x)$ be a continous curve on the closed interval $[a,b]$.
Define the following function, called the "Area function".
1) $A(x)$ is defined for $a\leq x\leq b$.
2) $A(x)$ is the area of $f(x)$ on $[a,x]$.
Example, $y=x^2$ on $[0,1]$. Then $A(1/2)$ is the area of $y=x^2$ on $[0,1/2]$.
You can think of it as how much we move away from the starting point as we sum the area.
3) $A(a)=0$ because it is the area of a single line.
4) $A(b)$ is the full area, what we actually seek as the answer. That is, $A(b)=\int_a^b f(x) dx$.
Once we defined the Area function here are the steps.
5)Choose a point $x$ inside the interval.
6)Move a little to the right by a very small amount $\Delta x$.
7)The area of the small region is $A(x+\Delta x)-A(x)$.
8)But it is approximately the area of a rectangle of height $f(x)$ drawn over that small region.
9)The area of rectangle is, $f(x)\Delta x$.
Now we have the following equation,
$A(x+\Delta x)-A(x)\approx f(x)\Delta x$
Thus,
$\frac{A(x+\Delta x)-A(x)}{\Delta x}\approx f(x)$.
The smaller the change the closer the approximation.
Thus,
$\lim_{\Delta x \to 0}\frac{A(x+\Delta x)-A(x)}{\Delta x}=f(x)$
But that is the derivative!
$A'(x)=f(x)$.
Thus the derivative of the area function is the original function!!!
Now we can solve the problem (the differential equation).
Let $F(x)$ be any anti-derivative of $f(x)$. Then,
$A(x)=F(x)+C$, where $C$ is a constant to be determined. Note, I mentioned before that in order to solve a differential equation you need initial conditions. The initial condition here was step #3. Thus,
$A(a)=F(a)+C$
$0=F(a)+C$
$C=-F(a)$.
Thus,
$A(x)=F(x)-F(a)$
$A(b)=\int_a^b f(x) dx = F(b)-F(a)$.
Note any anti-derivative works, because the constants end up cancelling each other.
That is the solution to our problem. It is remarkable how integration (definite) and differentiation are so closely related to each other.
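The whole derivation can be illustrated numerically: Riemann sums for $f(x)=x^2$ on $[0,1]$ approach $F(1)-F(0)=1/3$ with $F(x)=x^3/3$. A small self-contained sketch:

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum with n rectangles of width (b-a)/n."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

f = lambda x: x**2

# with many rectangles the sum should be very close to the
# Fundamental Theorem value F(1) - F(0) = 1/3
approx = riemann_sum(f, 0.0, 1.0, 100000)
exact = 1.0 / 3.0
```

Increasing `n` shrinks the gap between `approx` and `exact`, which is exactly Riemann's limit definition.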

So we actually have two important results from this derivation.

First Fundamental Theorem of Calculus: Let $f(x)$ be continuous on $[a,b]$. If $F(x)$ is any continuous function on $[a,b]$ such that $F'(x)=f(x)$ on $(a,b)$, then,
$\boxed{\int_a^b f(x) dx=F(b)-F(a)}$

What an elegant theorem! Here is a related theorem.

Second Fundamental Theorem of Calculus: If $f(x)$ is continuous on $[a,b]$ and $c\in (a,b)$, define the function $g(x)=\int_c^x f(t)dt$; then $g'(x)=f(x)$ for $a<x<b$.

Proof: We gave a geometric justification for the first theorem; the second theorem is the area function, you see why? Because $\int_c^x f(t)dt$ is the area up to the point $x$, which was our $A(x)$, and we showed that its derivative is the original function. I would just like to mention that in the second theorem we have $f(t)dt$: it is the same function, just expressed in terms of $t$, and the purpose of $dt$ is to say we are integrating along the $t$ variable. We did not need to do that and could have kept everything in $x$; it is just considered messy to use the same variable over and over again.

Now you see why $\int^{\,}$ (indefinite) and $\int_a^b$ (definite) look similar as symbols: because they are in fact closely related. To find the definite integral we find the indefinite one first; that is our fundamental theorem. I would also like to mention that even though these are called the fundamental theorems, I think the extreme value theorem that we had, and the mean value theorem (which we did not cover), are by far the more important theorems. In fact the formal mathematical proof of the first fundamental theorem is based on the mean value theorem. I just find it surprising that these were named the fundamental theorems. As a historical note, I think it is interesting that the first person to use integration was Archimedes, who solved complicated problems by breaking them into infinitely many smaller objects and adding them up (a Riemann sum). Another interesting note: the first person to compute the area under $y=x^n, n\not = -1$ was Fermat; though he was one of the forerunners of Calculus, the concept of integration did not yet exist, and how he did it, I do not know.

Example 47: Find the area under the parabola $y=x^2$ on $[0,1]$. By the Fundamental Theorem of Calculus we simply need to find the anti-derivative (this is where the lecture on methods of anti-differentiation becomes important). In this case $F(x)=\frac{1}{3}x^3$. Thus, the answer is $F(1)-F(0)=1/3$. In terms of the definite integral it looks like this,
$\int_0^1 x^2 dx=1/3$.

Example 48: Find the area bounded by the parabola $y=-x^2$ on $[0,1]$. Note the answer is going to be negative, because the curve lies below the axis and we defined signed area that way. We can use the Fundamental Theorem, or we can use the property that,
$\int_0^1 -x^2 dx= -\int_0^1 x^2 dx=-1/3$. As expected.

Example 49: Here is an example of the Second Fundamental Theorem. Let $g(x)=\int_1^x 2t dt$. If we use the First Fundamental Theorem we find an anti-derivative $F(t)=t^2$ and evaluate at the boundary points: $F(x)-F(1)=x^2-1$. Thus $g(x)=x^2-1$, which means $g'(x)=2x$. Or we could have just employed the Second Fundamental Theorem: the integrand is $2t$, thus $g'(x)=2x$ (only the variable is renamed).

Example 50: Here is another example: $g(x) = \int_0^{x^2} \sqrt{t^3+1}dt$. That radical is one crazy thing to integrate (it leads to Elliptic Integrals), but there is an easier way to find the derivative without actually integrating. Note we cannot directly use the Second Fundamental Theorem because the integral does not have the special form, thus we need to bring it to that form. And we use our favorite rule, the chain rule.
$g(u)=\int_0^u \sqrt{t^3+1}dt$ and $u(x)=x^2$.
Thus,
$\frac{dg}{du}=\sqrt{u^3+1}=\sqrt{(x^2)^3+1}=\sqrt{ x^6+1}$ (this is Second Fundamental Theorem).
$\frac{du}{dx}=2x$.
Thus, chain rule,
$\frac{dg}{dx}=2x\sqrt{x^6+1}$.
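Sympy can differentiate an unevaluated integral with a variable upper limit, applying the Second Fundamental Theorem plus the chain rule for us (assuming sympy is available):

```python
import sympy as sp

x, t = sp.symbols('x t')

# unevaluated integral with variable upper limit x^2
g = sp.Integral(sp.sqrt(t**3 + 1), (t, 0, x**2))

# differentiating applies the Leibniz rule: f(x^2) * d(x^2)/dx
dg = g.diff(x)
expected = 2*x*sp.sqrt(x**6 + 1)
```

The result matches $2x\sqrt{x^6+1}$, with no elliptic integrals needed.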

Just a simple point: in a definite integral the top number is called the upper limit and the bottom number is called the lower limit. When we use the Second Fundamental Theorem of Calculus it does not matter what the lower limit is, because if we actually used the First Fundamental Theorem we would get a constant in the end, which dies when we differentiate.

Example 51: Find $\int_1^2 \ln x dx$. This is a more advanced integration. Thus we first find the anti-derivative,
$\int \ln x dx=\int (1)\ln x dx$.
Let $u'=1$ and $v=\ln x$.
Thus, $u=x$ and $v'=1/x$.
Integration by parts,
$x\ln x-\int x(1/x) dx=x\ln x-\int 1 dx=x\ln x- x+C$.
We can drop the constant because the theorem works for any anti-derivative.
Thus, $F(x)=x\ln x-x$ is an anti-derivative for $f(x)$. (You can check this to assure thyself by differenciating).
Then we evaluate at the lower and upper limits: $F(2)=2\ln 2 - 2$ and $F(1)=1\ln 1-1=-1$.
Thus, $F(2)-F(1)=2\ln 2 - 2 +1=2\ln 2-1$, is the exact value.
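Here is a quick numerical check of Example 51 (the `simpson` helper is a hand-rolled Simpson's rule, not a library routine):

```python
import math

def simpson(f, a, b, n=1000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

numeric = simpson(math.log, 1, 2)
exact = 2 * math.log(2) - 1   # the value we derived by parts
print(abs(numeric - exact) < 1e-9)  # True
```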

Just another simple point: just because the definite integral is positive does not mean the function is positive everywhere. The function can be more positive than negative, so the positive area overtakes the negative area.

Two extremely useful rules to know when you integrate; they save a lot of time, especially when you start doing Fourier Series.

Definition: An even function (on some interval) is one such that $f(x)=f(-x)$ for all points (on that interval).

Geometrically this means a function which is its mirror image in the y-axis.
Thus, this should seem "obvious".

Theorem: If $f(x)$ is an even function then $\int_{-a}^a f(x) dx=2\int_0^a f(x) dx$.

Proof: Because of symmetry we have the same area on both sides. We choose one side and double it.

Definition: An odd function (on some interval) is one such that $f(-x)=-f(x)$ for all points (on that interval).

Geometrically this means a function which is its own point reflection through the origin.
Thus, this should seem "obvious".

Theorem: If $f(x)$ is an odd function then $\int_{-a}^a f(x) dx= 0$.

Proof: Because of symmetry one half is in the positive and the other in the negative, so they cancel each other to zero.

Properties
Let $E(x)$ be even and $O(x)$ be odd.
1) $E_1(x)E_2(x)$ is even.
2) $O_1(x) \, O_2(x)$ is even.
3) $E(x)O(x)$ is odd.
4) $O(x)E(x)$ is odd.
Your exercise is to prove them.

A mnemonic is to associate $+1$ with even and $-1$ with odd. Then their product will reveal the parity of the product function.

Example 52: Though we never discussed the sine function, if we graph it (in radians) on $[-\pi,\pi]$ we see that $y=\sin x$ is odd (point symmetry about the origin). We also know that $y=x^2$ is even on $[-\pi,\pi]$. Thus, $(+1)(-1)=-1$, meaning $x^2\sin x$ is odd. Thus, $\int_{-\pi}^{\pi}x^2\sin x\, dx=0$. Look how easy! If we did it the actual way we would need integration by parts twice (because we reduce $x^2$ by one degree each time we differentiate) and a lot of evaluating.
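You can confirm Example 52 numerically. With a hand-rolled Simpson's rule the nodes are symmetric about zero, so the contributions of an odd integrand cancel almost exactly:

```python
import math

def simpson(f, a, b, n=2000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# x^2 * sin(x) is odd, so its integral over [-pi, pi] should vanish
val = simpson(lambda x: x ** 2 * math.sin(x), -math.pi, math.pi)
print(abs(val) < 1e-9)  # True
```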

The rest of this lecture will be based on two more applications of integration. But before I get there I want to discuss some applications of the theorem we just discovered. When I was younger, the method I used to determine the value of $\pi$ was based on the area below a circle. That is, $y=\sqrt{4-x^2}$ is a semicircle with radius 2, so the area of a quarter circle is $\pi$, that is, $\int_0^2 \sqrt{4-x^2} dx=\pi$. I was all excited that I had discovered an equation that solves for $\pi$, but the funny thing is that if you actually evaluate that integral (by advanced techniques that we did not discuss) you will get $\pi$, thus you get $\pi = \pi$, which gives you nothing. However, you can approximate that integral by approximation methods (which we also did not discuss, but I can show the general idea of how it is done), and hence we get an equation that approximates $\pi$ (though it can be shown the method is not very efficient). Thus, we have shown how integration can be used in the approximation of some irrational numbers, and how mathematicians do it.
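Here is the general idea of such an approximation, as a Python sketch. The `simpson` helper is a hand-rolled Simpson's rule (just one standard choice of approximation method, not necessarily an efficient one for this integrand):

```python
import math

def simpson(f, a, b, n=100000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# the quarter of a circle of radius 2 has area pi
approx_pi = simpson(lambda x: math.sqrt(4 - x * x), 0, 2)
print(round(approx_pi, 6))  # 3.141593
```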

Area of Ellipse
The ellipse is one of the four conic sections: circle, ellipse, parabola, hyperbola. Only the main result about the equation of an ellipse will be given, without a derivation. The general equation of an ellipse centered at the origin is,
$\frac{x^2}{a^2}+\frac{y^2}{b^2}=1$.
Where, $a$ is the semi-horizontal distance the ellipse spans.
And, $b$ is the semi-vertical distance the ellipse spans.
A simple mnemonic, since $a$ goes with $x$ it measures the horizontal distance (x-axis).
And since $b$ goes with $y$ it measures the vertical distance (y-axis).
Note, if $a=b$ then,
$\frac{x^2}{a^2}+\frac{y^2}{a^2}=1$
$x^2+y^2=a^2$ is a circle at origin of radius $a$. Thus, in some ways you can think of a circle as an ellipse.

An important integral that we should know: for $r>0$, the integral $\int_0^r \sqrt{r^2-x^2} dx = \frac{\pi r^2}{4}$, because that is the quarter-area of the circle.

To find the area of an ellipse we need to express this curve as a function $y=f(x)$, and then use Fundamental Theorem of Calculus. Thus,
$b^2x^2+a^2y^2=a^2b^2$
$a^2y^2=b^2(a^2-x^2)$
$y^2=\frac{b^2}{a^2}(a^2-x^2)$
$y=\frac{b}{a}\sqrt{a^2-x^2}$
Note, this is the upper half of the ellipse, because we only solved for the positive sign of $y$.
This curve represents a semi-ellipse, centered at the origin, with semi-horizontal span $a$ and vertical span $b$.
Thus, the full area of the ellipse (by symmetry) is,
$2\int_{-a}^a \frac{b}{a}\sqrt{a^2-x^2} dx=\frac{2b}{a}\int_{-a}^a\sqrt{a^2-x^2} dx$
But the function $\sqrt{a^2-x^2}$ is even, thus, we can simplify,
$\frac{4b}{a}\int_0^a \sqrt{a^2-x^2} dx$
Recognize, the integral? We mentioned it just before.
Thus,
$\frac{4b}{a} \cdot \frac{\pi a^2}{4}=\pi ab$.
Thus, to find the area of an ellipse you take its semi-major and semi-minor lengths and multiply them together with $\pi$.
Note, if it is a circle then $a=b=r$ and hence $\pi \cdot r\cdot r=\pi r^2$.
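A quick numerical confirmation of the ellipse area formula (hand-rolled Simpson's rule; the semi-axes $3$ and $2$ are just example values of my own choosing):

```python
import math

def simpson(f, a, b, n=100000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

A, B = 3.0, 2.0   # semi-horizontal and semi-vertical spans (example values)
# full area = 4 * (B/A) * integral from 0 to A of sqrt(A^2 - x^2) dx
area = (4 * B / A) * simpson(lambda x: math.sqrt(A * A - x * x), 0, A)
print(abs(area - math.pi * A * B) < 1e-5)  # True
```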

The Volume Problem
The general problem of finding a volume below a general surface is a Calculus III problem and it uses the analogue of a double integral, which we will not discuss. However, we will discuss a simpler case which is sometimes useful. It is called "volume of revolution".

Definition: The solid of revolution (about the x-axis) of a continuous curve $f(x)$ on $[a,b]$ is the solid formed by rotating (or spinning) the curve around the x-axis.

The approach to finding a formula is similar to what we did in the area problem. You should understand these derivations because then you will be able to solve similar problems yourself.
Again we have a continuous curve on $[a,b]$.
We spin it around the x-axis and create a solid.
Let $V(x)$ be a function defined for $a \leq x\leq b$.
The value of $V(x)$ is the volume until $x$ (like the area function).
Thus, $V(a)=0$; it is the volume of an infinitely thin cylinder.
And $V(b)$ is the full volume, which is what we seek.
Choose a point $x$ inside the interval.
And move a little to the right by $\Delta x$.
Then the volume of the small section that you moved is,
$V(x+\Delta x)-V(x)$
But we can approximate it; it is almost the volume of a cylinder.
The volume of that cylinder is $\pi [f(x)]^2\Delta x$ ($\pi$ times the radius squared times the width).
Hence,
$V(x+\Delta x)-V(x)\approx \pi [f(x)]^2 \Delta x$
$\frac{V(x+\Delta x)-V(x)}{\Delta x}\approx \pi [f(x)]^2$
The smaller the increase the closer the approximation,
$\lim_{\Delta x\to 0}\frac{V(x+\Delta x)-V(x)}{\Delta x}=\lim_{\Delta x\to 0}\pi [f(x)]^2$
$V'(x)=\pi [f(x)]^2$
Let $F(x)$ be any anti-derivative of $\pi [f(x)]^2$.
Thus,
$V(x)=F(x)+C$ for some constant.
The initial condition (as in differential equations) says,
$V(a)=F(a)+C$
$0=F(a)+C$
$C=-F(a)$.
Thus,
$V(x)=F(x)-F(a)$
$V(b)=F(b)-F(a)=\int_a^b \pi [f(x)]^2 dx=\pi \int_a^b [f(x)]^2 dx$.

We have the following.
Theorem: The volume of revolution of a continuous curve on $[a,b]$ (about the x-axis) is,
$\pi \int_a^b [f(x)]^2dx$.

Example 53: We can use this to find the volume of a sphere. The curve $y=\sqrt{r^2-x^2}$ is a semi-circle. Rotate it about the x-axis and we have a sphere of radius $r$. Thus, the volume is,
$\pi \int_{-r}^r (\sqrt{r^2-x^2})^2 dx=2\pi \int_0^r r^2-x^2 dx=\frac{4}{3}\pi r^3$. (Details Omitted).
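The omitted details are easy to check numerically (hand-rolled Simpson's rule, which is exact here since the integrand is a quadratic):

```python
import math

def simpson(f, a, b, n=1000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

r = 2.0
# volume of revolution: pi * integral of [f(x)]^2 = pi * (r^2 - x^2)
vol = math.pi * simpson(lambda x: r * r - x * x, -r, r)
print(abs(vol - (4 / 3) * math.pi * r ** 3) < 1e-9)  # True
```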

Here is another useful formula that we will not derive but will later use. The surface area when we rotate a curve is given by,
$2\pi \int_a^b f(x)\sqrt{1+[f'(x)]^2} dx$

Arc Length Problem
Same idea: we want the length of a curve (this time continuously differentiable) on the interval.
Again, we have $f(x)$ on $[a,b]$.
Define a function (note the same approach over and over again) $S(x)$.
To be the length from $a$ to $x$.
That is,
$S(a)=0$ (the initial condition for the differential equation) and $S(b)$ is what we seek.
Pick a point $x$ inside the interval and move a little to the right by $\Delta x$.
Then, $S(x+\Delta x)-S(x)$ is the length of that small segment.
We can approximate its length by finding the distance between the points.
The first point has location $(x,f(x))$ and the other has $(x+\Delta x,f(x+\Delta x))$
The distance is,
$\sqrt{(x+\Delta x-x)^2+(f(x+\Delta x)-f(x))^2}=\Delta x\sqrt{1+\frac{(f(x+\Delta x)-f(x))^2}{\Delta x^2} }$
Thus,
$\frac{S(x+\Delta x)-S(x)}{\Delta x}\approx \sqrt{1+\frac{(f(x+\Delta x)-f(x))^2}{\Delta x^2} }$
Take the limit,
$\lim_{\Delta x\to 0}\frac{S(x+\Delta x)-S(x)}{\Delta x}=\lim_{\Delta x\to 0}\sqrt{1+\frac{(f(x+\Delta x)-f(x))^2}{\Delta x^2}}$
Thus,
$S'(x)=\sqrt{1+[f'(x)]^2}$
Using the initial condition we get,
$S(b)=\int_a^b \sqrt{1+[f'(x)]^2}dx$.

I want to say what I meant by "continuously differentiable". Note, when we take the limit we get the derivative of the function, thus we need to know the function is differentiable. But further, we moved the limit inside the radical, and we can do that when a function is continuous. Thus, I mean the function is differentiable and its derivative is continuous (such a function is called smooth). But if you do not understand that, it is not important when you first learn Calculus.

Theorem: The length of a continuously differentiable curve $y=f(x)$ on $[a,b]$ is,
$\int_a^b \sqrt{1+[f'(x)]^2} dx$.

This turns out to be a very bad integral to integrate using the techniques that I have shown you, because it has a radical. Thus, for many problems texts ask you to approximate.

Example 54: Consider the parabola $y=x^2$ on $[0,1]$. The length of this curve is,
$\int_0^1 \sqrt{1+(y')^2} dx=\int_0^1 \sqrt{1+(2x)^2} dx=\int_0^1 \sqrt{1+4x^2} dx$.
We did not discuss how to integrate that, but it can be done. The types of functions used are hyperbolic functions; they are related to the exponential function $e^x$. But that is too advanced for us, and all we did was set up the integral.
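We can at least approximate the length (hand-rolled Simpson's rule again). For comparison, the exact value that the hyperbolic-function method produces is $\frac{\sqrt{5}}{2}+\frac{1}{4}\ln(2+\sqrt{5})\approx 1.4789$:

```python
import math

def simpson(f, a, b, n=1000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# arc length of y = x^2 on [0, 1]
length = simpson(lambda x: math.sqrt(1 + 4 * x * x), 0, 1)
print(round(length, 4))  # 1.4789
```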

The Circumference of an Ellipse
When I was younger I tried to find the formula for the perimeter of an ellipse. My final solution was $\pi(a+b)$, where $a,b$ are the semi-major and semi-minor axes. You can think of this as the circumference of a circle whose radius is the average of $a$ and $b$. It turns out it is wrong, but it is a close approximation. A more accurate approximation is $\pi \sqrt{2(a^2+b^2)}$. But the question you probably have is: why is there no exact formula? We will answer some of this question.
As we already know any ellipse can be expressed as,
$y=\frac{b}{a}\sqrt{a^2-x^2}$
Again this is just the upper half.
Using the arc length formula we need to find,
$4\int_0^a \sqrt{1+(y')^2} dx$.
Substituting the function in and squaring its derivative, we have,
$4\int_0^a \sqrt{1+\frac{b^2x^2}{a^4-a^2x^2} }dx$.
Now the problem is that this integral cannot be expressed in elementary functions (the standard functions); as I mentioned in the previous lecture, that sometimes happens. This problem was studied primarily by Abel and led to something called Elliptic Functions. These are new types of functions that handle such integrals. I myself do not know which courses actually teach these functions, not because they are difficult but because they are usually unnecessary.

This lecture has demonstrated the power of integrations. Virtually all fields in applied math involve integrations. These were just geometrical applications.

I have exceeded the limit on the number of characters (the limit is 30,000)! I will continue in the next post, along with the exercises.

14. Improper Integrals
All the cases we have considered thus far were on finite intervals. Often in math it is useful to consider infinite intervals. The interval we will consider is $[a,\infty)$.

Definition: The integral $\int_a^{\infty} f(x) dx$ is defined as $\lim_{t\to\infty}\int_a^t f(x)dx$ for a function $f(x)$ defined for $x\geq a$.

Example 55: Consider the integrals $\int_0^{\infty} e^{-x}dx$ and $\int_0^{\infty} e^x dx$. To evaluate the first integral we first find $\int e^{-x} dx=-e^{-x}+C$, thus $-e^{-x}$ is an anti-derivative (again, the constant does not matter when we use the First Fundamental Theorem). And we evaluate it at $t$ and $0$: $-e^{-t}-(-e^0)=1-e^{-t}$. Now take the limit $t\to \infty$ and we get $1-e^{-t}\to 1$. Thus, $\int_0^{\infty} e^{-x} dx=1$ and we say this integral converges (it has a limit). But when we do $\int_0^{\infty} e^x dx$ we end up with $e^t-1$ where $t\to \infty$, thus this value gets larger and larger, hence no limit. Thus we say it diverges. Geometrically this means that the infinite region below the first curve actually has finite area!

Example 56: Consider $\int_1^{\infty} \frac{1}{x} dx$ and $\int_1^{\infty} \frac{1}{x^2} dx$. First we work with the first integral. We find the anti-derivative $\ln x$ and evaluate it at $t$ and $1$: $\ln t-\ln 1=\ln t-0=\ln t$, and as $t\to \infty$ this grows without bound (but very slowly). Thus, the integral diverges; geometrically it means the infinite area gets infinitely large. Now the second integral. We find the anti-derivative $-\frac{1}{x}$ and evaluate it: $(-\frac{1}{t})-(-1)=1-\frac{1}{t}$, and as $t\to \infty$ we have $1-0=1$. Geometrically it means the infinite region below the curve has finite area.
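You can watch both behaviors numerically by pushing the upper limit $t$ outward (hand-rolled Simpson's rule; the exact values are $1-\frac{1}{t}$ and $\ln t$):

```python
import math

def simpson(f, a, b, n=20000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# integral of 1/x^2 settles toward 1; integral of 1/x grows like ln(t)
for t in [10, 100, 1000]:
    conv = simpson(lambda x: 1 / (x * x), 1, t)
    div = simpson(lambda x: 1 / x, 1, t)
    print(t, round(conv, 4), round(div, 4))
# 10 0.9 2.3026
# 100 0.99 4.6052
# 1000 0.999 6.9078
```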

I love this next example, but we will not be able to do it in full detail. Gabriel's Horn is the solid obtained from rotating the curve $y=\frac{1}{x}$ about the x-axis; hence we get an infinitely long horn which gets narrower and narrower. The two questions of interest are the volume and the surface area of revolution. The volume in this case is on the interval $[1,\infty)$ because we are considering an infinitely long horn.
Thus,
$\pi \int_1^{\infty} \frac{1}{x^2} dx=\pi$, as in the previous example.
But the surface area is,
$2\pi \int_1^{\infty} \frac{1}{x}\sqrt{1+\frac{1}{x^4}} dx\to \infty$ (Details Omitted).
To see that this surface area integral diverges, note that $\sqrt{1+\frac{1}{x^4}}\geq 1$, so the integrand is at least $\frac{1}{x}$, and we just saw that $\int_1^{\infty}\frac{1}{x}dx$ diverges.
The paradox is that we can fill Gabriel's Horn with paint but we cannot paint it!
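The paradox shows up numerically too. Truncate the horn at $[1,t]$ (hand-rolled Simpson's rule): the volume settles near $\pi$ while the surface area just keeps climbing:

```python
import math

def simpson(f, a, b, n=20000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def volume(t):
    # volume of revolution of 1/x on [1, t]
    return math.pi * simpson(lambda x: 1 / (x * x), 1, t)

def surface(t):
    # surface area of revolution of 1/x on [1, t]
    return 2 * math.pi * simpson(lambda x: (1 / x) * math.sqrt(1 + 1 / x ** 4), 1, t)

print(abs(volume(1000) - math.pi) < 0.01)  # True: volume converges to pi
print(surface(1000) - surface(10) > 20)    # True: surface area keeps growing
```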

Gamma Function*
I mentioned before that usually the standard (elementary) functions are used to describe physical phenomena. There are two functions that appear a lot in applied math that are not elementary at all but are needed: the Bessel and Gamma functions. The former is beyond this lecture to explain; the latter is difficult but within our understanding. The Gamma function was discovered by Leonhard Euler in 1729. It is a generalization of the factorial. Let me explain. Mathematicians love to generalize things. Say a definition works for the positive integers. Then mathematicians generalize it for all integers. Then for all rationals. Then for all real numbers. The word generalize means that the formula still works for the previous cases but now works for even more cases.
Well, one way to generalize the factorial is like this: $n!=n(n-1)...(2)(1)$ for all positive integers and zero otherwise. Yes, it is a generalization, but that generalization is not interesting. The type of generalization that we want is a continuous one. Meaning, if we graph the "stupid" generalization it will be discontinuous; the graph will rip. We want one that smoothly passes through all the integer factorial points. And this is done with the Gamma function.

Definition: The Gamma function is defined for $s>0$ as $\Gamma (s)= \int_0^{\infty}e^{-t}t^{s-1} dt$.

The reason why it is for $s>0$ is because otherwise there is a chance for the integral to diverge, which we do not want. Thus, to be safe, we work with the positive real numbers. I do not know of any simple way to show this, thus you will have to accept it.

The most useful property of the Gamma function.
Theorem: The Gamma function satisfies $\Gamma (s+1)=s\Gamma (s)$.

Proof: We have $\Gamma (s+1)=\int_0^{\infty} e^{-t} t^{s} dt$. We will use integration by parts. Let $u=t^s$ and $v'=e^{-t}$, thus $u'=st^{s-1}$ and $v=-e^{-t}$.
Thus,
$uv-\int u'v dt$
$-e^{-t}t^s-\int -se^{-t}t^{s-1}dt=-e^{-t}t^s + s\int e^{-t}t^{s-1}dt$
Evaluate at the endpoints $0$ and $N$, letting $N\to \infty$,
$-e^{-N}N^s + e^{0}\cdot 0^s + s\int_0^ N e^{-t}t^{s-1}dt=-\frac{N^s}{e^N}+s\int_0^N e^{-t}t^{s-1}dt$
Exponentials always overtake polynomials thus,
$\lim_{N\to \infty} \frac{N^s}{e^N}=0$
And we are left with,
$\lim_{N\to \infty} s\int_0^N e^{-t}t^{s-1} dt=s\Gamma (s)$.

Now we get to the generalization.

Theorem: If $n$ is a positive integer then $\Gamma (n+1)=n!$.

Proof: If $n=1$ then $\Gamma (2)=\Gamma (1+1)=1\cdot\Gamma(1)$, and $\Gamma(1)=\int_0^{\infty}e^{-t}t^0 dt=1$, so $\Gamma(2)=1$.
(We had this improper integral before).
Next,
$\Gamma (3)=\Gamma (2+1)=2\Gamma (2)=2$
$\Gamma (4)=\Gamma (3+1)=3\Gamma (2)=3\cdot 2$
$\Gamma (5)=\Gamma (4+1)=4\Gamma (4)=4\cdot 3\cdot 2$
Thus, continuing the pattern (formally, by induction),
$\Gamma (n+1)=n!$
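Python's standard library exposes the Gamma function as `math.gamma`, so this theorem is easy to test directly:

```python
import math

# Gamma(n + 1) should reproduce n! for positive integers
for n in range(1, 7):
    print(n, round(math.gamma(n + 1)), math.factorial(n))

# the recurrence Gamma(s + 1) = s * Gamma(s) holds for non-integers too
s = 3.7
print(abs(math.gamma(s + 1) - s * math.gamma(s)) < 1e-9)  # True
```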

We can make the following useful definition.
Definition: For any real number $r>-1$ we define $r!=\Gamma (r+1)$.
Specifically,
$0!=\Gamma (0+1)=1$.

Thus, a famous factorial is,
$\frac{1}{2}!=\Gamma (3/2)=\frac{\sqrt{\pi}}{2}$.
This derivation is beyond this lecture.
This, factorial is related to one of the most important integrals in probability theory,
$\int_{-\infty}^{\infty}e^{-x^2} dx=\sqrt{\pi}$.
(The doubly infinite integral means the same idea: both limits go to infinity when you evaluate.)
This equation is truly elegant because though we have no elementary function for the anti-derivative (as I discussed) we can still evaluate it with some tricks.
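Both facts can be checked in Python: `math.gamma` evaluates $\Gamma(3/2)$ directly, and since $e^{-x^2}$ dies off so quickly, a hand-rolled Simpson's rule on $[-10,10]$ already captures essentially the entire Gaussian area:

```python
import math

def simpson(f, a, b, n=4000):
    # composite Simpson's rule (n must be even)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# (1/2)! = Gamma(3/2) = sqrt(pi)/2
print(abs(math.gamma(1.5) - math.sqrt(math.pi) / 2) < 1e-12)  # True

# the tails beyond |x| = 10 are astronomically small (about e^{-100})
gauss = simpson(lambda x: math.exp(-x * x), -10, 10)
print(abs(gauss - math.sqrt(math.pi)) < 1e-9)  # True
```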
I hope you understand the Gamma function; it is one of my favorite functions. Note that many graphing programs have a command like "gamma(x)" to graph the Gamma function.

There is another type of improper integral that I will only mention. Note, we cannot do
$\int_{-1}^1 \frac{1}{x} dx$ by the Fundamental Theorem of Calculus, because $y=\frac{1}{x}$ is not continuous on this interval. In fact it is not even defined at $x=0$! Thus, there is another improper integral that deals with the case where a function is not defined at a point. But because this is not nearly as interesting as the infinite-interval integral, we will avoid it.

As promised, here are the exercises from the continued lecture as well.
~~~
Exercises

1) Find $\int_0^1 \sqrt{1-x^2} dx$

2) Find $\int_{0}^3 2x dx$ geometrically, without Calculus.

3) Prove the properties of the product of even and odd functions.

4) Find $\int_0^e x\ln x dx$

5) Find the area between the line $y=x$ and the parabola $y=x^2$.

6*) Find the volume of revolution for a function $f(x)$ on $[a,b]$ about the y-axis, using the same technique as I used above.

7) Find $\int_1^{\infty} \frac{1}{1+x} dx$ or show it diverges.

8) Find $\int_1^{\infty} \frac{1}{(x+1)(x+2)} dx$

9) Using the surface area of revolution, find the surface area of a sphere with radius $r$.

10) Find $\Gamma (6)$ and $\Gamma (7/2)$.

11*) Argue that if $f(x)>0$ is continuous on $x\geq 1$ and the improper integral $\int_1^{\infty} f(x)dx$ converges, then $f(x)$ cannot be an increasing function.

15. This thread is simply fantastic, many thanks!
