The idea is that if you want to approximate the value of a function at a point close to one where you already know the value, you use the tangent line at the known point to do so.
Picture a function (any differentiable function) whose value you know at one point a. Now you want to approximate its value at a nearby point a+x, but all you have is f(a) and the slope of the tangent line there, f'(a).
So what you do is treat it like a right-angled triangle whose base is parallel to the x-axis. One vertex, on the left, is your known point (a, f(a)). The right-angle vertex is x units to the right at the same height, (a+x, f(a)). The third vertex lies directly above or below that one, on a line parallel to the y-axis, at the spot where that vertical line meets the tangent line.
What you are doing is taking the height of that third vertex as your estimate of the function at the new point.
Algebraically, you start at f(a) and follow the tangent line, whose slope is f'(a), for x units to the right; the height you end up at is your approximation of f(a+x).
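A minimal Python sketch of that step, assuming you already have the numbers f(a) and f'(a) in hand. The function name linear_approx and the square-root example (a = 4, f(a) = sqrt(4) = 2, f'(a) = 1/(2*sqrt(4)) = 0.25) are illustrative choices, not part of the explanation above.

```python
import math

def linear_approx(f_a: float, fprime_a: float, x: float) -> float:
    """Estimate f(a + x) by starting at f(a) and following the tangent line for x units."""
    # Rise along the tangent line = slope * run
    return f_a + x * fprime_a

# Illustrative example: estimate sqrt(4.1) from the known point a = 4
estimate = linear_approx(2.0, 0.25, 0.1)
print(estimate)        # close to 2.025
print(math.sqrt(4.1))  # approximately 2.02485
```

The estimate and the true value differ only in the fourth decimal place, because 4.1 is close to 4.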
Now you want to find the difference in height between f(a) and that estimate of f(a+x).
The base of the triangle has length x, the tangent line forms the hypotenuse, and the vertical side (the side opposite the angle at (a, f(a))) is the difference between f(a+x) and f(a).
Now recall that f'(a) is the slope of the tangent line, and that slope is the tangent of the angle the line makes with the horizontal, where by high school trigonometry tan = opposite/adjacent.
We want to find the opposite side, and we know that tan = opposite/adjacent = f'(a). The adjacent side is x, so doing some algebra gives us:
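f'(a) ≈ [f(a+x) - f(a)]/x, which gives us
x*f'(a) ≈ f(a+x) - f(a), which gives
f(a+x) ≈ f(a) + x*f'(a), and that is our linear approximation (also called the linearization of f at a).
The ≈ is there because the top of the triangle lies on the tangent line rather than on the curve itself; the closer a+x is to a, the smaller that gap.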
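To see why the approximation is only trusted for points close to a, here is a small Python check. The helper tangent_line_error is hypothetical, and f = exp at a = 0 is an arbitrary example (exp is its own derivative, so f(0) = f'(0) = 1). The gap between the true value and the tangent-line estimate shrinks roughly like x squared as x gets smaller.

```python
import math

def tangent_line_error(f, fprime, a: float, x: float) -> float:
    """Absolute gap between the true value f(a + x) and the estimate f(a) + x*f'(a)."""
    estimate = f(a) + x * fprime(a)
    return abs(f(a + x) - estimate)

# Example function: exp at a = 0, passing math.exp as both f and f'
for x in (0.1, 0.01, 0.001):
    err = tangent_line_error(math.exp, math.exp, 0.0, x)
    # err / x**2 settles near 0.5, i.e. the error shrinks roughly like x squared
    print(x, err, err / x**2)
```

For a function that is already a straight line, the error is zero for every x, since the tangent line is the function itself.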
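It might be helpful to draw the triangle: its vertices are (a, f(a)), (a+x, f(a)) and (a+x, f(a) + x*f'(a)), listed in counterclockwise order when x and f'(a) are both positive. The last vertex is on the tangent line, so it is only approximately (a+x, f(a+x)).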
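The triangle can also be checked numerically. This Python sketch uses a hypothetical helper, tangent_triangle, and the illustrative numbers a = 4, f(a) = 2, f'(a) = 0.25 (which correspond to f = sqrt). It builds the three vertices and confirms that opposite/adjacent matches tan of the tangent line's angle, which is f'(a).

```python
import math

def tangent_triangle(a: float, f_a: float, fprime_a: float, x: float):
    """Return the three vertices of the triangle built from the tangent line at a."""
    start = (a, f_a)                   # known point on the curve
    corner = (a + x, f_a)              # right angle: base parallel to the x-axis
    top = (a + x, f_a + x * fprime_a)  # point on the tangent line above or below a + x
    return start, corner, top

start, corner, top = tangent_triangle(4.0, 2.0, 0.25, 0.1)
adjacent = corner[0] - start[0]  # base, length x
opposite = top[1] - corner[1]    # vertical side, x * f'(a)
theta = math.atan(0.25)          # angle between the tangent line and the horizontal
print(math.tan(theta), opposite / adjacent)  # both approximately 0.25
```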