Let's consider these one at a time.
The first one requires two numbers: m encodes information relating two points (how y changes as x changes), while b encodes information about an initial condition (the value of y at x = 0).
Without m and b, you would need four numbers, the coordinates (x0, y0) and (x1, y1) of two points, to pin down the line.
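A minimal Python sketch of that conversion, assuming the two points have distinct x-coordinates (a vertical line has no slope-intercept form). The function name line_from_points is hypothetical, chosen only for illustration; it turns the four numbers (x0, y0, x1, y1) into the two numbers m and b.

```python
from fractions import Fraction

def line_from_points(x0, y0, x1, y1):
    """Return (m, b) for the line through (x0, y0) and (x1, y1).

    Assumes x0 != x1: a vertical line has no slope-intercept form.
    Inputs are converted to Fraction so the arithmetic is exact.
    """
    x0, y0, x1, y1 = map(Fraction, (x0, y0, x1, y1))
    if x0 == x1:
        raise ValueError("x0 and x1 must differ")
    m = (y1 - y0) / (x1 - x0)  # rise over run
    b = y0 - m * x0            # solve y0 = m*x0 + b for b
    return m, b

# Two different four-number descriptions of the same line y = 2x + 3
# collapse to the same two numbers.
print(line_from_points(0, 3, 1, 5))   # (Fraction(2, 1), Fraction(3, 1))
print(line_from_points(2, 7, 5, 13))  # (Fraction(2, 1), Fraction(3, 1))
```

Note that many different four-number inputs give back the same (m, b), which is one way to see that the four numbers carry more than the line itself needs.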
We can reduce four numbers to two because the slope is the same between any two points on the line.
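A small check of that claim, using an illustrative line y = 2x + 3 and a hypothetical helper slope (neither comes from the question): every pair of points sampled from the line gives the same slope, so one slope value describes them all.

```python
from fractions import Fraction
from itertools import combinations

def slope(p, q):
    """Slope between points p = (x, y) and q = (x, y); assumes distinct x."""
    (x0, y0), (x1, y1) = p, q
    return Fraction(y1 - y0) / Fraction(x1 - x0)

# Sample points on the illustrative line y = 2x + 3.
points = [(x, 2 * x + 3) for x in range(-3, 4)]

# Collect the slope of every pair of distinct points.
slopes = {slope(p, q) for p, q in combinations(points, 2)}
print(slopes)  # {Fraction(2, 1)}
```

Because the set ends up with a single element, knowing m (plus one anchor such as b) is enough to recover every point.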
The four-number set-up doesn't assume a particular model (in this case a linear one), while y = mx + b does. When you can assume more about the data, the number of parameters you need to describe the model decreases.
For number 2, be aware that you can calculate f'(a) from the formula for f, so these quantities are not really independent in a symbolic sense: for a line f(x) = mx + b, the derivative is f'(a) = m at every a, read directly off the equation.
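A quick Python illustration of that point, using an illustrative line f(x) = 2x + 3 and hypothetical helper names make_line and difference_quotient: once the formula for f is known, the difference quotient (f(a + h) - f(a)) / h equals m for every a and every nonzero h, so its limit f'(a) adds no information beyond the formula itself.

```python
from fractions import Fraction

def make_line(m, b):
    """Return f(x) = m*x + b as a Python function."""
    return lambda x: m * x + b

def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, whose limit as h -> 0 is f'(a)."""
    return (f(a + h) - f(a)) / h

f = make_line(Fraction(2), Fraction(3))
for a in (Fraction(-1), Fraction(0), Fraction(5, 2)):
    for h in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
        # For a line the quotient equals m exactly, for every a and h != 0.
        assert difference_quotient(f, a, h) == 2
print("f'(a) = 2 for every a tested")
```

Exact Fraction arithmetic is used so the equality holds exactly rather than up to floating-point rounding.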