Why do we say that the central difference is a better approximation for the derivative than the forward or backward difference?
In all the above cases the error term is of magnitude h^2
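Technically, the forward or backward difference is sometimes the better approximation. It depends on the situation: at the edge of a domain, or right next to a kink or jump in the function, a one-sided difference may be the only sensible choice. Most of the time, though, the central difference approximates the derivative better because it uses information from both sides of the point at which you're finding the derivative. For a smooth function, a Taylor expansion makes this precise: the forward and backward differences have a truncation error of order h, while in the central difference the first-order error terms from the two sides cancel, leaving an error of order h^2.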
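A quick numerical check can make the comparison concrete. The sketch below is only an illustration under assumed choices that are not from the question: the test function sin(x), the point x = 1, and the helper names `forward_diff`, `backward_diff`, `central_diff`, and `errors` are all hypothetical.

```python
import math

def forward_diff(f, x, h):
    # One-sided estimate using the point to the right of x.
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h):
    # One-sided estimate using the point to the left of x.
    return (f(x) - f(x - h)) / h

def central_diff(f, x, h):
    # Symmetric estimate using points on both sides of x.
    return (f(x + h) - f(x - h)) / (2 * h)

def errors(h, f=math.sin, df=math.cos, x=1.0):
    # Absolute error of each estimate against the known derivative df(x).
    # f, df, and x are illustrative defaults, not part of the original question.
    exact = df(x)
    return {
        "forward": abs(forward_diff(f, x, h) - exact),
        "backward": abs(backward_diff(f, x, h) - exact),
        "central": abs(central_diff(f, x, h) - exact),
    }

if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        e = errors(h)
        print(f"h={h:g}  forward={e['forward']:.3e}  "
              f"backward={e['backward']:.3e}  central={e['central']:.3e}")
```

For a smooth function like this one, halving h should roughly halve the forward and backward errors and roughly quarter the central error. If h is made very small, floating-point round-off eventually dominates and this pattern breaks down for all three formulas.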