The correct way to think of differentiation is that it is a linearisation process. In elementary calculus, you have a function f(x), whose graph is some sort of curve, and when you differentiate it you approximate the curve by a straight line, namely the tangent to the curve.

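For a function $f:\mathbb{R}^n\to\mathbb{R}^m$, when you differentiate it at a point $a$ you are approximating $f$ near $a$ by a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. But linear maps between vector spaces are described by matrices. For a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$, you write the elements of $\mathbb{R}^n$ and $\mathbb{R}^m$ as column vectors, and the linear map is given by an $m\times n$ matrix. The formula for the derivative matrix $Df(a)$ is that it satisfies the equation $f(a+h) = f(a) + Df(a)h + o(\|h\|)$.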
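To see what the defining equation says in practice, here is a minimal numerical sketch in plain Python. The map `f`, the base point `a`, and the hand-computed matrix in `jacobian` are hypothetical choices made purely for illustration; none of them comes from the question. The check is that the remainder $f(a+h) - f(a) - Df(a)h$, divided by $\|h\|$, shrinks as $h$ shrinks.

```python
import math

def f(x, y):
    # Hypothetical nonlinear map from R^2 to R^2, chosen only for illustration.
    return (x * x + y, x * y)

def jacobian(x, y):
    # Matrix of partial derivatives of the f above, worked out by hand.
    return ((2 * x, 1.0),
            (y, x))

def remainder_ratio(a, h):
    """Return |f(a+h) - f(a) - Df(a)h| / |h|, which should tend to 0 with h."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    J = jacobian(*a)
    # Linear part Df(a)h, computed as a matrix-vector product.
    lin = (J[0][0] * h[0] + J[0][1] * h[1],
           J[1][0] * h[0] + J[1][1] * h[1])
    err = math.hypot(fah[0] - fa[0] - lin[0], fah[1] - fa[1] - lin[1])
    return err / math.hypot(*h)

a = (1.0, 2.0)
ratios = [remainder_ratio(a, (t, t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this particular `f` the remainder along $h = (t, t)$ is exactly $(t^2, t^2)$, so the printed ratios shrink roughly in proportion to $t$, which is the $o(\|h\|)$ behaviour the definition asks for.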

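For this example, f takes the vector to the vector . You want to find $Df(0)$, which will be a matrix such that $f(h) = f(0) + Df(0)h + o(\|h\|)$. As it happens, this function f is already linear, so $f(0) = 0$, the $o(\|h\|)$ term disappears, and $f(h) = Df(0)h$. So $Df(0)$ is just the matrix of $f$ itself.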
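The fact that a linear map is its own derivative can be checked directly. The sketch below uses a hypothetical linear map from $\mathbb{R}^3$ to $\mathbb{R}^2$ (not the one from the question, whose formula is not reproduced here) together with the matrix `A` read off from its coefficients, and confirms that the remainder $f(0+h) - f(0) - Ah$ is exactly zero.

```python
def f(v):
    # Hypothetical linear map from R^3 to R^2, written out by components.
    x, y, z = v
    return (x + 2 * y, -y + 3 * z)

# Candidate for Df(0): the coefficients of f, one row per output component.
A = ((1, 2, 0),
     (0, -1, 3))

def apply(M, v):
    # Multiply the matrix M (a tuple of rows) by the column vector v.
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M)

def remainder(h):
    # f(0 + h) - f(0) - A h; for a linear f this is the zero vector.
    f0 = f((0, 0, 0))
    return tuple(fh - f0_i - lin
                 for fh, f0_i, lin in zip(f(h), f0, apply(A, h)))

print(remainder((3, -4, 5)))
```

Because the remainder vanishes identically rather than merely being small, no limit argument is needed here: the matrix of the linear map satisfies the defining equation with no error term.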

As you already said, this process of finding $Df(0)$ automatically includes a proof that f is differentiable at $0$.