I am not exactly sure what you are asking but, as Aryth said, it seems to use the implicit function theorem. In its most basic form it says the following. Suppose $F(x,y)$ (this notation means that $F$ is a function of two variables) is a $C^1$ function (ignore what this means; it just says the function is well-behaved enough, for instance differentiable). Say that $F(a,b) = 0$. The implicit function theorem says that if $F_y(a,b) \neq 0$ then the equation $F(x,y) = 0$ can be solved uniquely (for $y$) in the "neighborhood" of $(a,b)$. The phrase "in the neighborhood" means that in some small disk around $(a,b)$ we can solve this equation uniquely for $y$ (in terms of $x$).
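The theorem has two checkable conditions at the point: $F(a,b) = 0$ and $F_y(a,b) \neq 0$. A minimal numerical sketch of that check follows. The names `partial_y`, `ift_hypotheses_hold` and the tolerance `tol` are hypothetical, and $F_y$ is estimated with a central difference rather than computed exactly. It cannot verify that $F$ is $C^1$, which the theorem also needs. It is tried on $x^2 + y^2 - 1$ from the circle example that follows.

```python
from typing import Callable

# Hypothetical helper names; F is any real function of two real variables.
Func2 = Callable[[float, float], float]

def partial_y(F: Func2, x: float, y: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the partial derivative F_y(x, y)."""
    return (F(x, y + h) - F(x, y - h)) / (2 * h)

def ift_hypotheses_hold(F: Func2, a: float, b: float, tol: float = 1e-8) -> bool:
    """Numerically check F(a, b) = 0 and F_y(a, b) != 0, up to tol.

    Only the two pointwise conditions are tested; whether F is C^1
    (also required by the theorem) cannot be checked this way.
    """
    on_curve = abs(F(a, b)) < tol
    fy_nonzero = abs(partial_y(F, a, b)) > tol
    return on_curve and fy_nonzero

def circle(x: float, y: float) -> float:
    return x**2 + y**2 - 1

print(ift_hypotheses_hold(circle, 0.6, 0.8))  # True
print(ift_hypotheses_hold(circle, 1.0, 0.0))  # False: F_y(1, 0) = 0
```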
For example, consider the circle $x^2 + y^2 = 1$. Where can we solve for $y$ uniquely (in a neighborhood)? Define the function $F(x,y) = x^2 + y^2 - 1$. The point $(1,0)$ lies on this circle because $F(1,0) = 0$. If we try to solve the equation within a small disk around $(1,0)$ we see that it has two solutions, one in the upper half and one in the lower half (see red circle); in fact those solutions are $y = \pm\sqrt{1-x^2}$. And this happens no matter how small we make the disk. Therefore, the equation $F(x,y) = 0$ cannot be solved uniquely near $(1,0)$. Let us see what happens when we use the implicit function theorem. The theorem says that if $F_y(1,0) \neq 0$ then we can solve the equation uniquely. Since we cannot solve this equation uniquely, it must mean that $F_y(1,0) = 0$. Check this: $F_y(x,y) = 2y$ and so $F_y(1,0) = 0$. However, if $F(a,b) = 0$ and $b \neq 0$, i.e. we stay away from the points $(\pm 1, 0)$ where the graph splits into a part above and a part below the x-axis, then $F_y(a,b) = 2b \neq 0$ and therefore we can solve this equation uniquely. As in the case of the blue circle, the solution is $y = \sqrt{1-x^2}$ when $b > 0$ (and $y = -\sqrt{1-x^2}$ when $b < 0$).
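The claim that "this happens no matter how small we make the disk" can be made concrete. Writing $x = 1 - t$ gives $y = \pm\sqrt{2t - t^2}$, and both points are at distance $\sqrt{2t}$ from $(1,0)$; choosing $t = r^2/8$ puts them at distance $r/2$, inside a disk of radius $r$. The sketch below (the helper name `two_points_near` is hypothetical) checks this for a few radii and evaluates $F_y = 2y$ at the two points discussed above.

```python
import math

def F(x: float, y: float) -> float:
    """F(x, y) = x^2 + y^2 - 1; its zero set is the unit circle."""
    return x**2 + y**2 - 1

def F_y(x: float, y: float) -> float:
    """Exact partial derivative of F with respect to y."""
    return 2 * y

def two_points_near(r: float):
    """Two distinct points of the circle inside the disk of radius r around (1, 0).

    With x = 1 - t, y = +/- sqrt(2t - t^2), and both points lie at distance
    sqrt(2t) from (1, 0); t = r^2 / 8 makes that distance r / 2 < r.
    """
    t = r**2 / 8
    x = 1 - t
    y = math.sqrt(1 - x**2)
    return (x, y), (x, -y)

for r in (0.1, 0.01, 0.001):
    upper, lower = two_points_near(r)
    for point in (upper, lower):
        assert math.dist(point, (1.0, 0.0)) < r  # inside the disk
        assert abs(F(*point)) < 1e-12            # on the circle
    print(r, upper[1], lower[1])                 # one positive y, one negative y

print(F_y(1.0, 0.0))  # 0.0: the theorem's condition fails at (1, 0)
print(F_y(0.6, 0.8))  # 1.6: non-zero, so y is unique near (0.6, 0.8)
```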
Let us say that $F(x,y)$ is a function which satisfies the conditions of the implicit function theorem as in the first paragraph. Then in a neighborhood of $(a,b)$ (where $(a,b)$ is a solution to $F(x,y) = 0$) we can solve for $y$ uniquely in terms of $x$. This means we can define a new function $y = f(x)$ in this neighborhood. Remember that a function is a set of pairs such that for each first coordinate there is a unique matching second coordinate. Therefore, if we let $x$ be the first coordinate, then by the theorem there is a unique matching coordinate $y$ (near $b$) so that $F(x,y) = 0$. And this defines a function $f(x) = y$.
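The theorem only says that $f$ exists; it does not give a formula for it. One way to evaluate $f(x)$ numerically is Newton's method in the variable $y$, started at $b$. The sketch below is a swapped-in numerical technique, not part of the theorem. The names `implicit_f`, `tol` and `max_iter` are hypothetical, and it simply assumes $x$ is close enough to $a$ that the iterates stay where $F_y \neq 0$. On the circle, starting at $b = 0.8$ or $b = -0.8$ picks out the upper or lower branch, which is exactly the "unique in a neighborhood of $(a,b)$" statement.

```python
import math
from typing import Callable

Func2 = Callable[[float, float], float]

def implicit_f(F: Func2, F_y: Func2, b: float, x: float,
               tol: float = 1e-12, max_iter: int = 50) -> float:
    """Approximate y = f(x), the solution of F(x, y) = 0 near y = b.

    Newton's method in y, started at b. Assumes F_y is non-zero along the
    iterates and that x is close enough to a (an assumption, not checked).
    """
    y = b
    for _ in range(max_iter):
        step = F(x, y) / F_y(x, y)
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton's method did not converge; x may be too far from a")

# Usage on the circle F(x, y) = x^2 + y^2 - 1.
def F(x: float, y: float) -> float:
    return x**2 + y**2 - 1

def F_y(x: float, y: float) -> float:
    return 2 * y

print(implicit_f(F, F_y, b=0.8, x=0.5))   # approx. 0.8660254, upper branch
print(implicit_f(F, F_y, b=-0.8, x=0.5))  # approx. -0.8660254, lower branch
print(math.sqrt(0.75))                    # the explicit formula, for comparison
```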
But the implicit function theorem does not stop here; it says more. It says that $f$ is itself $C^1$ (that is, differentiable, with some more properties). Now $F(x, f(x)) = 0$ for all $x$ in this neighborhood. The functions $F$ and $f$ are differentiable, and so by the chain rule for multivariable functions we get (differentiating both sides of $F(x, f(x)) = 0$ with respect to $x$)
$$F_x + F_y\, f'(x) = 0,$$
and this means
$$f'(x) = -\frac{F_x}{F_y}$$
(the partials are evaluated at $(x, f(x))$). And that gives you a formula.
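For the circle, $F_x = 2x$ and $F_y = 2y$, so the formula gives $f'(x) = -x/y$, which agrees with differentiating $y = \sqrt{1-x^2}$ directly: $-x/\sqrt{1-x^2}$. The sketch below compares the formula with an independent central-difference estimate on the upper branch; `implicit_derivative` and `central_difference` are hypothetical names chosen for illustration.

```python
import math

# Circle example: F(x, y) = x^2 + y^2 - 1, upper branch f(x) = sqrt(1 - x^2).
def F_x(x: float, y: float) -> float:
    return 2 * x

def F_y(x: float, y: float) -> float:
    return 2 * y

def f(x: float) -> float:
    return math.sqrt(1 - x**2)

def implicit_derivative(x: float) -> float:
    """f'(x) = -F_x / F_y, with both partials evaluated at (x, f(x))."""
    y = f(x)
    return -F_x(x, y) / F_y(x, y)

def central_difference(g, x: float, h: float = 1e-6) -> float:
    """Independent numerical estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-0.5, 0.0, 0.3, 0.6):
    # The two columns should agree to roughly 1e-9 or better.
    print(x, implicit_derivative(x), central_difference(f, x))
```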
If you got that, then great. There is just one more point to address; if it confuses you, just ignore it. The questionable step is dividing by $F_y$. How do we know it is non-zero? This is where we use two facts. The first one is that $F_y(a,b) \neq 0$. The second one is that $F$ is $C^1$. The first fact is not good enough to say that $F_y \neq 0$ at all points around $(a,b)$, because it might be non-zero at $(a,b)$ and yet be zero at some points around $(a,b)$. This is where we need the second fact. $C^1$ means that the function is differentiable and its derivative is continuous. Therefore $F_y$ is a continuous function. Since $F_y(a,b) \neq 0$, by the definition of continuity there is a small enough neighborhood of $(a,b)$ in which $F_y(x,y) \neq 0$ for every point.
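To make the continuity step concrete, here is the argument written out with one convenient choice, $\varepsilon = \tfrac12 |F_y(a,b)|$ (any positive $\varepsilon$ smaller than $|F_y(a,b)|$ would do), followed by the circle, where $\delta$ can be written down explicitly.

```latex
\begin{align*}
&\text{Take } \varepsilon = \tfrac{1}{2}\,|F_y(a,b)| > 0.
  \text{ Continuity of } F_y \text{ gives some } \delta > 0 \text{ with} \\
&\qquad \|(x,y)-(a,b)\| < \delta \implies |F_y(x,y) - F_y(a,b)| < \varepsilon, \\
&\text{hence} \quad |F_y(x,y)| \ge |F_y(a,b)| - |F_y(x,y) - F_y(a,b)|
  > 2\varepsilon - \varepsilon = \varepsilon > 0. \\[4pt]
&\text{Circle: } F_y = 2y,\ b \neq 0,\ \delta = \tfrac{|b|}{2}: \\
&\qquad |y - b| \le \|(x,y)-(a,b)\| < \tfrac{|b|}{2}
  \implies |2y - 2b| < |b|
  \implies |2y| > 2|b| - |b| = |b| > 0.
\end{align*}
```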