Hello everyone, just a small question:

Imagine we have a function f of two variables x, y. To find a local minimum of f, the so-called gradient descent (or steepest descent) method can be used:

1. Compute the gradient vector at an initial guess (x0,y0).

2. Move from (x0,y0) to (x1,y1) by following the negative direction of the gradient vector.
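
To make the two steps concrete, here is a minimal Python sketch of the method as I understand it. The example function f(x, y) = x^2 + y^2, the step size 0.1, the starting point (3, -2) and the iteration count are all assumptions I picked for illustration; the names `f`, `grad_f` and `gradient_descent` are hypothetical as well.

```python
def f(x, y):
    # Hypothetical example function; its only minimum is at (0, 0).
    return x**2 + y**2


def grad_f(x, y):
    # Analytic gradient of the example function: (df/dx, df/dy).
    return 2.0 * x, 2.0 * y


def gradient_descent(x0, y0, step=0.1, iterations=100):
    # Repeats steps 1 and 2 with a fixed (assumed) step size.
    x, y = x0, y0
    for _ in range(iterations):
        gx, gy = grad_f(x, y)                 # step 1: gradient at the current point
        x, y = x - step * gx, y - step * gy   # step 2: move along the negative gradient
    return x, y


x_min, y_min = gradient_descent(3.0, -2.0)
print(x_min, y_min, f(x_min, y_min))
```

With these choices the iterates shrink toward (0, 0), and the function value at the end is far smaller than at the starting point.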

My question is: why should we follow the negative of the gradient? As I understand it, the gradient vector at (x0,y0) DOES NOT show the direction of maximum increase of the function, but the direction of maximum CHANGE. So why should we follow the opposite direction in order to locate a minimum? What is the reasoning behind this decision?

Thanks a lot in advance!