My question is:
Can a convex/concave function have a saddle point?
My answer would be:
Convex and concave functions do not have saddle points, because a saddle point is not a local extremum.
Is this answer correct? How could I explain it better?
Are you asserting that every point on the graph of a convex or concave function must be a local extremum?
If not, the fact that a saddle point is not a local extremum does not by itself imply that such a function cannot have a saddle point.
Neither convex nor concave functions can have saddle points.
Let's show this for convex functions; a similar argument works for concave functions. First, we need some definitions.
1. $\displaystyle N_\epsilon(x)$ denotes the $\displaystyle \epsilon$-neighborhood of x.
2. Given a Fréchet differentiable $\displaystyle f: X\rightarrow \Re $ where X is a Hilbert space (which includes $\displaystyle \Re^n$), a stationary point is a point x where $\displaystyle \nabla f(x)=0$. I'm sure we could do this with Gâteaux (directional) derivatives, but it's simpler to assume we have a gradient.
3. A saddle point is a stationary point that is neither a local min nor a local max.
4. Given $\displaystyle f: X\rightarrow \Re $, a local min is a point x such that there exists an $\displaystyle \epsilon$-neighborhood such that $\displaystyle f(x)\leq f(y)$ for all $\displaystyle y\in N_\epsilon(x)$.
5. A convex function is a function $\displaystyle f: X\rightarrow \Re $ where for all $\displaystyle x,y\in X$ and $\displaystyle \lambda\in [0,1]$, $\displaystyle f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)$.
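To make definitions 2 to 5 concrete, here is a small numerical sketch. The function $\displaystyle f(x,y)=x^2-y^2$ is my own illustrative choice (it is not part of the argument), and the helper names are hypothetical. It shows a stationary point at the origin that is neither a local min nor a local max, and shows that this same function breaks the convexity inequality in definition 5.

```python
def saddle(p):
    """Illustrative saddle surface f(x, y) = x^2 - y^2 (an assumed example)."""
    x, y = p
    return x * x - y * y


def convex_combination(p, q, lam):
    """Return lam * p + (1 - lam) * q for points given as tuples."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(p, q))


def violates_convexity(f, p, q, lam):
    """True if f(lam*p + (1-lam)*q) > lam*f(p) + (1-lam)*f(q)."""
    lhs = f(convex_combination(p, q, lam))
    rhs = lam * f(p) + (1 - lam) * f(q)
    return lhs > rhs


origin = (0.0, 0.0)

# Near the origin the function rises along the x-axis and falls along the
# y-axis, so the origin is neither a local min nor a local max.
goes_up = saddle((0.1, 0.0)) > saddle(origin)
goes_down = saddle((0.0, 0.1)) < saddle(origin)

# The midpoint of (0, -1) and (0, 1) is the origin, where f = 0, while the
# average of the two endpoint values is -1. Definition 5 fails here.
breaks_definition = violates_convexity(saddle, (0.0, -1.0), (0.0, 1.0), 0.5)

print(goes_up, goes_down, breaks_definition)
```

This is only one example, of course. It illustrates the claim but does not prove it.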
Now, let us assume that f is Fréchet differentiable and convex. Let us also assume that there exists a stationary point x that is a saddle point. Since x is a saddle point, x is not a local min. That means that for every $\displaystyle \epsilon$-neighborhood of x there exists $\displaystyle y\in N_\epsilon(x)$ such that $\displaystyle f(y) < f(x)$. Fix such a y. Next, from convexity we have that
$\displaystyle f(\lambda y + (1-\lambda) x) \leq \lambda f(y) + (1-\lambda) f(x)$
for $\displaystyle \lambda \in (0,1)$. Then, since $\displaystyle \lambda >0$, we have that
$\displaystyle \frac{f(\lambda y + (1-\lambda) x)}{\lambda} \leq \frac{\lambda f(y) + (1-\lambda) f(x)}{\lambda}$
Rearranging terms, we have
$\displaystyle \frac{f(x+\lambda (y-x))-f(x)}{\lambda} + f(x) \leq f(y)$
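Taking the limit as $\displaystyle \lambda\rightarrow 0^+$, the difference quotient converges to the directional derivative of f at x in the direction $\displaystyle y-x$, so we have that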
$\displaystyle \langle\nabla f(x),y-x\rangle + f(x) \leq f(y)$
where $\displaystyle \langle \cdot,\cdot\rangle$ denotes the inner product on X. In $\displaystyle \Re^n$, we typically have that $\displaystyle \langle x,y\rangle=x^Ty$. In any case, since x is stationary, $\displaystyle \nabla f(x)=0$. Hence, the above inequality reduces to
$\displaystyle f(x) \leq f(y)$
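This contradicts $\displaystyle f(y) < f(x)$, and hence the assumption that x is not a local min. Note that the inequality did not depend on y being close to x, so it holds for every $\displaystyle y\in X$. Hence, if f is convex and x is stationary, then x is a local, and in fact global, min, so it cannot be a saddle point.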
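As a sanity check on the key inequality $\displaystyle \langle\nabla f(x),y-x\rangle + f(x) \leq f(y)$, here is a minimal numerical sketch. The convex quadratic $\displaystyle f(x_1,x_2)=(x_1-1)^2+2(x_2+3)^2$, the sampling range, and the tolerance are all my own assumptions chosen for illustration. The code samples random pairs of points, checks the inequality, and then checks that the stationary point $\displaystyle (1,-3)$ is never beaten by a sampled point.

```python
import random


def f(p):
    """Assumed convex example: f(x1, x2) = (x1 - 1)^2 + 2 (x2 + 3)^2."""
    x1, x2 = p
    return (x1 - 1.0) ** 2 + 2.0 * (x2 + 3.0) ** 2


def grad_f(p):
    """Gradient of the example function above, computed by hand."""
    x1, x2 = p
    return (2.0 * (x1 - 1.0), 4.0 * (x2 + 3.0))


def first_order_gap(x_pt, y_pt):
    """f(y) - f(x) - <grad f(x), y - x>, which is nonnegative for convex f."""
    g = grad_f(x_pt)
    inner = sum(gi * (yi - xi) for gi, xi, yi in zip(g, x_pt, y_pt))
    return f(y_pt) - f(x_pt) - inner


rng = random.Random(0)  # fixed seed so the run is repeatable


def sample():
    """Draw a point uniformly from the square [-10, 10]^2."""
    return (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))


TOL = 1e-9  # small slack for floating-point rounding
gaps = [first_order_gap(sample(), sample()) for _ in range(1000)]
inequality_holds = all(gap >= -TOL for gap in gaps)

# The gradient vanishes at (1, -3), so the argument says it is a global min.
stationary = (1.0, -3.0)
is_min_on_samples = all(f(sample()) >= f(stationary) for _ in range(1000))

print(inequality_holds, is_min_on_samples)
```

Random sampling can only fail to find a counterexample. The proof above is what rules one out.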