I have the following optimisation problem:

$\displaystyle \begin{array}{ll}\min_M & E(M) = \sum_i f_i(\mathbf a_i^T M \mathbf a_i)\\ \text{s.t.} & M \succeq 0\\&\mathrm{rank}(M) \leq k\end{array}$

where $M$ is an $(N\times N)$ matrix, the $\displaystyle \mathbf a_i$'s are constant vectors, and the $\displaystyle f_i$'s are convex functions.

Without the rank constraint, this would be a convex optimisation problem ($E$ is convex in $M$ and the PSD cone is convex); the rank constraint, however, makes the feasible set non-convex.

To avoid having to deal with the constraints, I factorize the matrix as $\displaystyle M=L^TL$, where $\displaystyle L$ is a $\displaystyle (k\times N)$ matrix. This ensures that $M$ is PSD and of rank $k$ at most. But the resulting problem is no longer convex:

$\displaystyle \min_L F(L) = \sum_i f_i(\mathbf a_i^T L^T L \mathbf a_i)$
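As a quick numerical sanity check (the sizes $N$ and $k$ below are hypothetical), any $M = L^TL$ is indeed PSD with rank at most $k$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 6, 2                          # hypothetical sizes
L = rng.normal(size=(k, N))
M = L.T @ L                          # N x N symmetric matrix

eigvals = np.linalg.eigvalsh(M)      # eigenvalues of the symmetric M
print(np.round(eigvals, 6))          # none negative, at most k nonzero
```

Every eigenvalue is nonnegative (up to roundoff) and at most $k$ of them are nonzero, so the factorization enforces both constraints by construction.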

However, in practical applications, gradient-descent-based methods seem to lead to good solutions, and I would like to prove why.

My intuition is that the minima when optimizing over $L$ correspond to minima of the constrained problem, and that the other stationary points correspond either to maxima or to saddle points. This would mean that an iterative optimisation process based on gradient descent has a very low probability of getting stuck at a zero-gradient point which is not a minimum.
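One cheap way to probe this intuition empirically is to run gradient descent from many random initializations and compare the objective values reached. The sketch below assumes, purely for illustration, squared losses $f_i(x) = (x - b_i)^2$ with made-up targets $b_i$; all sizes and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 5, 2, 15                       # hypothetical sizes
A = rng.normal(size=(m, N))              # rows are the constant vectors a_i
b = rng.normal(size=m) ** 2              # made-up targets for f_i(x) = (x - b_i)^2

def F(L):
    q = np.sum((L @ A.T) ** 2, axis=0)   # q[i] = a_i^T L^T L a_i = ||L a_i||^2
    return np.sum((q - b) ** 2)

def grad(L):
    q = np.sum((L @ A.T) ** 2, axis=0)
    W = (A.T * (2.0 * (q - b))) @ A      # sum_i f_i'(q_i) a_i a_i^T
    return 2.0 * L @ W

def descend(L, iters=5000, step=1e-2):
    f = F(L)
    for _ in range(iters):
        g = grad(L)
        while step > 1e-14 and F(L - step * g) > f:
            step *= 0.5                  # backtrack until the step decreases F
        L = L - step * g
        f = F(L)
        step *= 1.5                      # tentatively grow the step back
    return L, f

finals = [descend(rng.normal(size=(k, N)))[1] for _ in range(8)]
print(np.round(finals, 4))               # do different starts agree?
```

If all (or nearly all) restarts reach the same objective value, that is evidence consistent with the intuition, though of course not a proof.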

The gradient with respect to $L$ is:

$\displaystyle \frac{\partial F(L)}{\partial L} = 2L \sum_i f'_i(\mathbf a_i^T L^T L \mathbf a_i) \mathbf a_i \mathbf a_i^T $
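This formula can be checked against central finite differences (again using squared losses $f_i(x) = (x - b_i)^2$ as a stand-in for generic convex $f_i$; all data here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k, m = 4, 2, 6                        # hypothetical sizes
A = rng.normal(size=(m, N))              # rows are the a_i
b = rng.normal(size=m)                   # made-up targets, f_i(x) = (x - b_i)^2

def F(L):
    q = np.sum((L @ A.T) ** 2, axis=0)   # a_i^T L^T L a_i
    return np.sum((q - b) ** 2)

def grad(L):                             # 2 L * sum_i f_i'(q_i) a_i a_i^T
    q = np.sum((L @ A.T) ** 2, axis=0)
    return 2.0 * L @ ((A.T * (2.0 * (q - b))) @ A)

L = rng.normal(size=(k, N))
eps, num = 1e-6, np.zeros((k, N))
for r in range(k):
    for c in range(N):
        E = np.zeros((k, N)); E[r, c] = eps
        num[r, c] = (F(L + E) - F(L - E)) / (2 * eps)

print(np.max(np.abs(num - grad(L))))     # tiny relative to the gradient's scale
```

The analytic and numerical gradients should agree to within finite-difference error, which supports the formula above.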

How can I prove this, or at least get some hints as to whether it's true or not?