Hi!

In several engineering publications I've come across a variational version of the classical Lagrange optimization method, which goes like this. Suppose you want to maximize a functional $\displaystyle \int f(x,g(x)) \, d \mu(x)$ with respect to the function $\displaystyle g$, where $\displaystyle \mu$ is some given measure, under an inequality constraint of the form $\displaystyle \int g(x) \, d\mu(x) \geq 0$. You'd define a Lagrangian

$\displaystyle L = \int f(x,g(x)) d \mu(x) + \lambda \int g(x) d\mu(x)$

What the authors then do is set up a pointwise stationarity condition, apparently "removing" the integrals:

$\displaystyle \frac{\partial f}{\partial g} + \lambda = 0$

This equation is then solved for $\displaystyle g$, so you obtain a solution function $\displaystyle g^\star(x,\lambda)$ depending on $\displaystyle x$ and $\displaystyle \lambda$. Finally, the Lagrange multiplier $\displaystyle \lambda \geq 0$ is chosen so as to fulfill the inequality constraint $\displaystyle \int g^\star(x,\lambda) \, d\mu(x) \geq 0$ (with equality whenever $\displaystyle \lambda > 0$); in most problems this choice turns out to be unique, e.g. thanks to benign monotonicity properties.
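For concreteness, here is a toy instance I cooked up myself (not taken from the papers): $\displaystyle f(x,g) = -\tfrac{1}{2}(g+x)^2$ with $\displaystyle \mu$ the Lebesgue measure on $[0,1]$. The pointwise condition $-(g+x)+\lambda = 0$ gives $g^\star(x,\lambda) = \lambda - x$, and $\lambda = 1/2$ makes the constraint tight. A direct discretized optimization agrees with this:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem (my own choice): maximize  int_0^1 -(g(x)+x)^2 / 2 dx
# subject to  int_0^1 g(x) dx >= 0,  discretized on a midpoint grid.
n = 50
x = (np.arange(n) + 0.5) / n   # midpoint grid on [0, 1]
w = 1.0 / n                    # quadrature weight

# Pointwise stationarity: -(g + x) + lam = 0  =>  g*(x, lam) = lam - x.
# Constraint: int g* dmu = lam - 1/2 >= 0, tight at lam = 1/2.
lam = 0.5
g_star = lam - x

# Brute-force check: maximize the discretized functional directly
# (minimize its negative; the constraint is active at the optimum).
obj = lambda g: np.sum(0.5 * (g + x) ** 2) * w
cons = {"type": "ineq", "fun": lambda g: np.sum(g) * w}
res = minimize(obj, np.zeros(n), constraints=[cons])

print(np.max(np.abs(res.x - g_star)))  # discrepancy should be small
```

The unconstrained pointwise maximizer would be $g = -x$, which violates the constraint ($\int g \, d\mu = -1/2$), so the multiplier is genuinely needed here.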

This approach feels very sloppy to me, yet it seems to yield correct results, and I've never understood why. Can you explain it to me?

The only explanation I could think of is that the integral is removed "tentatively", to see whether one finds a solution to the Karush-Kuhn-Tucker (KKT) conditions. That's legitimate. But then, to ensure we have the global solution, one would need a proof of the problem's convexity. Is it as mundane as this, or is there a deeper explanation? Under which (sufficient) conditions is the above approach correct?
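To make my guess concrete, here is the concavity argument I have in mind, under the (usually unstated) assumption that $\displaystyle f(x,\cdot)$ is concave for every $\displaystyle x$: for any feasible $\displaystyle h$, i.e. any $\displaystyle h$ with $\displaystyle \int h(x) \, d\mu(x) \geq 0$,

$\displaystyle \int f(x,h(x)) \, d\mu(x) \;\leq\; \int \left[ f(x,g^\star(x)) + \frac{\partial f}{\partial g}\bigg|_{g^\star} \big(h(x)-g^\star(x)\big) \right] d\mu(x) \;=\; \int f(x,g^\star(x)) \, d\mu(x) - \lambda \int \big(h(x)-g^\star(x)\big) \, d\mu(x) \;\leq\; \int f(x,g^\star(x)) \, d\mu(x),$

where the first step uses concavity of $\displaystyle f(x,\cdot)$, the second uses the stationarity condition $\displaystyle \partial f/\partial g = -\lambda$, and the last uses $\displaystyle \lambda \geq 0$, feasibility of $\displaystyle h$, and complementary slackness $\displaystyle \lambda \int g^\star \, d\mu(x) = 0$. Is this the intended justification, or is there more to it?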

Jens