gradient of a function and matrices

Hi,

I need help computing the gradient of a function.

$\displaystyle f(b) = (y-Xb)^{2}+b^{2}$

Where y and b are vectors, and X is a matrix.

So

$\displaystyle f(b) = (y-Xb)^{T}(y-Xb)+b^{T}b)$

$\displaystyle f(b) = (y^{T}y -y^{T}Xb - b^{T}X^{T}y + b^{T}X^{T}Xb +b^{T}b)$

I need help to find $\displaystyle \triangledown _{b} (y^{T}y -y^{T}Xb - b^{T}X^{T}y + b^{T}X^{T}Xb+b^{T}b)$

Thanks!

Re: gradient of a function and matrices

Hey noway.

Usually you have identities for these kinds of things, but if you are trying to prove it then that's different?

Are you trying to simply calculate it or to prove what the result is?

Re: gradient of a function and matrices

I've calculated it, but I'm doing it incorrectly I guess. It is part of another problem, so I need to get it right before moving on.

Re: gradient of a function and matrices

Is this just a standard directional derivative with respect to some vector?

If so it's going to have some dot product and hence some matrix expansion in terms of b and the normal derivative of that matrix expression.

I can't recall the identities off the top of my head, but if you do a search they should be out there.

Re: gradient of a function and matrices

There are severals ways to find $\displaystyle \triangledown f(b_0)$, because there are several ways to interpret it. I'll show you a kinda weird way, but one that's often very useful:

Think of a $\displaystyle C^1$ curve $\displaystyle g: (-1, 1) \rightarrow \mathbb{R}^n$ such that $\displaystyle g(0) = b_0$ and $\displaystyle g'(0) = v$.

Then $\displaystyle \left. \frac{d}{dt}\right|_{t=0} f(g(t)) = < \triangledown f (b_0), g'(0) > = < \triangledown f (b_0), v >$.

Now, although there are infinitely many, $\displaystyle g(t) = b_0 + vt$ is the natural choice for g. However, we won't even need that.

$\displaystyle < \triangledown f (b_0), v > = \left. \frac{d}{dt}\right|_{t=0} f(g(t)) = \left.\frac{d}{dt}\right|_{t=0} \left( \lVert y - X g(t) \rVert^2 + \lVert g(t) \rVert^2 \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0} \left( <y - X g(t), y - X g(t)> + <g(t), g(t)> \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0} \left( <y, y> - 2<X g(t), y> + <X g(t), X g(t)> + <g(t), g(t)> \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0} \left( <y, y> - 2<g(t), X^ty> + <X g(t), X g(t)> + <g(t), g(t)> \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0}( <y, y>) - 2\left.\frac{d}{dt}\right|_{t=0}(< g(t), X^ty>) +.....$

$\displaystyle ........ + \left.\frac{d}{dt}\right|_{t=0}<X g(t), X g(t)> + \left.\frac{d}{dt}\right|_{t=0}(<g(t), g(t)>)$

$\displaystyle = 0 - 2< g'(0), X^ty> + (<Xg'(0), Xg(0)> + <Xg(0), Xg'(0)>)+ ....$

$\displaystyle ........ + (<g'(0), g(0)> + <g(0), g'(0)>)$

$\displaystyle = - 2 < g'(0), X^ty> + 2 <Xg'(0), Xg(0)> + 2<g'(0), g(0)>$

$\displaystyle = - 2 < v, X^ty> + 2 <Xv, Xb_0> + 2<v, b_0>$

$\displaystyle = - 2 < v, X^ty> + 2 <v, X^tXb_0> + 2<v, b_0>$

$\displaystyle = < v, \{- 2 X^ty + 2 X^tXb_0 + 2 b_0 \}>$

Therefore $\displaystyle < \triangledown f (b_0), v > = < v, \{- 2 X^ty + 2 X^tXb_0 + 2 b_0 \}>$.

So $\displaystyle < \[ \ \triangledown f (b_0) - \{- 2 X^ty + 2 X^tXb_0 + 2 b_0 \} \ \], v > = 0$, and since that holds for EVERY $\displaystyle v$, have:

$\displaystyle \triangledown f (b_0) = - 2 X^ty + 2 X^tXb_0 + 2 b_0 = 2 \{ b_0 - X^t (y - X b_0) \} $.