# gradient of a function and matrices

• Sep 26th 2012, 11:58 PM
noway
gradient of a function and matrices
Hi,

I need help computing the gradient of a function.

$\displaystyle f(b) = (y-Xb)^{2}+b^{2}$

Where y and b are vectors, and X is a matrix.

So

$\displaystyle f(b) = (y-Xb)^{T}(y-Xb)+b^{T}b)$

$\displaystyle f(b) = (y^{T}y -y^{T}Xb - b^{T}X^{T}y + b^{T}X^{T}Xb +b^{T}b)$

I need help to find $\displaystyle \triangledown _{b} (y^{T}y -y^{T}Xb - b^{T}X^{T}y + b^{T}X^{T}Xb+b^{T}b)$

Thanks!
• Sep 27th 2012, 12:03 AM
chiro
Re: gradient of a function and matrices
Hey noway.

Usually you have identities for these kinds of things, but if you are trying to prove it then that's different?

Are you trying to simply calculate it or to prove what the result is?
• Sep 27th 2012, 12:06 AM
noway
Re: gradient of a function and matrices
I've calculated it, but I'm doing it incorrectly I guess. It is part of another problem, so I need to get it right before moving on.
• Sep 27th 2012, 12:13 AM
chiro
Re: gradient of a function and matrices
Is this just a standard directional derivative with respect to some vector?

If so it's going to have some dot product and hence some matrix expansion in terms of b and the normal derivative of that matrix expression.

I can't recall the identities off the top of my head, but if you do a search they should be out there.
• Sep 27th 2012, 01:34 AM
johnsomeone
Re: gradient of a function and matrices
There are severals ways to find $\displaystyle \triangledown f(b_0)$, because there are several ways to interpret it. I'll show you a kinda weird way, but one that's often very useful:

Think of a $\displaystyle C^1$ curve $\displaystyle g: (-1, 1) \rightarrow \mathbb{R}^n$ such that $\displaystyle g(0) = b_0$ and $\displaystyle g'(0) = v$.

Then $\displaystyle \left. \frac{d}{dt}\right|_{t=0} f(g(t)) = < \triangledown f (b_0), g'(0) > = < \triangledown f (b_0), v >$.

Now, although there are infinitely many, $\displaystyle g(t) = b_0 + vt$ is the natural choice for g. However, we won't even need that.

$\displaystyle < \triangledown f (b_0), v > = \left. \frac{d}{dt}\right|_{t=0} f(g(t)) = \left.\frac{d}{dt}\right|_{t=0} \left( \lVert y - X g(t) \rVert^2 + \lVert g(t) \rVert^2 \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0} \left( <y - X g(t), y - X g(t)> + <g(t), g(t)> \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0} \left( <y, y> - 2<X g(t), y> + <X g(t), X g(t)> + <g(t), g(t)> \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0} \left( <y, y> - 2<g(t), X^ty> + <X g(t), X g(t)> + <g(t), g(t)> \right)$

$\displaystyle = \left.\frac{d}{dt}\right|_{t=0}( <y, y>) - 2\left.\frac{d}{dt}\right|_{t=0}(< g(t), X^ty>) +.....$

$\displaystyle ........ + \left.\frac{d}{dt}\right|_{t=0}<X g(t), X g(t)> + \left.\frac{d}{dt}\right|_{t=0}(<g(t), g(t)>)$

$\displaystyle = 0 - 2< g'(0), X^ty> + (<Xg'(0), Xg(0)> + <Xg(0), Xg'(0)>)+ ....$

$\displaystyle ........ + (<g'(0), g(0)> + <g(0), g'(0)>)$

$\displaystyle = - 2 < g'(0), X^ty> + 2 <Xg'(0), Xg(0)> + 2<g'(0), g(0)>$

$\displaystyle = - 2 < v, X^ty> + 2 <Xv, Xb_0> + 2<v, b_0>$

$\displaystyle = - 2 < v, X^ty> + 2 <v, X^tXb_0> + 2<v, b_0>$

$\displaystyle = < v, \{- 2 X^ty + 2 X^tXb_0 + 2 b_0 \}>$

Therefore $\displaystyle < \triangledown f (b_0), v > = < v, \{- 2 X^ty + 2 X^tXb_0 + 2 b_0 \}>$.

So $\displaystyle < $\ \triangledown f (b_0) - \{- 2 X^ty + 2 X^tXb_0 + 2 b_0 \} \$, v > = 0$, and since that holds for EVERY $\displaystyle v$, have:

$\displaystyle \triangledown f (b_0) = - 2 X^ty + 2 X^tXb_0 + 2 b_0 = 2 \{ b_0 - X^t (y - X b_0) \}$.