1. ## Vector differentiation

I'm working through some equations in a statistics textbook, but am confused over some steps...
Let $\displaystyle RSS$ be Residual Sum of Squares (doesn't really matter)
$\displaystyle \beta \in \mathbb{R}^{p+1}$
$\displaystyle y \in \mathbb{R}^{N}$
$\displaystyle X$ be an $\displaystyle N \times (p+1)$ matrix

We have (for linear regression)
$\displaystyle RSS(\beta) = (y - X\beta)^T(y-X\beta)$

Question:
How would you get to the following?
$\displaystyle \frac{\partial RSS}{\partial \beta} = -2X^T(y-X\beta),$
$\displaystyle \frac{\partial^2 RSS}{\partial \beta \partial \beta^T}=-2X^TX.$

I'm especially confused over $\displaystyle \frac{\partial^2}{\partial \beta \partial \beta^T}$... also, are there easily remembered 'rules of differentiation' for vectors?

2. Originally Posted by ltcd
How would you get to the following?
$\displaystyle \frac{\partial RSS}{\partial \beta} = -2X^T(y-X\beta),$
let $\displaystyle \beta=\begin{bmatrix}\beta_1 \\ . \\ . \\ . \\ \beta_{p+1} \end{bmatrix}$ and $\displaystyle z=\begin{bmatrix}z_1 & . & . & . & z_{p+1} \end{bmatrix},$ where the $\displaystyle z_j$ are constants w.r.t. all $\displaystyle \beta_k.$ then $\displaystyle z\beta=z_1\beta_1 + \cdots + z_{p+1}\beta_{p+1}.$ thus: $\displaystyle \frac{\partial{(z\beta)}}{\partial{\beta}} =\begin{bmatrix}\frac{\partial{(z\beta)}}{\partial {\beta}_1} \\ . \\ . \\ . \\ \frac{\partial{(z\beta)}}{\partial{\beta}_{p+1}} \end{bmatrix}=\begin{bmatrix}z_1 \\ . \\ . \\ . \\ z_{p+1} \end{bmatrix}=z^T. \ \ \ \ (1)$

similarly if $\displaystyle w$ is a $\displaystyle (p+1) \times 1$ constant vector, then: $\displaystyle \frac{\partial{(\beta^T w)}}{\partial{\beta}} =w. \ \ \ \ \ (2)$

also, with a little bit more effort, you can show that for any $\displaystyle (p+1) \times (p+1)$ constant matrix $\displaystyle A$ we have: $\displaystyle \frac{\partial{(\beta^T A \beta)}}{\partial{\beta}}=(A+A^T)\beta. \ \ \ \ \ (3)$
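As a quick numerical sanity check of rules (1)–(3) (not from the thread; the dimensions and random data below are arbitrary choices for illustration), you can compare them against central finite differences in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
p1 = 4                                # stands in for p + 1
beta = rng.standard_normal(p1)
z = rng.standard_normal(p1)           # row vector of constants w.r.t. beta
A = rng.standard_normal((p1, p1))     # constant (p+1) x (p+1) matrix

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# (1)/(2): the gradient of z.beta w.r.t. beta is z^T (a 1-D array here)
g1 = num_grad(lambda b: z @ b, beta)
assert np.allclose(g1, z, atol=1e-6)

# (3): the gradient of beta^T A beta w.r.t. beta is (A + A^T) beta
g3 = num_grad(lambda b: b @ A @ b, beta)
assert np.allclose(g3, (A + A.T) @ beta, atol=1e-5)
print("rules (1)-(3) verified numerically")
```

Since the functions being differentiated are at most quadratic in $\displaystyle \beta,$ central differences are exact up to rounding error, so the tolerances are comfortable.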

now solving the problem is very easy:

$\displaystyle RSS(\beta) = (y - X\beta)^T(y-X\beta)=(y^T-\beta^T X^T)(y-X\beta)$

$\displaystyle =y^Ty-y^TX\beta - \beta^TX^Ty+\beta^TX^TX\beta \ \ \ \ \ (4)$

now in (4): the derivative of $\displaystyle y^Ty$ is clearly $\displaystyle 0.$ by (1) the derivative of $\displaystyle y^TX\beta$ is $\displaystyle (y^TX)^T=X^Ty.$ by (2) the derivative of $\displaystyle \beta^TX^Ty$ is $\displaystyle X^Ty.$ finally by (3) the derivative of $\displaystyle \beta^TX^TX\beta$ is $\displaystyle (X^TX+(X^TX)^T)\beta=2X^TX\beta,$ since $\displaystyle X^TX$ is symmetric. thus: $\displaystyle \frac{\partial{RSS(\beta)}}{\partial{\beta}}=-X^Ty-X^Ty+2X^TX\beta=-2X^Ty+2X^TX\beta=-2X^T(y-X\beta). \ \ \ \ \ (5)$
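The gradient formula $\displaystyle -2X^T(y-X\beta)$ can also be checked numerically (again a sketch with arbitrary random data, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p1 = 8, 3                          # arbitrary sizes; p1 stands for p + 1
X = rng.standard_normal((N, p1))
y = rng.standard_normal(N)
beta = rng.standard_normal(p1)

def rss(b):
    """Residual sum of squares (y - Xb)^T (y - Xb)."""
    r = y - X @ b
    return r @ r

# central-difference gradient vs the closed form -2 X^T (y - X beta)
h = 1e-6
g = np.array([(rss(beta + h * e) - rss(beta - h * e)) / (2 * h)
              for e in np.eye(p1)])
assert np.allclose(g, -2 * X.T @ (y - X @ beta), atol=1e-4)
print("gradient formula (5) verified numerically")
```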

$\displaystyle \frac{\partial^2 RSS}{\partial \beta \partial \beta^T}=-2X^TX.$ are you sure it's not $\displaystyle \color{blue}{2X^TX}$ instead?
recall that if $\displaystyle f(\beta)=\begin{bmatrix}f_1(\beta) \\ . \\ . \\ . \\ f_{p+1}(\beta) \end{bmatrix},$ where the $\displaystyle f_j(\beta)$ are scalar functions of $\displaystyle \beta,$ then $\displaystyle \frac{\partial{f}}{\partial{\beta}^T}=A=[a_{ij}],$ where $\displaystyle A$ is the $\displaystyle (p+1) \times (p+1)$ matrix defined by $\displaystyle a_{ij}=\frac{\partial{f_i}}{\partial{\beta_j}}.$ using this you can very easily show that if $\displaystyle B$ is a $\displaystyle (p+1) \times (p+1)$ constant matrix, then: $\displaystyle \frac{\partial{(B \beta)}}{\partial{\beta}^T}=B. \ \ \ \ \ \ (6)$

thus by (5) and (6): $\displaystyle \frac{\partial^2{RSS(\beta)}}{\partial{\beta}\,\partial{\beta}^T}=\frac{\partial{(-2X^Ty+2X^TX\beta)}}{\partial{\beta}^T}=2X^TX.$
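A last numerical check of the Hessian, including its sign (again a sketch with arbitrary random data): differentiating the gradient $\displaystyle -2X^Ty+2X^TX\beta$ column by column should reproduce $\displaystyle 2X^TX.$

```python
import numpy as np

rng = np.random.default_rng(2)
N, p1 = 6, 3                          # arbitrary sizes; p1 stands for p + 1
X = rng.standard_normal((N, p1))
y = rng.standard_normal(N)
beta = rng.standard_normal(p1)

def grad(b):
    """Gradient of RSS from (5): -2 X^T (y - X b)."""
    return -2 * X.T @ (y - X @ b)

# numerical Jacobian of the gradient: column j holds d(grad)/d(beta_j),
# i.e. entry (i, j) is df_i/dbeta_j, matching the definition above (6)
h = 1e-6
H = np.column_stack([(grad(beta + h * e) - grad(beta - h * e)) / (2 * h)
                     for e in np.eye(p1)])
assert np.allclose(H, 2 * X.T @ X, atol=1e-4)
print("Hessian is 2 X^T X, with a plus sign")
```

Since $\displaystyle X^TX$ is positive semi-definite, the positive sign is what makes $\displaystyle RSS$ convex in $\displaystyle \beta$ and the stationary point a minimum, which is why it matters here.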