1. ## Vector differentiation

I'm working through some equations in a statistics textbook, but am confused over some steps...
Let $RSS$ be Residual Sum of Squares (doesn't really matter)
$\beta \in \mathbb{R}^{p+1}$
$y \in \mathbb{R}^{N}$
$X$ be an $N \times (p+1)$ matrix

We have (for linear regression)
$RSS(\beta) = (y - X\beta)^T(y-X\beta)$

Question:
How would you get to the following?
$\frac{\partial RSS}{\partial \beta} = -2X^T(y - X\beta)$

$
$

I'm especially confused over $\frac{\partial^2}{\partial \beta \partial \beta^T}$... also, are there 'rules of differentiation' with vectors that could be easily remembered?

2. Originally Posted by ltcd
I'm working through some equations in a statistics textbook, but am confused over some steps...
Let $RSS$ be Residual Sum of Squares (doesn't really matter)
$\beta \in \mathbb{R}^{p+1}$
$y \in \mathbb{R}^{N}$
$X$ be an $N \times (p+1)$ matrix

We have (for linear regression)
$RSS(\beta) = (y - X\beta)^T(y-X\beta)$

Question:
How would you get to the following?
$\frac{\partial RSS}{\partial \beta} = -2X^T(y - X\beta)$
let $\beta=\begin{bmatrix}\beta_1 \\ \vdots \\ \beta_{p+1} \end{bmatrix}$ and $z=\begin{bmatrix}z_1 & \cdots & z_{p+1} \end{bmatrix},$ where the $z_j$ are constants w.r.t. all $\beta_k.$ then $z\beta=z_1\beta_1 + \cdots + z_{p+1}\beta_{p+1}.$ thus: $\frac{\partial{(z\beta)}}{\partial{\beta}} =\begin{bmatrix}\frac{\partial{(z\beta)}}{\partial {\beta}_1} \\ \vdots \\ \frac{\partial{(z\beta)}}{\partial{\beta}_{p+1}} \end{bmatrix}=\begin{bmatrix}z_1 \\ \vdots \\ z_{p+1} \end{bmatrix}=z^T. \ \ \ \ (1)$

similarly if $w$ is a $(p+1) \times 1$ constant vector, then: $\frac{\partial{(\beta^T w)}}{\partial{\beta}} =w. \ \ \ \ \ (2)$

also, with a little bit more effort, you can show that for any $(p+1) \times (p+1)$ constant matrix $A$ we have: $\frac{\partial{(\beta^T A \beta)}}{\partial{\beta}}=(A+A^T)\beta. \ \ \ \ \ (3)$
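a sketch of the "little bit more effort" behind (3), written in components. the entries $a_{ij}$ of $A$ and the index $k$ are notation introduced here for illustration:

$\beta^T A \beta=\sum_{i=1}^{p+1}\sum_{j=1}^{p+1} a_{ij}\beta_i\beta_j$

$\frac{\partial{(\beta^T A \beta)}}{\partial{\beta_k}}=\sum_{j=1}^{p+1} a_{kj}\beta_j+\sum_{i=1}^{p+1} a_{ik}\beta_i=(A\beta)_k+(A^T\beta)_k$

stacking these for $k=1,\ldots,p+1$ gives $(A+A^T)\beta.$ in particular, when $A$ is symmetric this reduces to $2A\beta.$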

now solving the problem is very easy:

$RSS(\beta) = (y - X\beta)^T(y-X\beta)=(y^T-\beta^T X^T)(y-X\beta)$

$=y^Ty-y^TX\beta - \beta^TX^Ty+\beta^TX^TX\beta \ \ \ \ \ (4)$

now in (4): the derivative of $y^Ty$ is clearly 0. by (1) the derivative of $y^TX\beta$ is $(y^TX)^T=X^Ty.$ by (2) the derivative of $\beta^TX^Ty$ is $X^Ty.$ finally, by (3) with $A=X^TX,$ which is symmetric, the derivative of $\beta^TX^TX\beta$ is $(A+A^T)\beta=2X^TX\beta.$ thus: $\frac{\partial{RSS(\beta)}}{\partial{\beta}}=-X^Ty-X^Ty+2X^TX\beta=-2X^Ty+2X^TX\beta=-2X^T(y-X\beta). \ \ \ \ \ (5)$
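if you want to convince yourself of (5) numerically, here is a minimal sketch in plain Python. the toy values of `X`, `y` and `beta` are made up for illustration, and the helper names are hypothetical. it compares the closed form $-2X^T(y-X\beta)$ with central finite differences of $RSS.$

```python
# Toy data, invented for illustration: N = 3 observations, p + 1 = 2 coefficients.
X = [[1.0, 2.0], [1.0, 0.0], [1.0, -1.0]]
y = [1.0, 2.0, 0.5]
beta = [0.3, -0.7]


def residual(b):
    """Return the vector y - X b as a list."""
    return [yi - sum(xij * bj for xij, bj in zip(row, b)) for row, yi in zip(X, y)]


def rss(b):
    """RSS(b) = (y - X b)^T (y - X b)."""
    return sum(r * r for r in residual(b))


def grad_formula(b):
    """Closed form from (5): -2 X^T (y - X b)."""
    r = residual(b)
    return [-2.0 * sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(b))]


def grad_numeric(b, h=1e-6):
    """Central finite differences of RSS, one coordinate at a time."""
    g = []
    for j in range(len(b)):
        up = list(b)
        dn = list(b)
        up[j] += h
        dn[j] -= h
        g.append((rss(up) - rss(dn)) / (2 * h))
    return g


analytic = grad_formula(beta)
numeric = grad_numeric(beta)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print("formula:", analytic)
print("finite differences:", numeric)
print("largest gap:", max_gap)
```

the two gradients should agree up to rounding error, since $RSS$ is quadratic in $\beta$ and central differences are exact for quadratics apart from floating-point noise.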

as for the second derivative $\frac{\partial^2{RSS(\beta)}}{\partial{\beta} \partial{\beta}^T}$: are you sure it's not $2X^TX$ instead?
recall that if $f(\beta)=\begin{bmatrix}f_1(\beta) \\ \vdots \\ f_{p+1}(\beta) \end{bmatrix},$ where the $f_j(\beta)$ are scalar functions of $\beta,$ then $\frac{\partial{f}}{\partial{\beta}^T}=A=[a_{ij}],$ where $A$ is the $(p+1) \times (p+1)$ matrix defined by $a_{ij}=\frac{\partial{f_i}}{\partial{\beta_j}}.$ using this you can very easily show that if $B$ is a $(p+1) \times (p+1)$ constant matrix, then $\frac{\partial{(B \beta)}}{\partial{\beta}^T}=B. \ \ \ \ \ \ (6)$
thus by (5) and (6), since $-2X^Ty$ does not depend on $\beta$: $\frac{\partial^2{RSS(\beta)}}{\partial{\beta} \partial{\beta}^T}=\frac{\partial{(-2X^Ty+2X^TX\beta)}}{\partial{\beta}^T}=2X^TX.$
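the same kind of check works for the second derivative. the sketch below (plain Python again, with made-up toy values for `X` and `y` and hypothetical helper names) differentiates the gradient $-2X^Ty+2X^TX\beta$ numerically, filling in $a_{ij}=\frac{\partial{f_i}}{\partial{\beta_j}}$ column by column, and compares the result with $2X^TX.$

```python
# Toy data, invented for illustration: N = 3 observations, p + 1 = 2 coefficients.
X = [[1.0, 2.0], [1.0, 0.0], [1.0, -1.0]]
y = [1.0, 2.0, 0.5]
n_obs, n_coef = len(X), len(X[0])

# Precompute X^T X and X^T y.
XtX = [[sum(X[i][a] * X[i][b] for i in range(n_obs)) for b in range(n_coef)]
       for a in range(n_coef)]
Xty = [sum(X[i][a] * y[i] for i in range(n_obs)) for a in range(n_coef)]


def gradient(b):
    """Gradient of RSS: -2 X^T y + 2 X^T X b."""
    return [-2.0 * Xty[a] + 2.0 * sum(XtX[a][c] * b[c] for c in range(n_coef))
            for a in range(n_coef)]


def hessian_numeric(b, h=1e-5):
    """Entry (i, j) approximates d f_i / d beta_j, with f the gradient above."""
    H = [[0.0] * n_coef for _ in range(n_coef)]
    for j in range(n_coef):
        up = list(b)
        dn = list(b)
        up[j] += h
        dn[j] -= h
        g_up, g_dn = gradient(up), gradient(dn)
        for i in range(n_coef):
            H[i][j] = (g_up[i] - g_dn[i]) / (2 * h)
    return H


hessian_formula = [[2.0 * v for v in row] for row in XtX]  # 2 X^T X
hessian_fd = hessian_numeric([0.3, -0.7])
max_gap = max(abs(hessian_formula[i][j] - hessian_fd[i][j])
              for i in range(n_coef) for j in range(n_coef))
print("2 X^T X:", hessian_formula)
print("finite differences:", hessian_fd)
print("largest gap:", max_gap)
```

the result does not depend on the point $\beta$ at which you evaluate it, which matches the fact that $2X^TX$ is constant in $\beta.$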