**ltcd** I'm working through some equations in a statistics textbook, but I'm confused about some of the steps...

Let $\displaystyle RSS$ be Residual Sum of Squares (doesn't really matter)

$\displaystyle \beta \in \mathbb{R}^{p+1}$

$\displaystyle y \in \mathbb{R}^{N}$

$\displaystyle X $ be an $\displaystyle N \times (p+1) $ matrix

We have (for linear regression)

$\displaystyle RSS(\beta) = (y - X\beta)^T(y-X\beta)$

**Question:**

How would you get to the following?

$\displaystyle \frac{\partial RSS}{\partial \beta} = -2X^T(y-X\beta)$
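
One possible route to this result is sketched below. It assumes that $\displaystyle \frac{\partial RSS}{\partial \beta}$ means the gradient written as a column vector in $\displaystyle \mathbb{R}^{p+1}$, and it uses two standard identities that are not stated in the post: for a constant vector $\displaystyle a$ and a constant symmetric matrix $\displaystyle A$,

$\displaystyle \frac{\partial (\beta^T a)}{\partial \beta} = a, \qquad \frac{\partial (\beta^T A \beta)}{\partial \beta} = 2A\beta$

First expand the product. Since $\displaystyle y^T X\beta$ is a scalar, it equals its own transpose $\displaystyle \beta^T X^T y$, so the two cross terms combine:

$\displaystyle RSS(\beta) = y^T y - 2\beta^T X^T y + \beta^T X^T X \beta$

Then differentiate term by term, taking $\displaystyle a = X^T y$ and $\displaystyle A = X^T X$ (which is symmetric):

$\displaystyle \frac{\partial RSS}{\partial \beta} = 0 - 2X^T y + 2X^T X\beta = -2X^T(y - X\beta)$

If the identities themselves look unfamiliar, they can be checked by writing out $\displaystyle \beta^T a = \sum_j \beta_j a_j$ and $\displaystyle \beta^T A \beta = \sum_j \sum_k \beta_j A_{jk} \beta_k$ and differentiating with respect to a single component $\displaystyle \beta_i$.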