
Thread: Vector differentiation

  1. #1
    Newbie
    Joined
    Sep 2008
    Posts
    3

    Vector differentiation

    I'm working through some equations in a statistics textbook, but am confused over some steps...
    Let $\displaystyle RSS$ be Residual Sum of Squares (doesn't really matter)
    $\displaystyle \beta \in \mathbb{R}^{p+1}$
    $\displaystyle y \in \mathbb{R}^{N}$
    $\displaystyle X $ be an $\displaystyle N \times (p+1) $ matrix

    We have (for linear regression)
    $\displaystyle RSS(\beta) = (y - X\beta)^T(y-X\beta)$

    Question:
    How would you get to the following?
    $\displaystyle
    \frac{\partial RSS}{\partial \beta} = -2X^T(y-X\beta),
    $
    $\displaystyle
    \frac{\partial^2 RSS}{\partial \beta \partial \beta^T}=-2X^TX.
    $

    I'm especially confused over $\displaystyle \frac{\partial^2}{\partial \beta \partial \beta^T}$... also, are there 'rules of differentiation' with vectors that could be easily remembered?

  2. #2
    MHF Contributor

    Joined
    May 2008
    Posts
    2,295
    Thanks
    7
    Quote Originally Posted by ltcd View Post
    I'm working through some equations in a statistics textbook, but am confused over some steps...
    Let $\displaystyle RSS$ be Residual Sum of Squares (doesn't really matter)
    $\displaystyle \beta \in \mathbb{R}^{p+1}$
    $\displaystyle y \in \mathbb{R}^{N}$
    $\displaystyle X $ be an $\displaystyle N \times (p+1) $ matrix

    We have (for linear regression)
    $\displaystyle RSS(\beta) = (y - X\beta)^T(y-X\beta)$

    Question:
    How would you get to the following?
    $\displaystyle
    \frac{\partial RSS}{\partial \beta} = -2X^T(y-X\beta),
    $
    Let $\displaystyle \beta=\begin{bmatrix}\beta_1 \\ \vdots \\ \beta_{p+1} \end{bmatrix}$ and $\displaystyle z=\begin{bmatrix}z_1 & \cdots & z_{p+1} \end{bmatrix},$ where the $\displaystyle z_j$ are constants w.r.t. all $\displaystyle \beta_k.$ Then $\displaystyle z\beta=z_1\beta_1 + \cdots + z_{p+1}\beta_{p+1}.$ Thus: $\displaystyle \frac{\partial{(z\beta)}}{\partial{\beta}} =\begin{bmatrix}\frac{\partial{(z\beta)}}{\partial {\beta}_1} \\ \vdots \\ \frac{\partial{(z\beta)}}{\partial{\beta}_{p+1}} \end{bmatrix}=\begin{bmatrix}z_1 \\ \vdots \\ z_{p+1} \end{bmatrix}=z^T. \ \ \ \ (1)$

    Similarly, if $\displaystyle w$ is a $\displaystyle (p+1) \times 1$ constant vector, then: $\displaystyle \frac{\partial{(\beta^T w)}}{\partial{\beta}} =w. \ \ \ \ \ (2)$

    Also, with a little more effort, you can show that for any $\displaystyle (p+1) \times (p+1)$ constant matrix $\displaystyle A$ we have: $\displaystyle \frac{\partial{(\beta^T A \beta)}}{\partial{\beta}}=(A+A^T)\beta. \ \ \ \ \ (3)$
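    Rules (1)–(3) are easy to sanity-check numerically. The sketch below (my own illustration, not from the thread; the dimension and random data are arbitrary) compares rule (3) against a central finite-difference gradient:

    ```python
    import numpy as np

    # Numerical spot-check of rule (3): d(b^T A b)/db = (A + A^T) b.
    # A, b, and the dimension p are arbitrary illustrative choices.
    rng = np.random.default_rng(0)
    p = 4
    A = rng.standard_normal((p, p))
    b = rng.standard_normal(p)

    f = lambda v: v @ A @ v  # the scalar quadratic form b^T A b

    # central finite differences, one coordinate direction at a time
    eps = 1e-6
    grad_fd = np.array([
        (f(b + eps * e) - f(b - eps * e)) / (2 * eps)
        for e in np.eye(p)
    ])

    grad_exact = (A + A.T) @ b
    print(np.allclose(grad_fd, grad_exact, atol=1e-5))  # True
    ```

    For a quadratic the central difference is exact up to rounding, so the two gradients agree to high precision.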

    Now the problem is easy to solve:

    $\displaystyle RSS(\beta) = (y - X\beta)^T(y-X\beta)=(y^T-\beta^T X^T)(y-X\beta)$

    $\displaystyle =y^Ty-y^TX\beta - \beta^TX^Ty+\beta^TX^TX\beta \ \ \ \ \ (4)$

    Now differentiate (4) term by term with respect to $\displaystyle \beta$: the derivative of $\displaystyle y^Ty$ is clearly $\displaystyle 0.$ By (1), the derivative of $\displaystyle y^TX\beta$ is $\displaystyle (y^TX)^T=X^Ty.$ By (2), the derivative of $\displaystyle \beta^TX^Ty$ is $\displaystyle X^Ty.$ Finally, by (3), the derivative

    of $\displaystyle \beta^TX^TX\beta$ is $\displaystyle (X^TX+(X^TX)^T)\beta=2X^TX\beta,$ since $\displaystyle X^TX$ is symmetric. Thus: $\displaystyle \frac{\partial{RSS(\beta)}}{\partial{\beta}}=-X^Ty-X^Ty+2X^TX\beta=-2X^Ty+2X^TX\beta=-2X^T(y-X\beta). \ \ \ \ \ (5)$
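    Result (5) can also be checked numerically. A minimal sketch (my own illustration; the sizes $N$, $p$ and the random data are arbitrary assumptions) compares the closed-form gradient against finite differences of $RSS$:

    ```python
    import numpy as np

    # Numerical check of (5): grad RSS(beta) = -2 X^T (y - X beta).
    # N, p, and the random data are illustrative choices, not from the thread.
    rng = np.random.default_rng(1)
    N, p = 20, 3
    X = rng.standard_normal((N, p + 1))
    y = rng.standard_normal(N)
    beta = rng.standard_normal(p + 1)

    rss = lambda b: (y - X @ b) @ (y - X @ b)  # residual sum of squares

    # central finite-difference gradient of RSS at beta
    eps = 1e-6
    grad_fd = np.array([
        (rss(beta + eps * e) - rss(beta - eps * e)) / (2 * eps)
        for e in np.eye(p + 1)
    ])

    grad_exact = -2 * X.T @ (y - X @ beta)
    print(np.allclose(grad_fd, grad_exact, atol=1e-4))  # True
    ```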

    $\displaystyle
    \frac{\partial^2 RSS}{\partial \beta \partial \beta^T}=-2X^TX.
    $ Are you sure it's not $\displaystyle \color{blue}{2X^TX}$ instead?
    Recall that if $\displaystyle f(\beta)=\begin{bmatrix}f_1(\beta) \\ \vdots \\ f_{p+1}(\beta) \end{bmatrix},$ where the $\displaystyle f_j(\beta)$ are scalar functions of $\displaystyle \beta,$ then $\displaystyle \frac{\partial{f}}{\partial{\beta}^T}=A=[a_{ij}],$ where $\displaystyle A$ is the $\displaystyle (p+1) \times (p+1)$ matrix defined by $\displaystyle a_{ij}=\frac{\partial{f_i}}{\partial{\beta_j}}.$ Using this, you

    can very easily show that if $\displaystyle B$ is a $\displaystyle (p+1) \times (p+1)$ constant matrix, then $\displaystyle \frac{\partial{(B \beta)}}{\partial{\beta}^T}=B. \ \ \ \ \ \ (6)$

    Thus, by (5) and (6): $\displaystyle \frac{\partial^2{RSS(\beta)}}{\partial{\beta}\,\partial{\beta}^T}=\frac{\partial{(-2X^Ty+2X^TX\beta)}}{\partial{\beta}^T}=2X^TX.$
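    The Hessian can likewise be verified by finite-differencing the gradient from (5). Another illustrative sketch (sizes and data are arbitrary assumptions of mine):

    ```python
    import numpy as np

    # Numerical check of the Hessian: d^2 RSS / (d beta d beta^T) = 2 X^T X.
    # N, p, and the random data are illustrative choices.
    rng = np.random.default_rng(2)
    N, p = 15, 2
    X = rng.standard_normal((N, p + 1))
    y = rng.standard_normal(N)
    beta = rng.standard_normal(p + 1)

    grad = lambda b: -2 * X.T @ (y - X @ b)  # the gradient from (5)

    # column j of the Hessian = d(grad)/d(beta_j), by central differences
    eps = 1e-6
    H_fd = np.column_stack([
        (grad(beta + eps * e) - grad(beta - eps * e)) / (2 * eps)
        for e in np.eye(p + 1)
    ])

    print(np.allclose(H_fd, 2 * X.T @ X, atol=1e-5))  # True
    ```

    Since the gradient is linear in $\beta$, the difference quotient recovers $2X^TX$ essentially exactly, confirming the positive sign.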
