# Thread: Data likelihood

1. ## Data likelihood

I'm trying to understand a paper which shows a linear model:

$\displaystyle \mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{\epsilon}\qquad\qquad(1)$

where $\displaystyle \mathbf{x}$ is an $\displaystyle L \times N$ matrix, $\displaystyle \mathbf{A}$ is an $\displaystyle L \times M$ matrix, and $\displaystyle \mathbf{s}$ is an $\displaystyle M \times N$ matrix ($\displaystyle \mathbf{x}$ and $\displaystyle \mathbf{s}$ reduce to vectors only when $\displaystyle N = 1$). Additive Gaussian noise is represented by $\displaystyle \mathbf{\epsilon}$ with variance $\displaystyle \sigma^2$.
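To make the dimensions concrete, here is a minimal plain-Python sketch that simulates eqn (1). It assumes, which the paper may not state this way, that every entry of $\displaystyle \mathbf{\epsilon}$ is drawn independently from $\displaystyle \mathcal{N}(0,\sigma^2)$. The helper names `matmul` and `simulate_linear_model` and the sizes used are hypothetical, chosen only for illustration. The sketch does nothing but build the matrices and report their shapes.

```python
import random

def matmul(P, Q):
    """Product of two matrices stored as nested lists (P is a x b, Q is b x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def simulate_linear_model(L, M, N, sigma, seed=0):
    """Draw A (L x M) and s (M x N) at random, then return x = A s + eps.

    Assumption: every entry of eps is independent N(0, sigma^2).
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(L)]
    s = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    As = matmul(A, s)
    # random.gauss takes the standard deviation, so pass sigma (not sigma**2)
    x = [[As[i][j] + rng.gauss(0.0, sigma) for j in range(N)] for i in range(L)]
    return A, s, x

A, s, x = simulate_linear_model(L=4, M=3, N=5, sigma=0.1)
print(len(A), len(A[0]))  # 4 3
print(len(s), len(s[0]))  # 3 5
print(len(x), len(x[0]))  # 4 5
```

With $\displaystyle N > 1$, both $\displaystyle \mathbf{x}$ and $\displaystyle \mathbf{s}$ have $\displaystyle N$ columns, so $\displaystyle \mathbf{x}-\mathbf{A}\mathbf{s}$ is a full $\displaystyle L \times N$ matrix.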

I'm trying to understand why the data log likelihood takes this form:

$\displaystyle \log P(\mathbf{x}|\mathbf{A},\mathbf{s}) \propto -\frac{1}{2 \sigma^2}(\mathbf{x}-\mathbf{A}\mathbf{s})^2\qquad\qquad(2)$

The paper gives no indication of how the log likelihood in eqn (2) follows from (1). I've seen the same expression in several papers, so I assume it's standard background knowledge in probability, but I'm having trouble finding sources where I could look this kind of thing up.

Another confusing aspect is that $\displaystyle (\mathbf{x}-\mathbf{A}\mathbf{s})$ has dimensions $\displaystyle L \times N$, so how can you compute its square, and how can the log likelihood be a scalar?

2. For the benefit of others: I found a useful source cited as a reference in the Wikipedia article on the matrix normal distribution:

Dawid, A.P. (1981). "Some matrix-variate distribution theory: Notational considerations and a Bayesian application". Biometrika 68 (1): 265–274.

which suggests that the squared notation in equation (2) isn't a matrix square operation, and one can see how (2) follows from (1) with something like

$\displaystyle \mathbf{x}|\mathbf{A},\mathbf{s} \sim \mathcal{N}(\mathbf{A}\mathbf{s},\sigma^2)$

since $\displaystyle \mathbf{x}$ is a random variable with mean $\displaystyle \mathbf{A}\mathbf{s}$ and variance $\displaystyle \sigma^2$, as implied by (1). This seems to be on the right track, but please correct me if I'm wrong.
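A sketch of the missing step, assuming (the usual reading of this model, though the paper may phrase it differently) that the $\displaystyle LN$ entries of $\displaystyle \mathbf{\epsilon}$ are independent, each distributed $\displaystyle \mathcal{N}(0,\sigma^2)$. Then each entry $\displaystyle x_{ij}$ is Gaussian with mean $\displaystyle (\mathbf{A}\mathbf{s})_{ij}$ and variance $\displaystyle \sigma^2$. The joint density is a product over entries, and taking the log turns that product into a sum:

$\displaystyle \log P(\mathbf{x}|\mathbf{A},\mathbf{s}) = \sum_{i=1}^{L}\sum_{j=1}^{N} \log \mathcal{N}\big(x_{ij};(\mathbf{A}\mathbf{s})_{ij},\sigma^2\big) = -\frac{LN}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{L}\sum_{j=1}^{N}\big(x_{ij}-(\mathbf{A}\mathbf{s})_{ij}\big)^2$

The first term does not depend on $\displaystyle \mathbf{A}$ or $\displaystyle \mathbf{s}$. That is why (2) drops it and writes $\displaystyle \propto$; strictly, the log likelihood equals the right-hand side of (2) plus a constant. The remaining double sum is a scalar, namely the squared Frobenius norm

$\displaystyle \|\mathbf{x}-\mathbf{A}\mathbf{s}\|_F^2 = \operatorname{tr}\big[(\mathbf{x}-\mathbf{A}\mathbf{s})^\top(\mathbf{x}-\mathbf{A}\mathbf{s})\big]$

so the "square" in (2) is shorthand for this quantity, not a matrix product. The plain-Python check below uses simulated data and hypothetical helper names, with the stdlib only. It compares the entrywise sum of Gaussian log densities against the closed form, where the closed form computes the squared norm as the trace above.

```python
import math
import random

def matmul(P, Q):
    """Product of two matrices stored as nested lists (P is a x b, Q is b x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*P)]

def loglik_entrywise(x, A, s, sigma):
    """Sum of log N(x_ij; (As)_ij, sigma^2) over all L*N entries."""
    mu = matmul(A, s)
    var = sigma ** 2
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (x[i][j] - mu[i][j]) ** 2 / (2 * var)
               for i in range(len(x)) for j in range(len(x[0])))

def loglik_closed_form(x, A, s, sigma):
    """Constant term plus -(1/(2 sigma^2)) * tr[(x - As)^T (x - As)]."""
    mu = matmul(A, s)
    R = [[x[i][j] - mu[i][j] for j in range(len(x[0]))] for i in range(len(x))]
    RtR = matmul(transpose(R), R)                            # N x N matrix
    sq_frobenius = sum(RtR[k][k] for k in range(len(RtR)))  # its trace: a scalar
    L, N = len(x), len(x[0])
    const = -0.5 * L * N * math.log(2 * math.pi * sigma ** 2)
    return const - sq_frobenius / (2 * sigma ** 2)

# Simulated data following eqn (1); sizes and seed are arbitrary choices
rng = random.Random(1)
L, M, N, sigma = 4, 3, 5, 0.5
A = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(L)]
s = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
As = matmul(A, s)
x = [[As[i][j] + rng.gauss(0.0, sigma) for j in range(N)] for i in range(L)]

print(loglik_entrywise(x, A, s, sigma))
print(loglik_closed_form(x, A, s, sigma))
```

The two printed values agree up to floating-point rounding. Only the second term depends on $\displaystyle \mathbf{A}$ and $\displaystyle \mathbf{s}$, so for fixed $\displaystyle \sigma$, maximizing this log likelihood is the same as minimizing the squared error $\displaystyle \|\mathbf{x}-\mathbf{A}\mathbf{s}\|_F^2$.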