## Data likelihood

I'm trying to understand a paper that presents the linear model:

$\mathbf{x} = \mathbf{A}\mathbf{s} + \boldsymbol{\epsilon}\qquad\qquad(1)$

where $\mathbf{x}$ is an $L \times N$ matrix, $\mathbf{A}$ is an $L \times M$ matrix, and $\mathbf{s}$ is an $M \times N$ matrix (the paper calls $\mathbf{x}$ and $\mathbf{s}$ vectors, but with these dimensions they are matrices). Additive Gaussian noise is represented by $\boldsymbol{\epsilon}$, whose entries have variance $\sigma^2$.

I'm trying to understand how it is that the data log likelihood has this form:

$\log P(\mathbf{x}|\mathbf{A},\mathbf{s}) \propto -\frac{1}{2 \sigma^2}(\mathbf{x}-\mathbf{A}\mathbf{s})^2\qquad\qquad(2)$

The paper gives no indication of how the log likelihood in eqn (2) follows from (1), and I've seen the same form in a few other papers, so I assume it's standard prerequisite probability. I'm having trouble finding a source where I could look this kind of thing up.

Another confusing aspect is that $(\mathbf{x}-\mathbf{A}\mathbf{s})$ has dimensions $L \times N$, so how can you square it, and how can the log likelihood be a scalar?

2. For the benefit of others: the Matrix normal distribution article on Wikipedia led me to a good reference:

Dawid, A. P. (1981). "Some matrix-variate distribution theory: Notational considerations and a Bayesian application". *Biometrika* 68 (1): 265–274.

This suggests that the squared notation in equation (2) isn't a matrix square but an entrywise sum of squares, $\|\mathbf{x}-\mathbf{A}\mathbf{s}\|_F^2 = \sum_{i,j}\big(x_{ij} - (\mathbf{A}\mathbf{s})_{ij}\big)^2$, and one can see how (2) follows from (1) via

$\mathbf{x}|\mathbf{A},\mathbf{s} \sim \mathcal{N}(\mathbf{A}\mathbf{s},\sigma^2\mathbf{I})$

since, by (1), $\mathbf{x}$ is $\mathbf{A}\mathbf{s}$ plus zero-mean Gaussian noise with variance $\sigma^2$ in each entry. Seems like this is on the right track, but correct me if I am wrong.
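To spell out the step the papers skip (my own reconstruction, assuming the entries of $\boldsymbol{\epsilon}$ are i.i.d. $\mathcal{N}(0,\sigma^2)$): the likelihood then factorizes over the $LN$ entries, so

$$
\log P(\mathbf{x}|\mathbf{A},\mathbf{s})
= \sum_{i=1}^{L}\sum_{j=1}^{N} \log \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{\big(x_{ij}-(\mathbf{A}\mathbf{s})_{ij}\big)^2}{2\sigma^2}\right)
= -\frac{LN}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i,j}\big(x_{ij}-(\mathbf{A}\mathbf{s})_{ij}\big)^2
$$

The first term is constant in $\mathbf{A}$ and $\mathbf{s}$, which is why (2) is written with "$\propto$", and the second term is a scalar, which answers the dimension question.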
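As a quick numerical sanity check (a sketch with made-up dimensions, assuming i.i.d. $\mathcal{N}(0,\sigma^2)$ noise entries), the sum of per-entry Gaussian log-densities matches a constant minus $\|\mathbf{x}-\mathbf{A}\mathbf{s}\|_F^2/(2\sigma^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N = 5, 3, 4          # made-up dimensions for illustration
sigma = 0.7

A = rng.standard_normal((L, M))
s = rng.standard_normal((M, N))
x = A @ s + sigma * rng.standard_normal((L, N))  # model (1)

resid = x - A @ s  # L x N residual matrix

# Log likelihood as a sum of independent per-entry Gaussian log-densities.
loglik_entrywise = np.sum(
    -0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2)
)

# Closed form: constant plus -||x - A s||_F^2 / (2 sigma^2),
# i.e. the "(x - A s)^2" in (2) read as a squared Frobenius norm.
const = -0.5 * L * N * np.log(2 * np.pi * sigma**2)
loglik_closed = const - np.sum(resid**2) / (2 * sigma**2)

assert np.isclose(loglik_entrywise, loglik_closed)
```

The scalar comes from summing over all $L \times N$ entries, so the log likelihood is well defined even though the residual is a matrix.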