I'm trying to understand a paper that presents a linear model:

$\displaystyle \mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{\epsilon}\qquad\qquad(1)$

where $\displaystyle \mathbf{x}$ is an $\displaystyle L \times N$ matrix, $\displaystyle \mathbf{A}$ is an $\displaystyle L \times M$ matrix, and $\displaystyle \mathbf{s}$ is an $\displaystyle M \times N$ matrix. Additive Gaussian noise is represented by $\displaystyle \mathbf{\epsilon}$ (by dimensional consistency also $\displaystyle L \times N$), whose entries have variance $\displaystyle \sigma^2$.

Specifically, I don't see why the data log-likelihood has this form:

$\displaystyle \log P(\mathbf{x}|\mathbf{A},\mathbf{s}) \propto -\frac{1}{2 \sigma^2}(\mathbf{x}-\mathbf{A}\mathbf{s})^2\qquad\qquad(2)$

There is no indication of how the log-likelihood in eqn (2) follows from (1), and I've seen the same form in a few other papers, so I assume it's standard background in probability. I'm having trouble finding a source where I could look this kind of thing up.
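Here is my attempt at filling in the missing step, assuming the entries of $\displaystyle \mathbf{\epsilon}$ are i.i.d. $\displaystyle \mathcal{N}(0,\sigma^2)$ (the paper doesn't state this explicitly). If that assumption holds, each entry of $\displaystyle \mathbf{x}$ is Gaussian with mean $\displaystyle (\mathbf{A}\mathbf{s})_{ij}$, and independence lets the joint density factor:

$\displaystyle P(\mathbf{x}|\mathbf{A},\mathbf{s}) = \prod_{i=1}^{L}\prod_{j=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(\mathbf{x}-\mathbf{A}\mathbf{s})_{ij}^2}{2\sigma^2}\right)$

so that

$\displaystyle \log P(\mathbf{x}|\mathbf{A},\mathbf{s}) = -\frac{LN}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i,j}(\mathbf{x}-\mathbf{A}\mathbf{s})_{ij}^2,$

which matches (2) up to the additive constant (so "$\propto$" would really mean "equal up to terms not depending on $\mathbf{A}$ and $\mathbf{s}$"). Is this the intended reading?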

Another confusing aspect is that $\displaystyle (\mathbf{x}-\mathbf{A}\mathbf{s})$ has dimensions $\displaystyle L \times N$, so how is its square defined, and how can the log-likelihood be a scalar?
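My working guess is that the square in (2) denotes the sum of squared entries (the squared Frobenius norm), which is a scalar. A quick numerical sketch of that reading, with made-up sizes (the shapes and seed below are purely illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N, sigma = 4, 3, 5, 0.5  # illustrative sizes, not from the paper

A = rng.standard_normal((L, M))
s = rng.standard_normal((M, N))
x = A @ s + sigma * rng.standard_normal((L, N))  # eqn (1)

R = x - A @ s  # residual, shape (L, N)

# Reading the square in eqn (2) as the squared Frobenius norm -> a scalar
ll_eqn2 = -np.sum(R**2) / (2 * sigma**2)

# Full log-density: sum of independent per-entry Gaussian log-pdfs
ll_full = np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - R**2 / (2 * sigma**2))

# The two differ only by the constant -(L*N/2) * log(2*pi*sigma**2)
const = -(L * N / 2) * np.log(2 * np.pi * sigma**2)
print(np.isclose(ll_full, ll_eqn2 + const))  # True
```

Under this reading the log-likelihood is a scalar as required, and (2) holds up to an additive constant that doesn't depend on $\displaystyle \mathbf{A}$ or $\displaystyle \mathbf{s}$. Can someone confirm this is the standard interpretation?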