I'm trying to understand a paper which shows a linear model:

where is an vector, is an matrix, and is an vector. Additive Gaussian noise is represented by with variance .

I'm trying to understand how it is that the data log likelihood has this form:

There is no indication of how the log likelihood in eqn (2) follows from (1), and I've seen it in a few papers already, so I'm assuming it's standard prerequisite knowledge of probability. I'm having trouble finding sources where I could look this kind of thing up.
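For concreteness, here is a sketch of the derivation I would expect, written in notation I am assuming rather than the paper's own: $y \in \mathbb{R}^N$ for the observations, $A \in \mathbb{R}^{N \times M}$ for the matrix, $x \in \mathbb{R}^M$ for the unknown vector, and $n \sim \mathcal{N}(0, \sigma^2 I)$ for i.i.d. Gaussian noise. I can't confirm this matches the paper's eqn (2) exactly, in particular whether the constant term is kept or dropped.

```latex
% Assumed model (hypothetical symbols): y = A x + n,  n ~ N(0, sigma^2 I_N)
\begin{align}
  p(y \mid x)
    &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left( -\frac{\bigl(y_i - (Ax)_i\bigr)^2}{2\sigma^2} \right) \\
    &= (2\pi\sigma^2)^{-N/2}
       \exp\!\left( -\frac{\lVert y - Ax \rVert_2^2}{2\sigma^2} \right), \\
  \log p(y \mid x)
    &= -\frac{N}{2}\log(2\pi\sigma^2)
       - \frac{1}{2\sigma^2}\,\lVert y - Ax \rVert_2^2 .
\end{align}
```

The first line uses the independence of the noise components, so the joint density is a product of $N$ univariate Gaussians. Taking the log turns that product into a sum, which is where the squared norm comes from.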

Another confusing aspect is that has dimensions , so how can you calculate its square, and how can the log likelihood be a scalar?
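A small numerical sketch of how I would expect this to work, assuming the "square" of a vector is shorthand for its squared Euclidean norm $r^\top r$ (I have not confirmed this is the paper's convention). The names `y`, `A`, `x`, `sigma2`, and both function names are my own placeholders, not the paper's. The check compares the closed form against a sum of per-component Gaussian log densities.

```python
import math
import random


def log_likelihood(y, A, x, sigma2):
    """Closed-form log likelihood, assuming y = A x + n with n ~ N(0, sigma2 * I)."""
    N = len(y)
    # Residual r = y - A x is a vector with N entries.
    r = [y[i] - sum(A[i][j] * x[j] for j in range(len(x))) for i in range(N)]
    # Assumption: "squaring" the vector means r^T r, which is a single number.
    sq_norm = sum(ri * ri for ri in r)
    return -0.5 * N * math.log(2 * math.pi * sigma2) - sq_norm / (2 * sigma2)


def log_likelihood_per_component(y, A, x, sigma2):
    """Same quantity, built as a sum of N univariate Gaussian log densities."""
    total = 0.0
    for i in range(len(y)):
        mean_i = sum(A[i][j] * x[j] for j in range(len(x)))
        total += -0.5 * math.log(2 * math.pi * sigma2)
        total -= (y[i] - mean_i) ** 2 / (2 * sigma2)
    return total


# Synthetic data drawn from the assumed model (sizes are arbitrary).
random.seed(0)
N, M, sigma2 = 5, 3, 0.25
A = [[random.gauss(0, 1) for _ in range(M)] for _ in range(N)]
x = [random.gauss(0, 1) for _ in range(M)]
y = [
    sum(A[i][j] * x[j] for j in range(M)) + random.gauss(0, math.sqrt(sigma2))
    for i in range(N)
]

ll_closed = log_likelihood(y, A, x, sigma2)
ll_sum = log_likelihood_per_component(y, A, x, sigma2)
print(ll_closed, ll_sum)  # two scalars that should agree up to rounding
```

Under that reading, the residual is a vector but its squared norm is a scalar, so the log likelihood is a scalar as well.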