For the benefit of others: I found a useful reference through the Wikipedia article on the matrix normal distribution:
Dawid, A.P. (1981). "Some matrix-variate distribution theory: Notational considerations and a Bayesian application". Biometrika 68 (1): 265–274.
It suggests that the squared notation in equation (2) is not a matrix square, and that (2) follows from (1) with something like
since is a random variable of mean with variance as given in (1). This seems to be on the right track, but please correct me if I am wrong.
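As a sanity check on reading the "square" as a quadratic form rather than a matrix product, here is a small Monte Carlo sketch. It assumes the Wikipedia parameterization X ~ MN(M, U, V), where U is the row covariance, V is the column covariance, and vec(X) ~ N(vec(M), V ⊗ U). Under that assumption, a scalar projection aᵀXb has variance (aᵀUa)(bᵀVb). Equations (1) and (2) are not reproduced here, so this illustrates the general property, not those exact equations. The helper names `sample_matrix_normal` and `bilinear_variance` are hypothetical.

```python
import numpy as np

def sample_matrix_normal(M, U, V, n, rng):
    """Draw n samples of X ~ MN(M, U, V) (assumed Wikipedia convention).

    Uses X = M + A Z B^T with U = A A^T and V = B B^T, where Z has i.i.d.
    standard normal entries; then vec(X) has covariance V kron U.
    """
    A = np.linalg.cholesky(U)            # row-covariance factor
    B = np.linalg.cholesky(V)            # column-covariance factor
    Z = rng.standard_normal((n,) + M.shape)
    return M + A @ Z @ B.T               # batched matmul over the first axis

def bilinear_variance(U, V, a, b):
    """Variance of a^T X b: quadratic forms in U and V, not a matrix square."""
    return float(a @ U @ a) * float(b @ V @ b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = np.zeros((2, 3))
    U = np.array([[2.0, 0.5], [0.5, 1.0]])
    V = np.array([[1.0, 0.3, 0.0], [0.3, 2.0, 0.4], [0.0, 0.4, 1.5]])
    a = np.array([1.0, -1.0])
    b = np.array([1.0, 0.0, 2.0])

    X = sample_matrix_normal(M, U, V, 200_000, rng)
    proj = np.einsum("i,nij,j->n", a, X, b)   # a^T X b for each sample
    print("theory:", bilinear_variance(U, V, a, b), "empirical:", proj.var())

    # Column-stacked vec(X) should have covariance close to kron(V, U).
    vecX = X.transpose(0, 2, 1).reshape(X.shape[0], -1)
    print("max |cov - kron(V, U)|:",
          np.abs(np.cov(vecX, rowvar=False) - np.kron(V, U)).max())
```

With these particular inputs, aᵀUa = 2 and bᵀVb = 7, so the theoretical variance is 14, and the empirical value should land close to it.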