For the benefit of others: I found a good reference, cited in the Wikipedia article on the matrix normal distribution:

Dawid, A.P. (1981). "Some matrix-variate distribution theory: Notational considerations and a Bayesian application". Biometrika 68 (1): 265–274.

which suggests that the squared notation in equation (2) isn't a matrix square operation, and one can see how (2) follows from (1) with something like

since is a random variable of mean with variance as depicted in (1). Seems like this is on the right track, but correct me if I am wrong.
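As a rough numerical sanity check (not the derivation from the reference), here is a small simulation sketch. It assumes the Wikipedia parameterization X ~ MN(M, U, V), with U the among-row covariance and V the among-column covariance, for which the second moments are E[(X − M)(X − M)^T] = U tr(V) and E[(X − M)^T (X − M)] = V tr(U). The particular M, U, V, and the sample size are made up for illustration, and whether these moments are exactly what (1) and (2) refer to is my assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a 3 x 2 matrix normal X ~ MN(M, U, V).
n, p = 3, 2
M = np.arange(n * p, dtype=float).reshape(n, p)
U = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])   # among-row covariance (n x n)
V = np.array([[1.0, 0.4],
              [0.4, 2.0]])        # among-column covariance (p x p)

# Cholesky factors: A @ A.T == U and B @ B.T == V.
A = np.linalg.cholesky(U)
B = np.linalg.cholesky(V)

# If Z has i.i.d. standard normal entries, then M + A Z B^T ~ MN(M, U, V).
num_samples = 200_000
Z = rng.standard_normal((num_samples, n, p))
X = M + A @ Z @ B.T               # matmul broadcasts over the sample axis

D = X - M
D_T = D.transpose(0, 2, 1)

# Monte Carlo estimates of the two "squared" second moments.
row_moment = np.mean(D @ D_T, axis=0)   # estimates U * tr(V), shape (n, n)
col_moment = np.mean(D_T @ D, axis=0)   # estimates V * tr(U), shape (p, p)

print(np.round(row_moment, 2))
print(np.round(np.trace(V) * U, 2))
print(np.round(col_moment, 2))
print(np.round(np.trace(U) * V, 2))
```

Note that X here is 3 x 2, so a literal matrix square X @ X is not even defined; the "square" has to be read as one of the two products above, which is consistent with the notation not being a matrix square operation.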