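or equivalently, for any measurable subset $A$ of $\mathbb{R}$, the mean eigenvalue distribution gives $A$ the mass $\frac{1}{n}\sum_{i=1}^{n} P(\lambda_i \in A)$
(for this one to make full sense, we would need to choose an order on the eigenvalues, for instance $\lambda_1 \le \dots \le \lambda_n$).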
All I want to highlight is that this mean eigenvalue distribution is just a probability distribution on $\mathbb{R}$. Therefore, if you have a sequence of distributions defined similarly with respect to other matrices (of any size), this is just a sequence of distributions on $\mathbb{R}$, and the method of moments applies as usual (or "as Wikipedia says").
In the case of Wigner's theorem (in the way you state it, which is a weak form), the proof that the $k$-th moment of the mean eigenvalue distribution converges to the $k$-th moment of $\sigma$ for all $k$, where $\sigma$ is the semicircle law, goes through combinatorial arguments (I'll give a reference). This holds under a suitable normalization of the matrices, or a suitable choice of the variances of the Gaussian entries (in that sense, your statement is not complete).
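As a rough numerical sanity check of this moment convergence, here is a minimal sketch in plain Python. It assumes one particular normalization that the statement above leaves open: symmetric $n \times n$ matrices with independent centered Gaussian entries of variance $1/n$ off the diagonal and $2/n$ on it, for which the limiting moments of orders 2, 3, 4 are 1, 0, 2. The function names and sizes are illustrative only. It relies on the fact that the $k$-th moment of the mean eigenvalue distribution is $E\left[\frac{1}{n}\operatorname{tr}(X^k)\right]$, so no eigenvalue computation is needed.

```python
import random


def sample_matrix(n, rng):
    """Symmetric n x n Gaussian matrix.

    Assumed normalization: variance 1/n off the diagonal, 2/n on the diagonal.
    """
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        x[i][i] = rng.gauss(0.0, (2.0 / n) ** 0.5)
        for j in range(i + 1, n):
            x[i][j] = x[j][i] = rng.gauss(0.0, (1.0 / n) ** 0.5)
    return x


def normalized_trace_moments(x):
    """Return (1/n) tr(X^k) for k = 2, 3, 4.

    These are the moments of the uniform distribution on the eigenvalues of X.
    """
    n = len(x)
    # X is symmetric, so (X^2)[i][j] is the dot product of rows i and j.
    x2 = [[sum(x[i][l] * x[j][l] for l in range(n)) for j in range(n)]
          for i in range(n)]
    m2 = sum(x2[i][i] for i in range(n)) / n
    # tr(X^3) = sum_ij X[i][j] * (X^2)[i][j], again by symmetry.
    m3 = sum(x[i][j] * x2[i][j] for i in range(n) for j in range(n)) / n
    # tr(X^4) = sum_ij ((X^2)[i][j])^2.
    m4 = sum(x2[i][j] ** 2 for i in range(n) for j in range(n)) / n
    return m2, m3, m4


def mean_moments(n, trials, seed=0):
    """Monte Carlo estimate of the moments of the mean eigenvalue distribution."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        moments = normalized_trace_moments(sample_matrix(n, rng))
        for idx, m in enumerate(moments):
            totals[idx] += m
    return [t / trials for t in totals]
```

Calling `mean_moments(40, 20)` should return values close to 1, 0 and 2, up to finite-size corrections of order $1/n$ and Monte Carlo noise.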
And you can justify that the moments of the semicircle law characterize it (i.e. that the condition given on Wikipedia holds) by showing that they grow at most exponentially: the $2k$-th moment is at most $C^k$ for some constant $C > 0$ (the odd moments are 0, and the same holds for the mean eigenvalue distributions, so the odd moments don't matter). It follows that the moment generating function (or Laplace transform) exists; this function depends only on the moments, and it characterizes the distribution (a classical fact), hence the moments characterize the distribution. You'll find the computation of the moments in the reference. There is probably an elementary proof of the theorem of moments in this specific case (using the above bound and Laplace transforms), rather than the theorem from Wikipedia, which is "optimal" in a sense but probably not easy (actually I don't know a proof, so I may be wrong).
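To make the growth bound concrete: under the standard normalization (an assumption here, with semicircle density $\frac{1}{2\pi}\sqrt{4 - x^2}$ on $[-2, 2]$), the even moments are the Catalan numbers, which are bounded by $4^k$. Every moment of order $p$ is then at most $2^p$ in absolute value, so the series $\sum_p m_p t^p / p!$ converges for all $t$. The sketch below checks the first few moments numerically with a simple midpoint rule; the function names are illustrative.

```python
import math


def catalan(k):
    """k-th Catalan number, binom(2k, k) / (k + 1)."""
    return math.comb(2 * k, k) // (k + 1)


def semicircle_moment(p, steps=20_000):
    """Midpoint-rule approximation of the p-th moment of the semicircle law.

    Assumed density: sqrt(4 - x^2) / (2 * pi) on [-2, 2].
    """
    h = 4.0 / steps
    total = 0.0
    for i in range(steps):
        x = -2.0 + (i + 0.5) * h
        total += x ** p * math.sqrt(4.0 - x * x)
    return total * h / (2.0 * math.pi)


# Columns: k, numerical 2k-th moment, Catalan number, bound 4^k, numerical (2k+1)-th moment.
for k in range(6):
    even = semicircle_moment(2 * k)
    odd = semicircle_moment(2 * k + 1)
    print(k, round(even, 4), catalan(k), 4 ** k, round(odd, 6))
```

The second and third columns should agree up to integration error, the fourth dominates them, and the last column should be numerically zero.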
You'll find every detail I skipped in the first pages of this (massive) introduction to random matrices by A. Guionnet and O. Zeitouni. They actually prove a stronger result. The sequence of mean eigenvalue distributions is denoted $\bar{L}_N$ there, not to be confused with $L_N$, which is the (random) uniform distribution on the set of eigenvalues.
When we are dealing with $L_N$, we are actually dealing with a sequence of random probability distributions (namely, the uniform distribution on the set of the eigenvalues of the matrices in the sequence). Therefore it makes little sense to say that $L_N$ converges in distribution to something. It could however converge in distribution almost surely (i.e. for almost all matrices, the corresponding sequence of "uniform distributions on eigenvalues" converges in distribution), or in probability (which takes more care to define). In the case of Wigner's theorem in a stronger form, we have a statement like, loosely speaking: "in probability, the sequence of uniform distributions on eigenvalues converges in distribution". This is what the reference makes explicit and proves, using the convergence of the moments of the mean eigenvalue distribution (yours) and an additional property quantifying how much the eigenvalue distribution fluctuates around the mean eigenvalue distribution.
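The difference between the random distribution and its mean can be seen numerically. The sketch below is plain Python and assumes symmetric Gaussian matrices with variance $1/n$ off the diagonal and $2/n$ on it (an illustrative normalization, as are the function names). It samples the second moment of the uniform distribution on the eigenvalues, which equals $\frac{1}{n}\sum_{i,j} X_{ij}^2$, and measures how much it varies from one matrix to the next.

```python
import random
import statistics


def second_moment(n, rng):
    """(1/n) tr(X^2) = (1/n) * (sum of squared entries) for one random matrix.

    Assumed normalization: variance 1/n off the diagonal, 2/n on the diagonal.
    """
    total = 0.0
    for i in range(n):
        total += rng.gauss(0.0, (2.0 / n) ** 0.5) ** 2  # diagonal entry
        for j in range(i + 1, n):
            # Entries (i, j) and (j, i) are equal, hence the factor 2.
            total += 2.0 * rng.gauss(0.0, (1.0 / n) ** 0.5) ** 2
    return total / n


def spread(n, samples, seed=0):
    """Mean and standard deviation of the random second moment over many matrices."""
    rng = random.Random(seed)
    values = [second_moment(n, rng) for _ in range(samples)]
    return statistics.mean(values), statistics.stdev(values)
```

Comparing `spread(10, 300)` with `spread(80, 300)`, the mean should stay near 1 while the standard deviation shrinks as $n$ grows. This is the kind of concentration around the mean eigenvalue distribution that the stronger form relies on, though here it is only checked for one test function.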
I hope this clarifies a few things. Tell me if not.