My teacher gave the definition that:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

yet it seems to make more sense to first define

$$P(B \mid A) = \frac{n(A \cap B)}{n(A)}$$

where $A$ takes on the role of the sample space, and then to divide both the top and bottom of the fraction by $n(S)$, getting:

$$P(B \mid A) = \frac{n(A \cap B)/n(S)}{n(A)/n(S)}$$

which by definition of P would yield the equation at the top.
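
As a sanity check, here is a minimal sketch of the counting argument on a made-up example: one roll of a fair die, with `A` = "the roll is even" and `B` = "the roll is at least 4". The events and the helper `prob` are my own illustrative choices, and the argument assumes a finite sample space with equally likely outcomes, so that P(E) = n(E)/n(S).

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die (equally likely outcomes).
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # conditioning event: the roll is even
B = {4, 5, 6}            # event of interest: the roll is at least 4

def prob(event, space):
    """Classical probability n(event) / n(space) for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

# Counting version: treat A as the new sample space, n(A ∩ B) / n(A).
by_counting = Fraction(len(A & B), len(A))

# Ratio version: P(A ∩ B) / P(A), with both probabilities measured in S.
by_ratio = prob(A & B, S) / prob(A, S)

print(by_counting, by_ratio)  # prints "2/3 2/3"
```

Both versions agree here because the common factor 1/n(S) cancels. Outside the equally likely setting the counting version is not available, which may be why the ratio is taken as the definition.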

Does this make sense? Also, can anyone recommend a good probability book that does these kinds of derivations?