For example, why can't you add a 2x2 matrix to a 2x3 matrix?
For the same reason you can't add "3" and a rectangle. Look at the definition of "addition" of matrices: $(A + B)_{ij} = A_{ij} + B_{ij}$. For every "ij" component that A has, B must also have that component. A 3 by 3 matrix, as in dwsmith's example, has "31", "32", and "33" components that a 2 by 2 matrix simply doesn't have.
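To make the componentwise definition concrete, here is a minimal Python sketch. The function name `add_matrices` and the list-of-rows representation are my own choices for illustration, not anything from the thread.

```python
def add_matrices(a, b):
    """Entrywise sum of two matrices given as lists of rows (illustrative helper)."""
    shape_a = (len(a), len(a[0]))
    shape_b = (len(b), len(b[0]))
    # The definition only pairs up entries with the same (i, j) index,
    # so both matrices must have exactly the same set of indices.
    if shape_a != shape_b:
        raise ValueError(
            f"cannot add a {shape_a[0]}x{shape_a[1]} matrix "
            f"to a {shape_b[0]}x{shape_b[1]} matrix"
        )
    rows, cols = shape_a
    return [[a[i][j] + b[i][j] for j in range(cols)] for i in range(rows)]
```

Calling `add_matrices([[1, 2], [3, 4]], [[1, 1, 1], [1, 1, 1]])` raises the `ValueError`, because the second matrix has "13" and "23" components that the first one lacks.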
You can't add two matrices of different sizes using the usual definition of addition because the definition itself says you can't. It is certainly possible to define other operations, and call them "additions", in which you can add matrices of different sizes. There are infinitely many ways to define such operations.
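As one hypothetical example of such an alternative operation, you could pad the smaller matrix with zeros until the shapes agree and then add entrywise. The sketch below assumes that convention; the name `padded_add` is made up, and this is only one of the infinitely many possible choices, not a standard definition.

```python
def padded_add(a, b):
    """A non-standard 'addition': treat missing entries as 0 (assumed convention)."""
    rows = max(len(a), len(b))
    cols = max(len(a[0]), len(b[0]))

    def entry(m, i, j):
        # Entries outside a matrix's own shape are taken to be zero.
        return m[i][j] if i < len(m) and j < len(m[0]) else 0

    return [[entry(a, i, j) + entry(b, i, j) for j in range(cols)]
            for i in range(rows)]


# A 2x2 matrix "plus" a 2x3 matrix under this convention:
result = padded_add([[1, 2], [3, 4]], [[1, 1, 1], [1, 1, 1]])
# result == [[2, 3, 1], [4, 5, 1]]
```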
Perhaps your question is "why was this particular definition of addition chosen when it seems so restrictive?" I'm not sure that I know the answer to this question, but I can make some guesses: This is probably the most natural definition, it's a very simple definition, and it leads to a nice group structure.
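The reason for this addition is simple: matrices are linear maps, and linear maps are matrices! Two linear maps are added pointwise, $(f + g)(x) = f(x) + g(x)$, which only makes sense when both maps have the same domain and codomain, that is, when their matrices have the same size. Without the addition being the way it is, this correspondence wouldn't exist.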
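A small numerical check of that correspondence, as a sketch: applying the entrywise sum A + B to a vector gives the same result as applying A and B separately and adding the outputs. The helper names `mat_add` and `mat_vec` and the sample values are illustrative assumptions.

```python
def mat_add(a, b):
    """Usual entrywise sum; assumes a and b have the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_vec(m, x):
    """Apply the linear map given by matrix m to the vector x."""
    assert all(len(row) == len(x) for row in m), "column count must match len(x)"
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [5, 6]

lhs = mat_vec(mat_add(A, B), x)                           # (A + B) x
rhs = [u + v for u, v in zip(mat_vec(A, x), mat_vec(B, x))]  # A x + B x
# Both are [23, 44].
```

If A were 2x2 and B were 2x3, `mat_vec(A, x)` and `mat_vec(B, x)` could not even take the same input `x`, so there would be no pointwise sum of the two maps to represent.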
That said, the addition that Jskid describes will give you a ring structure (if you assume the number of rows is fixed); with a single row, it is the ring R[X] (the ring of polynomials over R), while adding more rows is akin to a cross-product, if you define multiplication correctly. However, the multiplication is not composition of functions, unlike the usual matrix multiplication.
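Here is a rough sketch of the one-row case, assuming the addition in question pads the shorter row with zeros. Under that assumption a row `[a0, a1, a2]` behaves like the polynomial a0 + a1 X + a2 X^2, and the ring multiplication is polynomial multiplication (a convolution of coefficients), not a matrix product. The names `poly_add` and `poly_mul` are mine.

```python
def poly_add(p, q):
    """Add two coefficient rows of possibly different lengths (zero-padded)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_mul(p, q):
    """Multiply two coefficient rows as polynomials (convolution)."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b  # X^i * X^j contributes to X^(i+j)
    return out


# (1 + 2X) + (3 + 5X^2) = 4 + 2X + 5X^2
s = poly_add([1, 2], [3, 0, 5])   # [4, 2, 5]
# (1 + X)(1 + X) = 1 + 2X + X^2
m = poly_mul([1, 1], [1, 1])      # [1, 2, 1]
```

Note that `poly_mul` never composes two linear maps, which is the sense in which this multiplication differs from ordinary matrix multiplication.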