For example, why can't you add a 2x2 matrix to a 2x3 matrix?
For the same reason you can't add "3" and a rectangle. Look at the definition of "addition" of matrices: $\displaystyle (A+ B)_{ij}= A_{ij}+ B_{ij}$. For every "ij" component that A has, B must also have that component. A 3 by 3 matrix, as in dwsmith's example, has "31", "32", and "33" components that a 2 by 2 matrix simply doesn't have.
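To make the componentwise definition concrete, here is a minimal Python sketch. The names `shape` and `add` are made up for this illustration, and a matrix is assumed to be stored as a list of equal-length rows. The sum is only computed when every "ij" component exists in both matrices, i.e. when the shapes match.

```python
def shape(m):
    """Return (rows, cols) of a matrix stored as a list of equal-length rows."""
    return (len(m), len(m[0]) if m else 0)


def add(a, b):
    """Entrywise sum (A + B)_ij = A_ij + B_ij, defined only for equal shapes."""
    if shape(a) != shape(b):
        # Some "ij" component exists in one matrix but not in the other.
        raise ValueError(f"cannot add {shape(a)} and {shape(b)} matrices")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


print(add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # [[11, 22], [33, 44]]
```

Calling `add` on a 2x2 and a 2x3 matrix raises `ValueError`, which mirrors "the definition itself says you can't".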
It makes sense to me that you imagine zeros. Isn't a matrix a way of organizing a system of linear equations? Since $\displaystyle 1x+1y=1x+1y+0z$, couldn't you put zeros in the third column? $\displaystyle \begin{bmatrix}1&1&0\\1&1&0\end{bmatrix}+\begin{bmatrix}1&1&1\\1&1&1\end{bmatrix}$
You can't do that. The z component isn't 0. It doesn't exist. For instance, you can't add an $\displaystyle \mathbb{R}^2$ vector to an $\displaystyle \mathbb{R}^3$ vector because the k component doesn't exist in $\displaystyle \mathbb{R}^2$. If the k component is 0, then the vector is in $\displaystyle \mathbb{R}^3$.
You can't add 2 matrices of different sizes using the usual definition of addition because the definition itself says you can't. It is certainly possible to define other operations and call them "additions" in which you can add matrices of different sizes. There are infinitely many ways to define such operations.
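As one illustration of such an alternative operation, the sketch below pads the smaller matrix with zeros before adding. This is not the standard matrix addition; `padded_add` is a hypothetical name, and treating missing entries as 0 is exactly the assumption being debated in this thread.

```python
def padded_add(a, b):
    """Add two matrices of possibly different sizes, treating missing entries as 0.

    This is a non-standard operation, shown only to illustrate that other
    "additions" can be defined.
    """
    rows = max(len(a), len(b))
    cols = max(
        max((len(r) for r in a), default=0),
        max((len(r) for r in b), default=0),
    )

    def entry(m, i, j):
        # Entries outside the matrix are taken to be 0 (an assumption).
        return m[i][j] if i < len(m) and j < len(m[i]) else 0

    return [[entry(a, i, j) + entry(b, i, j) for j in range(cols)]
            for i in range(rows)]


print(padded_add([[1, 1], [1, 1]], [[1, 1, 1], [1, 1, 1]]))  # [[2, 2, 1], [2, 2, 1]]
```

The operation is well defined, but it is a different operation from the usual one, and other padding conventions (for example, padding on the left instead of the right) would give yet other "additions".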
Perhaps your question is "why was this particular definition of addition chosen when it seems so restrictive?" I'm not sure that I know the answer to this question, but I can make some guesses: This is probably the most natural definition, it's a very simple definition, and it leads to a nice group structure.
The reason for this addition is simple: Matrices are linear maps, and linear maps are matrices! Without the addition being the way it is, this duality wouldn't exist.
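The link to linear maps can be checked numerically: with the usual entrywise sum, applying A + B to a vector gives the same result as applying A and B separately and adding the outputs. The sketch below uses illustrative names (`add`, `apply`) and made-up example matrices. Note that a 2x2 matrix acts on vectors with two components while a 2x3 matrix acts on vectors with three, so there is no common input on which the two maps could be summed.

```python
def add(a, b):
    """Usual entrywise matrix sum; shapes must match."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shapes differ")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply(m, v):
    """Matrix-vector product: treat m as the linear map v -> m v."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("vector length must equal the number of columns")
    return [sum(c * x for c, x in zip(row, v)) for row in m]


A = [[1, 2, 0], [0, 1, 3]]   # example 2x3 matrices, chosen arbitrarily
B = [[4, 0, 1], [2, 2, 2]]
x = [1, 2, 3]

lhs = apply(add(A, B), x)                                  # (A + B) x
rhs = [p + q for p, q in zip(apply(A, x), apply(B, x))]    # A x + B x
print(lhs, rhs)  # [12, 23] [12, 23]
```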
That said, the addition that Jskid describes will give you a ring structure (if you assume the number of rows is fixed); with a single row, it is the ring R[X] (the ring of polynomials over R), while adding more rows is akin to a direct product, $\displaystyle R[X]\times R[X]\times\ldots$, if you define multiplication correctly. However, that multiplication is not composition of functions, unlike ordinary matrix multiplication.
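As a small worked instance of the one-row case (an illustration only, assuming a row $\displaystyle (a_0, a_1, a_2, \ldots)$ is identified with the polynomial $\displaystyle a_0+a_1X+a_2X^2+\ldots$): padding $\displaystyle \begin{bmatrix}1&1\end{bmatrix}$ with a zero and adding $\displaystyle \begin{bmatrix}1&1&1\end{bmatrix}$ gives $\displaystyle \begin{bmatrix}2&2&1\end{bmatrix}$, which matches $\displaystyle (1+X)+(1+X+X^2)=2+2X+X^2$.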