A kernel is basically a way of transforming things. When the transformation is invertible, you typically have one kernel that takes something from one space to another, and an inverse kernel that takes it from the new space back to the original one.
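As a concrete illustration of a kernel and its inverse, here is a minimal sketch using the discrete Fourier transform, which is one well-known invertible pair (it is my choice of example, not necessarily the kernel you are working with). The names `forward_kernel` and `inverse_kernel` are made up for this sketch.

```python
import cmath

def forward_kernel(signal):
    """Discrete Fourier transform: sample space -> frequency space."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_kernel(spectrum):
    """Inverse DFT: frequency space -> back to sample space."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

signal = [1.0, 2.0, 0.0, -1.0]
spectrum = forward_kernel(signal)          # representation in the new space
recovered = [z.real for z in inverse_kernel(spectrum)]  # back in the original space
```

Up to floating-point rounding, `recovered` matches `signal`, which is what "invertible" means here.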
Depending on the kernel and what it is used for (and what it means intuitively), the spaces involved will be different.
With regard to interpolation, you are going from a space of points (i.e. a lattice) to a polynomial of some degree (in this case a bivariate polynomial of degree 3 in each variable).
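To make the lattice-to-polynomial step concrete, here is a rough sketch that builds the polynomial of degree at most 3 in each variable passing through a 4x4 lattice of samples. It uses the tensor-product Lagrange form, which is an assumption on my part (practical bicubic schemes often use convolution kernels or derivative data instead). The function names and the test function `f` are hypothetical.

```python
def lagrange_basis(nodes, i, x):
    """1-D Lagrange basis polynomial: equals 1 at nodes[i] and 0 at the other nodes."""
    value = 1.0
    for j, node in enumerate(nodes):
        if j != i:
            value *= (x - node) / (nodes[i] - node)
    return value

def bicubic_interpolate(xs, ys, samples, x, y):
    """Evaluate at (x, y) the polynomial through the lattice values.

    samples[i][j] is the sample at (xs[i], ys[j]); with 4 nodes per axis
    the result has degree <= 3 in x and degree <= 3 in y.
    """
    total = 0.0
    for i in range(len(xs)):
        for j in range(len(ys)):
            total += samples[i][j] * lagrange_basis(xs, i, x) * lagrange_basis(ys, j, y)
    return total

# Hypothetical data: sample a function that is itself cubic in x and quadratic in y.
def f(x, y):
    return x**3 * y**2 + 2 * x - y

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]
samples = [[f(x, y) for y in ys] for x in xs]

value = bicubic_interpolate(xs, ys, samples, 1.5, 0.5)  # a point off the lattice
```

Because `f` already has degree at most 3 in each variable, the interpolant reproduces it between lattice points up to rounding. For a general signal you would only get an approximation.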
More broadly, you can think of kernels as taking signal data and transforming it to a different space; applying the inverse to the transformed data then gives you an approximation of the original signal.
This is how things like picture, video, and audio compression work: if you know a good basis for representing the data, you take a signal, transform it to the new basis, and keep only the information needed for that basis.
You reconstruct the information by using the definition of the basis, and then use the inverse kernel to go from the new optimal basis back to the old basis.
This kind of thing applies both to lossless methods (i.e. ones that don't lose any information) and to lossy methods (ones that lose information at some point, but where the loss doesn't matter for the intended purpose).
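The compression idea can be sketched roughly as follows. I am assuming a discrete cosine transform as the "good basis" (it is commonly used in image and audio codecs, but real codecs add quantization and entropy coding on top), and `dct`, `idct`, `keep_largest` and the sample signal are hypothetical names and data for illustration only.

```python
import math

def _scale(k, n):
    """Normalization that makes the cosine basis orthonormal."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(signal):
    """Forward kernel (orthonormal DCT-II): sample basis -> cosine basis."""
    n = len(signal)
    return [
        _scale(k, n) * sum(signal[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse kernel (orthonormal DCT-III): cosine basis -> sample basis."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
        for t in range(n)
    ]

def keep_largest(coeffs, count):
    """Lossy step: zero everything except the `count` largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    keep = set(order[:count])
    return [c if k in keep else 0.0 for k, c in enumerate(coeffs)]

signal = [10.0, 11.0, 12.0, 13.0, 13.0, 12.0, 11.0, 10.0]
coeffs = dct(signal)

lossless = idct(coeffs)                # keep every coefficient: exact up to rounding
lossy = idct(keep_largest(coeffs, 2))  # keep 2 of 8 coefficients: close, but not exact

lossless_error = max(abs(a - b) for a, b in zip(signal, lossless))
lossy_error = max(abs(a - b) for a, b in zip(signal, lossy))
```

Keeping all coefficients corresponds to the lossless case; dropping the small ones is the lossy case, where the reconstruction stays close to the original because most of the signal lives in a few basis functions.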