The disturbance matrix and covariance

This may be a silly question, but I would really appreciate a serious answer.

The disturbance matrix can be constructed by multiplying the disturbance vector by its transpose. The diagonal then contains the variances, and the off-diagonal elements are the covariances between pairs of observations (right?).

My question is: Why do we get covariances this way, when the covariance formula is really more demanding (it subtracts the expected values before multiplying)? Is it because the expected value of the disturbance term is 0, so that the disturbances are already deviation scores (values from which the expected value has been subtracted)?
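
Written out, the reasoning I have in mind is the following, where $\varepsilon_i$ and $\varepsilon_j$ (my own notation) are the disturbances of observations $i$ and $j$:

$$
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = E\big[(\varepsilon_i - E[\varepsilon_i])(\varepsilon_j - E[\varepsilon_j])\big] = E[\varepsilon_i \varepsilon_j] \quad \text{if } E[\varepsilon_i] = E[\varepsilon_j] = 0 .
$$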

I am inclined to think so, but I don’t like the fact that the expected values being subtracted are based on all the observations, whereas each “covariance” in the matrix concerns only a single pair of them. Am I missing something? Is it OK to base the expected values used in the covariance formula on more observations than are included in the calculation of a specific entry?
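
To make the worry concrete, here is a small simulation sketch in Python. Everything in it is my own assumption for illustration: the two-element disturbance vector, the way its elements are correlated, and the number of replications. It compares the outer product from a single draw of the disturbance vector with the average of such outer products over many repeated draws.

```python
import random

random.seed(0)

N_REPS = 100_000  # hypothetical number of repeated samples


def draw_disturbances():
    """Draw one 2-element disturbance vector with mean zero.

    Assumed construction (for illustration only):
    e1 = z1 and e2 = 0.5 * z1 + z2 with z1, z2 independent N(0, 1),
    so Var(e1) = 1, Var(e2) = 1.25 and Cov(e1, e2) = 0.5.
    """
    z1 = random.gauss(0.0, 1.0)
    z2 = random.gauss(0.0, 1.0)
    return [z1, 0.5 * z1 + z2]


def outer(e):
    """Outer product of a vector with itself (vector times its transpose)."""
    return [[a * b for b in e] for a in e]


# Outer product from a single draw: just products of realized disturbances.
single = outer(draw_disturbances())

# Average of the outer products over many independent draws.
avg = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(N_REPS):
    m = outer(draw_disturbances())
    for i in range(2):
        for j in range(2):
            avg[i][j] += m[i][j] / N_REPS

print("single draw:", single)
print("average over draws:", avg)
```

If I read this correctly, a single outer product only holds squares and cross-products of the realized disturbances, and it is the average over repeated draws (not over the observations within one sample) that approaches the variances and covariances assumed in the sketch.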