I understand the likelihood function as the joint distribution of the data (y|X), but with the parameters as variables. What I don't understand is why MLE problems in econometrics always use the distribution of the model error rather than the conditional distribution of the ys.

For example, suppose the model is

yt = xt'b + et,

et ~ iid N(0, sigma^2)

yt and et are scalars; xt and b are vectors.
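
To make the comparison concrete, here is a minimal Python sketch of the two ways of writing the Gaussian log-likelihood for this model: the density of et (mean 0) evaluated at yt - xt'b, and the density of yt given xt (mean xt'b) evaluated at yt. The simulated data, the parameter values, and all function names are my own illustrative assumptions, not taken from a textbook or library.

```python
import math
import random


def normal_logpdf(value, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at `value`."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (value - mean) ** 2 / (2 * sigma ** 2))


def dot(x, b):
    """Inner product x'b for two equal-length lists."""
    return sum(xi * bi for xi, bi in zip(x, b))


def loglik_error_form(y, X, b, sigma):
    # Density of e_t ~ N(0, sigma^2), evaluated at the residual y_t - x_t'b.
    return sum(normal_logpdf(yt - dot(xt, b), 0.0, sigma)
               for yt, xt in zip(y, X))


def loglik_conditional_form(y, X, b, sigma):
    # Density of y_t | x_t ~ N(x_t'b, sigma^2), evaluated at y_t.
    return sum(normal_logpdf(yt, dot(xt, b), sigma)
               for yt, xt in zip(y, X))


# Simulated data (illustrative assumption): intercept plus one regressor.
rng = random.Random(0)
true_b = [1.0, -2.0]
true_sigma = 0.5
X = [[1.0, rng.gauss(0, 1)] for _ in range(50)]
y = [dot(xt, true_b) + rng.gauss(0, true_sigma) for xt in X]

# Evaluate both forms at an arbitrary trial parameter value.
trial_b = [0.8, -1.5]
trial_sigma = 0.7
ll_error = loglik_error_form(y, X, trial_b, trial_sigma)
ll_cond = loglik_conditional_form(y, X, trial_b, trial_sigma)
print(ll_error, ll_cond)
```

In this sketch the two functions return the same number for any trial b and sigma, since both reduce to the same squared term (yt - xt'b)^2 / (2 sigma^2).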

Why is the mean of the distribution (for MLE purposes) taken to be yt - xt'b and not just xt'b?

I'm sure it's something fairly obvious, which is why I can't find the answer in my notes, in articles online or in my textbook. Thanks!