I understand the likelihood function as the joint density of the data (y given X), viewed as a function of the parameters rather than of the data. What I don't understand is why MLE problems in econometrics always seem to use the distribution of the model error rather than the conditional distribution of the ys.
E.g., if the model is
yt = xt'b + et,
et ~ iid N(0, sigma^2)
yt and et are scalars; xt and b are vectors.
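To make the setup concrete, here is a minimal Python sketch of how I understand the log-likelihood is usually written for this model, i.e. as the N(0, sigma^2) density evaluated at et = yt - xt'b. The function name gaussian_loglik, the simulated data and the parameter values are all made up for illustration; they are not from my notes or textbook.

```python
import math
import random


def gaussian_loglik(b, sigma, y, X):
    """Log-likelihood of yt = xt'b + et with et ~ iid N(0, sigma^2)."""
    total = 0.0
    for yt, xt in zip(y, X):
        # Residual implied by the candidate parameter vector b: et = yt - xt'b
        et = yt - sum(xj * bj for xj, bj in zip(xt, b))
        # Log of the N(0, sigma^2) density evaluated at et
        total += -0.5 * math.log(2 * math.pi * sigma**2) - et**2 / (2 * sigma**2)
    return total


# Simulated data (illustrative values only): an intercept plus one regressor
rng = random.Random(0)
b_true, sigma_true = [1.0, 2.0], 0.5
X = [[1.0, rng.gauss(0, 1)] for _ in range(200)]
y = [sum(xj * bj for xj, bj in zip(xt, b_true)) + rng.gauss(0, sigma_true)
     for xt in X]

# Evaluate the log-likelihood at the true b and at a clearly wrong b
ll_true = gaussian_loglik(b_true, sigma_true, y, X)
ll_wrong = gaussian_loglik([0.0, 0.0], sigma_true, y, X)
print(ll_true, ll_wrong)
```

An estimator would then maximise this function over b and sigma; the sketch only evaluates it at two candidate values of b.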
Why is the mean of the distribution (for MLE purposes) taken to be yt - xt'b and not just xt'b?
I'm sure it's something fairly obvious, which is why I can't find the answer in my notes, in articles online or in my textbook. Thanks!