# Thread: Why use the joint distribution of the errors in Maximum Likelihood estimation?

1. ## Why use the joint distribution of the errors in Maximum Likelihood estimation?

I understand the likelihood function as the joint distribution of the data (y|X), but with the parameters as variables. What I don't understand is why MLE problems in econometrics always use the distribution of the model error rather than the conditional distribution of the y's.
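In symbols, that reading of the likelihood is shown below. It assumes a sample of $T$ observations ($T$ is added notation) that are independent given $X$, which is what the iid error assumption in the model below delivers:

```latex
L(b, \sigma^2 \mid y, X) \;=\; f(y \mid X;\, b, \sigma^2) \;=\; \prod_{t=1}^{T} f(y_t \mid x_t;\, b, \sigma^2)
```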

E.g., if the model is

$y_t = x_t'b + e_t,$

$e_t \sim \text{iid } N(0, \sigma^2)$

$y_t$ and $e_t$ are scalars; $x_t$ and $b$ are vectors.

Why is the mean of the distribution (for MLE purposes) taken to be $y_t - x_t'b$ and not just $x_t'b$?

I'm sure it's something fairly obvious, which is why I can't find the answer in my notes, in articles online or in my textbook. Thanks!

2. Figured it out. I'm known for making hilarious errors, but this is my crowning achievement :P

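The mean is $x_t'b$, but the normal kernel involves $(y - \mu)^2$, not $\mu^2$. Substituting $\mu = x_t'b$ gives $(y_t - x_t'b)^2 = e_t^2$, which is why the likelihood looks as if it were written in terms of the errors even though it is the conditional density of $y_t$.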
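The following is a minimal numerical sketch. It uses simulated data, and the sample size, true coefficients, seed, and the helper names `normal_logpdf`, `loglik_y` and `loglik_e` are all hypothetical. The sketch evaluates the log-likelihood in two ways. The first treats it as the density of $y_t$ given $x_t$ with mean $x_t'b$. The second treats it as the density of $e_t = y_t - x_t'b$ with mean zero. The two agree at any trial value of the parameters.

```python
import numpy as np

def normal_logpdf(z, mu, sigma):
    """Elementwise log of the N(mu, sigma^2) density evaluated at z."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)

# Hypothetical simulated data from y_t = x_t'b + e_t, e_t ~ iid N(0, sigma^2)
rng = np.random.default_rng(0)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + rng.normal(scale=1.5, size=T)

def loglik_y(b, sigma):
    # Conditional density of y_t given x_t: normal with mean x_t'b,
    # so the kernel contains (y_t - x_t'b)^2
    return normal_logpdf(y, X @ b, sigma).sum()

def loglik_e(b, sigma):
    # Same likelihood via the implied errors e_t = y_t - x_t'b ~ N(0, sigma^2)
    return normal_logpdf(y - X @ b, 0.0, sigma).sum()

# The two forms agree at an arbitrary trial parameter value, not only at the truth
b_try = np.array([0.3, 1.1, 0.7])
print(np.isclose(loglik_y(b_try, 2.0), loglik_e(b_try, 2.0)))  # True

# Maximizing either form gives OLS for b and SSR / T (not T - k) for sigma^2
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
sigma_hat = np.sqrt(np.mean((y - X @ b_hat) ** 2))
print(b_hat, sigma_hat)
```

The step from $e_t$ to $y_t$ is a shift by $x_t'b$, so its Jacobian is 1. That is why no extra term appears when you switch between the two ways of writing the likelihood.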