1. ## Conditional density function?

Hi, I'm not sure what the notation is supposed to mean in the highlighted box and how it relates to conditional probability:
[Image: the highlighted box from the notes]

The definition of conditional probability that I'm familiar with is $\displaystyle P(A|B) = \frac{P(A, B)}{P(B)}$. How (if at all) does that relate to what the authors have written there? Some special property of the normal distribution, perhaps?

PS. $\displaystyle \theta$ is a vector of reals, as is $\displaystyle x^{(i)}$, so $\displaystyle \theta^Tx^{(i)}=\theta\cdot x^{(i)}$.

2. Firstly, check that you understand how this relates to the previous line. It is simply making the substitution $\displaystyle y_i = \epsilon_i + \theta^T x_i$.

The notation appears to be a conditional pdf for $\displaystyle Y$, i.e. (very loosely*):

$\displaystyle p(y_i | x_i ; \theta) \approx \mathbb{P}(Y_i = y_i | x_i ; \theta)$

That is to say, in very loose terms*, it is telling you the conditional probability of $\displaystyle Y|X$ if the parameter value is equal to $\displaystyle \theta$. The parameter value has to be included because the authors made the substitution $\displaystyle y_i = \epsilon_i + \theta^T x_i$, which is different for every possible value of $\displaystyle \theta$.

*In fact, pdfs do not give probabilities (a pdf value is a density, and can even exceed 1), but that's a separate topic...
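As a rough illustration of that footnote, here is a small sketch assuming a normal density with a hypothetical mean of 0 and standard deviation 0.1 (the helpers `normal_pdf` and `normal_cdf` are written here for illustration, not taken from the notes). It evaluates the pdf at a point and compares it with the probability of a small interval of width `h` around that point.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(y, mean, sigma):
    """P(Y <= y) for Y ~ N(mean, sigma^2), using the error function."""
    return 0.5 * (1 + math.erf((y - mean) / (sigma * math.sqrt(2))))

mean, sigma = 0.0, 0.1  # hypothetical values, chosen so the density exceeds 1
y = 0.0
density = normal_pdf(y, mean, sigma)
print(f"p({y}) = {density:.4f}  (a density, so it can be larger than 1)")

# For a small interval of width h around y, P(y - h/2 < Y < y + h/2) is roughly density * h.
for h in (0.1, 0.01, 0.001):
    prob = normal_cdf(y + h / 2, mean, sigma) - normal_cdf(y - h / 2, mean, sigma)
    print(f"h = {h}: probability = {prob:.6f}, density * h = {density * h:.6f}")
```

As `h` shrinks the last two numbers agree more and more closely. That is the precise sense behind the loose "approximately equal" above: the pdf times a small interval width approximates a probability, while the pdf value on its own does not.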

3. Thanks a lot! But didn't they make the substitution $\displaystyle \epsilon = y - \theta^T x$ instead, since the mean of the normal distribution is $\displaystyle \theta^T x$?

4. Yeah, I just typoed the substitution formula.

Should have been

$\displaystyle y_i=\epsilon_i + \theta^T x_i$, which rearranges to what you said.
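Spelled out, and assuming the noise is Gaussian, $\displaystyle \epsilon_i \sim \mathcal{N}(0, \sigma^2)$ (consistent with the normal distribution with mean $\displaystyle \theta^T x_i$ mentioned above; $\displaystyle \sigma$ is an assumed noise scale), plugging $\displaystyle \epsilon_i = y_i - \theta^T x_i$ into the noise density gives:

$$
p(\epsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\epsilon_i^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
p(y_i \mid x_i;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i-\theta^T x_i)^2}{2\sigma^2}\right)
$$

i.e. under that assumption, $\displaystyle y_i \mid x_i; \theta \sim \mathcal{N}(\theta^T x_i, \sigma^2)$.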

5. But then I suppose my question is: if they made that substitution for $\displaystyle \epsilon$, why did they change $\displaystyle p(\epsilon)$ to $\displaystyle p(y|x)$, and not to $\displaystyle p(y-\theta^Tx)$ or something like that (if that even makes sense)? I.e. why did it become conditional all of a sudden?

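6. I think:
Remember that once you condition on $\displaystyle x_i$, $\displaystyle \theta^T x_i$ is just a fixed number. This means that, as long as $\displaystyle \epsilon_i$ is independent of $\displaystyle x_i$, you can make the substitution $\displaystyle \epsilon_i = y_i - \theta^T x_i$ without worrying about how the randomness of $\displaystyle x_i$ affects the distribution of $\displaystyle y_i$. That is why the result is written as a conditional density: given $\displaystyle x_i$, $\displaystyle y_i$ is just $\displaystyle \epsilon_i$ shifted by a constant, so $\displaystyle p(y_i | x_i; \theta) = p_\epsilon(y_i - \theta^T x_i)$.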
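This also ties back to the $\displaystyle P(A|B) = P(A,B)/P(B)$ definition from the first post: for densities the analogue is $\displaystyle p(y|x) = p(x,y)/p(x)$. Below is a minimal numerical check, assuming (hypothetically) that $\displaystyle x$ has independent standard-normal components, that $\displaystyle \epsilon \sim \mathcal{N}(0, \sigma^2)$ is independent of $\displaystyle x$ with an assumed $\displaystyle \sigma = 0.5$, and that $\displaystyle y = \theta^T x + \epsilon$. All function names and numbers are made up for illustration. Under these assumptions the joint density is $\displaystyle p(x, y) = p(x)\, p_\epsilon(y - \theta^T x)$, so dividing by $\displaystyle p(x)$ should give exactly the substituted noise density.

```python
import math

SIGMA = 0.5  # assumed noise standard deviation (hypothetical)

def normal_pdf(v, mean, sigma):
    """Density of N(mean, sigma^2) at v."""
    z = (v - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

def dot(theta, x):
    """theta^T x for two equal-length vectors."""
    return sum(t * xi for t, xi in zip(theta, x))

def p_x(x):
    """Hypothetical marginal density of x: independent N(0, 1) components."""
    return math.prod(normal_pdf(xi, 0.0, 1.0) for xi in x)

def p_joint(x, y, theta):
    """Joint density of (x, y) when y = theta^T x + eps, eps ~ N(0, SIGMA^2) independent of x."""
    return p_x(x) * normal_pdf(y - dot(theta, x), 0.0, SIGMA)

def p_y_given_x_ratio(y, x, theta):
    """Conditional density from the definition p(y | x) = p(x, y) / p(x)."""
    return p_joint(x, y, theta) / p_x(x)

def p_y_given_x_subst(y, x, theta):
    """Conditional density from the substitution eps = y - theta^T x."""
    return normal_pdf(y - dot(theta, x), 0.0, SIGMA)

x = [1.0, -2.0]
y = 0.3
for theta in ([0.5, 0.1], [1.0, 0.0]):
    # Both routes should agree; the value changes when theta changes.
    print(theta, p_y_given_x_ratio(y, x, theta), p_y_given_x_subst(y, x, theta))
```

The two columns agree for each $\displaystyle \theta$, while the value itself changes from one $\displaystyle \theta$ to the next. That is why $\displaystyle \theta$ appears after the semicolon in $\displaystyle p(y_i | x_i; \theta)$.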