Probability from a Joint Distribution
We are given the following joint distribution:
| Y \ X | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 1/4 | 0 | 1/8 | 1/8 |
| 2 | 1/8 | 0 | 1/16 | 1/16 |
| 3 | 1/16 | 0 | 1/32 | 1/32 |
| 4 | 1/16 | 0 | 1/32 | 1/32 |
(The columns are the values of X and the rows are the values of Y.)
Out of the questions asked regarding this table, I was stuck on three.
1) What is P(X)?
I interpreted this as "what is the probability that X occurs", but isn't that the sum of all of these probabilities? And therefore, 1? It seems incorrect.
2) What is P(Y|X)?
I'm getting confused about what is being asked when they simply ask things about X and Y without mentioning a specific condition for one of them. For example, if they asked for P(X|Y=3), I know that as the marginal probability and would sum up P(1,3)+P(2,3)+P(3,3)+P(4,3). But what does it mean when it's just something like P(Y|X)? I know that in English it's read as "probability of Y given X", but without some sort of condition it doesn't make sense to me.
3) What is P(X=3|Y)?
I'm confused about this because the conditional (=3) is in the first argument. As noted above, if it's the other way around I know how to solve it. But I'm not sure what it is asking here. Is it simply asking for P(3,1)+P(3,2)+P(3,3)+P(3,4) like a regular marginal? I'm confused about what it means to be "given Y" in such a circumstance.
Thanks.
Re: Probability from a Joint Distribution
Hey tangibleLime.
For P(X) you are looking at the distribution of X alone. For each value x, P(X = x) = P(X = x, Y = 1) + P(X = x, Y = 2) + P(X = x, Y = 3) + P(X = x, Y = 4). You are now looking at a less general distribution that only considers the value of X and not of Y, so you need to collect all the joint probabilities that have x in common: you can think of this as "slicing".
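To make the "collect everything with the same x" idea concrete, here is a small sketch for the table in the first post. The dictionary name `joint` and the layout (rows are Y, columns are X) are just my own choices for illustration, and exact fractions are used so nothing gets rounded.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y).
# Table rows are Y = 1..4, table columns are X = 1..4.
rows = [
    ["1/4", "0", "1/8", "1/8"],
    ["1/8", "0", "1/16", "1/16"],
    ["1/16", "0", "1/32", "1/32"],
    ["1/16", "0", "1/32", "1/32"],
]
joint = {(x, y): F(rows[y - 1][x - 1]) for y in range(1, 5) for x in range(1, 5)}

# Marginal of X: for each x, add the joint probabilities over every y.
p_x = {x: sum(joint[(x, y)] for y in range(1, 5)) for x in range(1, 5)}

for x, p in p_x.items():
    print(f"P(X={x}) = {p}")

# The four marginal probabilities together should add up to 1.
print("total =", sum(p_x.values()))
```

So P(X) is four numbers, one per value of x, and it is only their total that equals 1.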
As for P(Y|X), you should think about this in terms of P(Y=y|X=x), which means that you know your value of x and want to get the probability of each y within the "slice" where X has that value.
Mathematically, P(Y=y|X=x) = P(X=x,Y=y)/P(X=x) (defined only when P(X=x) > 0), so what you are actually doing is looking at only the slice for X=x and then looking for the ratio of the probability for a particular y in the context of all probabilities where X = x.
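As a rough sketch of that ratio applied to the table in the first post (the function name `cond_y_given_x` and the dictionary `joint` are hypothetical names of mine, and returning `None` for a zero-probability slice is just one way to flag "undefined"):

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y); table rows are Y = 1..4, columns are X = 1..4.
rows = [
    ["1/4", "0", "1/8", "1/8"],
    ["1/8", "0", "1/16", "1/16"],
    ["1/16", "0", "1/32", "1/32"],
    ["1/16", "0", "1/32", "1/32"],
]
joint = {(x, y): F(rows[y - 1][x - 1]) for y in range(1, 5) for x in range(1, 5)}


def cond_y_given_x(y, x):
    """Return P(Y = y | X = x), or None when P(X = x) = 0."""
    p_x = sum(joint[(x, yy)] for yy in range(1, 5))  # total of the X = x slice
    if p_x == 0:
        return None  # the ratio is undefined for an impossible x
    return joint[(x, y)] / p_x


# The whole slice X = 1, rescaled so that it sums to 1.
slice_x1 = [cond_y_given_x(y, 1) for y in range(1, 5)]
print(", ".join(str(p) for p in slice_x1))  # 1/2, 1/4, 1/8, 1/8
print(sum(slice_x1))                        # 1

# X = 2 never happens in this table, so conditioning on it is undefined.
print(cond_y_given_x(1, 2))                 # None
```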
For P(X=3|Y) you are fixing X=3 and conditioning on Y, which means you need to consider every possible value of Y if you haven't been given one. So you compute P(X=3|Y=1), P(X=3|Y=2), P(X=3|Y=3), and P(X=3|Y=4): one number for each value of y. These are the values that P(X=3|Y) takes as Y varies, and in general they need not sum to 1.
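For the table in the first post, those four numbers could be worked out along these lines. This is only a sketch; `joint`, `p_y` and `x3_given_y` are names I made up for it.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y); table rows are Y = 1..4, columns are X = 1..4.
rows = [
    ["1/4", "0", "1/8", "1/8"],
    ["1/8", "0", "1/16", "1/16"],
    ["1/16", "0", "1/32", "1/32"],
    ["1/16", "0", "1/32", "1/32"],
]
joint = {(x, y): F(rows[y - 1][x - 1]) for y in range(1, 5) for x in range(1, 5)}

# Marginal of Y: add across each row of the table.
p_y = {y: sum(joint[(x, y)] for x in range(1, 5)) for y in range(1, 5)}

# P(X = 3 | Y = y) = P(X = 3, Y = y) / P(Y = y), one value per y.
# Every P(Y = y) is positive in this table, so the division is safe.
x3_given_y = {y: joint[(3, y)] / p_y[y] for y in range(1, 5)}

for y, p in x3_given_y.items():
    print(f"P(X=3 | Y={y}) = {p}")
```

Note that each value divides by a different P(Y=y), so this is not the same thing as adding up the X=3 column.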
Re: Probability from a Joint Distribution
Thanks, but I'm still confused. I don't know what to do when you have, for example, P(X = x), because what is x? Is it just an argument in a sum function? So would P(X) = {P(X=1,Y=1)+P(X=1,Y=2)..., P(X=2,Y=1)+P(X=2,Y=2)... , ... }? So P(X) would form a probability distribution instead of just a single number?
Given that, I still don't understand P(Y,X), or what P(Y=y|X=x) means. I don't understand what these "=x" and "=y" are.
Re: Probability from a Joint Distribution
Yes that's correct: P(X=x) forms a distribution for all values of x in X.
P(Y,X) = P(Y=y,X=x), in other words the probability of getting X = x and Y = y together; this is called the joint distribution. Since X and Y are discrete here, it's a probability mass function (PMF) that gives the probability of a specific x and y.
P(Y=y|X=x) is the probability of getting a specific y given a specific x, which is a single number. If you let one of them vary, as in P(Y|X=x), then you have a whole distribution over the values of y.
Conditional distributions are always distributions, but they are distributions over a subset: you have one giant joint distribution, and a conditional one looks at a subset of events and rescales their probabilities so that they sum to 1, which makes it a distribution of its own. That's the basic idea. (Marginal distributions are also derived from the joint distribution, but by summing over the other variable rather than by rescaling a subset.)
Re: Probability from a Joint Distribution
Okay, thank you. I think I've got two of them. But they still look strange...
So P(X|Y=3) and P(X=3|Y) involve the same operation..? So P(X=3|Y) = P(Y|X=3)?
And I still don't understand P(Y|X), and I assume P(X|Y), and I still don't understand "=x" and "=y". I get that it means the probability of getting a specific y given a specific x, but I am not told what the y or x variable is. So I still don't know, conceptually or mathematically, how or what P(Y|X) or P(X|Y) is.
Re: Probability from a Joint Distribution
P(X=3|Y) is not equal to P(Y|X=3) in general: they are completely different probabilities and probability statements.
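You can see the difference directly on the table from the first post. In the sketch below (the variable names are mine, nothing standard), both lists start from the same four joint probabilities in the X=3 column, but they are divided by different things.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y); table rows are Y = 1..4, columns are X = 1..4.
rows = [
    ["1/4", "0", "1/8", "1/8"],
    ["1/8", "0", "1/16", "1/16"],
    ["1/16", "0", "1/32", "1/32"],
    ["1/16", "0", "1/32", "1/32"],
]
joint = {(x, y): F(rows[y - 1][x - 1]) for y in range(1, 5) for x in range(1, 5)}
ys = range(1, 5)

p_x3 = sum(joint[(3, y)] for y in ys)                           # P(X = 3)
p_y = {y: sum(joint[(x, y)] for x in range(1, 5)) for y in ys}  # P(Y = y)

# P(X = 3 | Y = y): divide each entry by the total of ITS OWN row.
x3_given_y = [joint[(3, y)] / p_y[y] for y in ys]

# P(Y = y | X = 3): divide each entry by the total of the X = 3 column.
y_given_x3 = [joint[(3, y)] / p_x3 for y in ys]

print("P(X=3 | Y=y):", ", ".join(str(p) for p in x3_given_y))
print("P(Y=y | X=3):", ", ".join(str(p) for p in y_given_x3))
print("same?", x3_given_y == y_given_x3)
```

The second list is a distribution over y and always sums to 1; the first is one probability per y and is under no such constraint.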
In terms of P(Y|X), this simply describes the random variable Y given some known random variable X. If we write P(Y|X=x), this is a distribution for Y given a specific value of X. For each value of x you can consider this a "slice", and each slice is a univariate distribution. If you include every slice, you have P(Y|X): a family of distributions for Y, one per value of x, which depends on two variables since not only Y but also X can (potentially) take on multiple values.
Remember that in the ordinary univariate case, P(X=x) gives a specific probability for a given x (or a density value if X is continuous). The point is that P(X=x) is a single number, whereas X is a random variable that can take on each possible x with some probability.
Conceptually you can think of P(Y|X) as the following: you have a random variable X with its own possibilities. Let's say you get a realization (call it x); now the distribution of Y is going to change based on the x obtained, and mathematically this is written as P(Y|X=x). Now let's say you get a specific realization for Y (call it y). Then this specific probability, which is a number involving no variables, is P(Y=y|X=x), and it should be a valid probability.
Remember that you have a distribution for all possible x's and y's, but the key thing is that these are not always independent. So in P(Y|X), if X and Y are not independent, then the distribution for Y will change given some X value, and vice versa. This is why I mentioned "slicing": each slice refers to getting a different value for the conditioned variable. If the random variables were independent, all slices would give the same probability distribution for Y; if they aren't, the distribution changes depending on the slice.
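If you want to check which case the table in the first post falls into, something like the following sketch would do it. The names `joint`, `p_x`, `p_y`, `independent` and `slices` are only illustrative, and slices with P(X=x) = 0 are skipped because conditioning on them is undefined.

```python
from fractions import Fraction as F

# joint[(x, y)] = P(X = x, Y = y); table rows are Y = 1..4, columns are X = 1..4.
rows = [
    ["1/4", "0", "1/8", "1/8"],
    ["1/8", "0", "1/16", "1/16"],
    ["1/16", "0", "1/32", "1/32"],
    ["1/16", "0", "1/32", "1/32"],
]
vals = range(1, 5)
joint = {(x, y): F(rows[y - 1][x - 1]) for y in vals for x in vals}

p_x = {x: sum(joint[(x, y)] for y in vals) for x in vals}
p_y = {y: sum(joint[(x, y)] for x in vals) for y in vals}

# Independence: every joint entry equals the product of its two marginals.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in vals for y in vals)
print("independent:", independent)

# Equivalent view: every slice P(Y | X = x) with P(X = x) > 0 looks the same.
slices = {
    x: [joint[(x, y)] / p_x[x] for y in vals]
    for x in vals
    if p_x[x] > 0
}
for x, s in slices.items():
    print(f"P(Y | X={x}):", ", ".join(str(p) for p in s))
```

For this particular table the check comes out true, so every slice it prints matches the marginal of Y; with a dependent table the slices would differ from one another.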