# I'm confused between probability density function and cumulative distribution function

• Sep 7th 2012, 11:54 AM
supermario88
I'm confused between probability density function and cumulative distribution function

http://www.math.dartmouth.edu/~prob/prob/prob.pdf

My question is on pg 70, example 2.13. I don't understand how they made the move from P(U less than or equal to sqrt{x}) to sqrt{x}.

For the same example, I don't understand why the graph of the probability density function (f(x)) is so different from the distribution function (F(x)). This is Figure 2.13. I thought that the cumulative distribution function is just the total area under the density function. So why wouldn't we only have one graph (just the density function graph) and say that the total area under the curve is the cumulative distribution function.

Could you also explain how, given some information other than the density function, I could derive the distribution function? For example:

The experiment is to toss two balls into four boxes in such a way that each ball is equally likely to fall in any box. Let X denote the number of balls in the first box.

What is the cumulative distribution function of X?

• Sep 7th 2012, 03:19 PM
emakarov
Re: I'm confused between probability density function and cumulative distribution func
Quote:

Originally Posted by supermario88
My question is on pg 70, example 2.13. I don't understand how they made the move from P(U less than or equal to sqrt{x}) to sqrt{x}.

If U is uniformly distributed on [0, 1], then the probability that U belongs to some set E is the length (more precisely, measure) of E. In particular, the probability that 0 <= U <= sqrt(x) is sqrt(x) - 0 = sqrt(x).
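This step can also be checked empirically. The sketch below assumes, as the answer above suggests, that Example 2.13 defines X = U^2 with U uniform on [0, 1]; the function name, sample size and seed are illustrative choices, not taken from the book.

```python
import math
import random


def empirical_cdf_of_square(x, n=200_000, seed=0):
    """Estimate P(U^2 <= x) for U uniform on [0, 1) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n) if rng.random() ** 2 <= x)
    return hits / n


if __name__ == "__main__":
    for x in (0.04, 0.25, 0.81):
        # U^2 <= x is the same event as 0 <= U <= sqrt(x),
        # and for a uniform U that event has probability sqrt(x).
        print(x, empirical_cdf_of_square(x), math.sqrt(x))
```

The two printed columns should agree to about two decimal places, which is the claim P(U <= sqrt(x)) = sqrt(x) for 0 <= x <= 1.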

Quote:

Originally Posted by supermario88
For the same example, I don't understand why the graph of the probability density function (f(x)) is so different from the distribution function (F(x)).

Why do you think that the graph of a function F(x) = sqrt(x) is supposed to be similar to the graph of its derivative f(x) = 1 / (2sqrt(x))? You can verify that f(x) = F'(x) and you can plot both functions. The way they look is just a fact.
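A quick numerical check of f(x) = F'(x), as a rough sketch: the helper `numerical_derivative` and the step size `h` are illustrative choices, not something from the book.

```python
import math


def F(x):
    """Distribution function from the example: F(x) = sqrt(x) on (0, 1)."""
    return math.sqrt(x)


def f(x):
    """Claimed density: f(x) = 1 / (2 sqrt(x)) on (0, 1)."""
    return 1.0 / (2.0 * math.sqrt(x))


def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.9):
        # The slope of F at x should match the value of f at x.
        print(x, numerical_derivative(F, x), f(x))
```

Note that F increases from 0 to 1 while f decreases and blows up near 0, so the two graphs have no reason to look alike.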

Quote:

Originally Posted by supermario88
I thought that the cumulative distribution function is just the total area under the density function.

That's correct.

Quote:

Originally Posted by supermario88
So why wouldn't we only have one graph (just the density function graph) and say that the total area under the curve is the cumulative distribution function.

I am not sure I understand your concern. This is like saying that we have the graph of y = x, but there is never a need to consider the graph of y = x^n because x^n is just a repeated integral of x times a constant.

Quote:

Originally Posted by supermario88
The experiment is to toss two balls into four boxes in such a way that each ball is equally likely to fall in any box. Let X denote the number of balls in the first box.

What is the cumulative distribution function of X?

Provided the two tosses are independent, P(X = 0) = 9/16; P(X = 1) = 6/16 and P(X = 2) = 1/16. Therefore, the cumulative distribution function F(x) of X is 0 for x < 0, F(x) = 9/16 for 0 <= x < 1, F(x) = 15/16 for 1 <= x < 2 and F(x) = 1 for x >= 2.
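These values can be reproduced by listing all 16 equally likely outcomes. The sketch below assumes the two tosses are independent and labels the boxes 1 to 4; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every (box of ball 1, box of ball 2) pair; all 16 are equally likely.
outcomes = list(product(range(1, 5), repeat=2))

# pmf[k] = P(X = k), where X counts the balls that landed in box 1.
pmf = {k: Fraction(0) for k in range(3)}
for outcome in outcomes:
    k = sum(1 for box in outcome if box == 1)
    pmf[k] += Fraction(1, len(outcomes))


def cdf(x):
    """F(x) = P(X <= x): add up P(X = k) over every k not exceeding x."""
    return sum(p for k, p in pmf.items() if k <= x)


if __name__ == "__main__":
    print(pmf)
    print([cdf(x) for x in (-1, 0, 0.5, 1, 1.5, 2)])
```

P(X = 0) = 9/16 because each ball misses the first box with probability 3/4 and (3/4)(3/4) = 9/16. For 1 <= x < 2 the event X <= x means X = 0 or X = 1, so F(x) = 9/16 + 6/16 = 15/16.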
• Sep 7th 2012, 06:04 PM
supermario88
Re: I'm confused between probability density function and cumulative distribution func
Hi emakarov, thanks very much for the detailed post.

"If U is uniformly distributed on [0, 1], then the probability that U belongs to some set E is the length (more precisely, measure) of E. In particular, the probability that 0 <= U <= sqrt(x) is sqrt(x) - 0 = sqrt(x)."

I think I understand, but could you tell me how you read the following? This was the first line of example 2.13:

Fx(x) = P(X<= x)

I read this as the probability that a random outcome X will be less than or equal to x. I'm thinking this isn't the correct way of interpreting this, because when I read it like this, the statement P(U <= sqrt(x)) in English is: the probability that a random outcome U is less than or equal to the square root of x. Is there a better way of saying this that incorporates what you said in your answer? I thought that the distribution function is from negative infinity to x. But is it because we know the lower bound of U (namely 0) that we can ignore the negative infinity?

"I am not sure I understand your concern. This is like saying that we have the graph of y = x, but there is never a need to consider the graph of y = x^n because x^n is just a repeated integral of x times a constant."

I'm not sure I really follow. My point of confusion is this: say the density function is y = x, where x is between 0 and 1 inclusive. Isn't the distribution function the right triangle formed with the hypotenuse y = x and one leg along the x-axis? Isn't the area inside this triangle called the distribution function? I know that I can take the integral between 0 and 1 of y = x and get the area that way. But why would I plot ((x^2)/2)? We're not trying to find the area under this curve. This is why I said

"For the same example, I don't understand why the graph of the probability density function (f(x)) is so different from the distribution function (F(x))."

And also said

"So why wouldn't we only have one graph (just the density function graph) and say that the total area under the curve is the cumulative distribution function."

"Provided the two tosses are independent, P(X = 0) = 9/16; P(X = 1) = 6/16 and P(X = 2) = 1/16. Therefore, the cumulative distribution function F(x) of X is 0 for x < 0, F(x) = 9/16 for 0 <= x < 1, F(x) = 15/16 for 1 <= x < 2 and F(x) = 1 for x >= 2. "

Did you mean that F(x) = 7/16 for 1 <= x < 2? I got this from P(X = 1) + P(X = 2). Could you also explain how you got P(X = 0) = 9/16?

Thanks very much.
• Sep 8th 2012, 04:00 AM
emakarov
Re: I'm confused between probability density function and cumulative distribution func
Quote:

Originally Posted by supermario88
could you tell me how you read the following? This was the first line of example 2.13:

Fx(x) = P(X<= x)

I read this as the probability that a random outcome X will be less than or equal to x.

This is the correct interpretation.

Quote:

Originally Posted by supermario88
I'm thinking this isn't the correct way of interpreting this, because when I read it like this, the statement P(U <= sqrt(x)) in English is: the probability that a random outcome U is less than or equal to the square root of x. Is there a better way of saying this that incorporates what you said in your answer? I thought that the distribution function is from negative infinity to x. But is it because we know the lower bound of U (namely 0) that we can ignore the negative infinity?

Yes, 0 <= U <= 1, so P(U <= x) = P(0 <= U <= x).

Quote:

Originally Posted by supermario88
I'm not sure I really follow. My point of confusion is this: say the density function is y = x, where x is between 0 and 1 inclusive.

This is not a density function because the area under it is 1/2, not 1.
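A small numerical illustration, as a sketch only: the rescaled function 2x and the helper `area_under` are introduced here for illustration and do not appear in the book's example.

```python
def area_under(func, a, b, steps=10_000):
    """Midpoint Riemann sum approximating the integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width


def candidate(x):
    """The proposed 'density' y = x on [0, 1]."""
    return x


def density(x):
    """Rescaled version 2x, whose total area over [0, 1] is 1."""
    return 2.0 * x


def cdf(x):
    """F(x) = area under the density from 0 up to x, one value per x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return area_under(density, 0.0, x)


if __name__ == "__main__":
    print(area_under(candidate, 0.0, 1.0))  # total area under y = x
    print(area_under(density, 0.0, 1.0))    # total area under y = 2x
    print([cdf(x) for x in (0.25, 0.5, 0.75, 1.0)])
```

The total area under a density is always the single number 1. The distribution function is the area up to a variable endpoint x, so it is a function of x (here x^2) and has its own graph.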

Quote:

Originally Posted by supermario88
Isn't the area inside this triangle called the distribution function? I know that I can take the integral between 0 and 1 of y=x and get the area that way. But why would I plot ((x^2)/2)? We're not trying to find the area under this curve.

So, you are saying that the area is the cumulative distribution function, but we are not trying to find the area. Hmm...

Quote:

Originally Posted by supermario88
This is why I said

"For the same example, I don't understand why the graph of the probability density function (f(x)) is so different from the distribution function (F(x))."

So the reason you don't understand why F(x) and f(x) look so different is that you don't think we should find F(x). Hmmmm...

Quote:

Originally Posted by supermario88
And also said

"So why wouldn't we only have one graph (just the density function graph) and say that the total area under the curve is the cumulative distribution function."

As a general remark, mathematicians have the right to consider any objects as long as they are well-defined and do not lead to a contradiction. You would probably be surprised to find out how obscure, weird and lacking connection with real life some topics may look, and nevertheless, some mathematicians devote a great deal of energy to studying them. You are wondering why it makes sense to plot a probability density and its antiderivative (i.e., the cumulative distribution function) on the same graph. The simple answer is that this is a legitimate activity, and who knows what insight it may produce. For example, it may make clear to the reader that whereas the distribution function is monotonic, the density function does not have to be. You are right in the sense that F(x) can be reconstructed from f(x), and in this sense it is superfluous information, but considering them together only increases our understanding of the topic. After all, when a point is moving along a line, its position is the antiderivative of its velocity. This does not imply that position is uninteresting or should not be plotted together with the velocity, does it? (It does not imply that the graphs of position and velocity should look alike, either.)

That said, I think the best way to clear up a misunderstanding is to go more formal. The question "Why do we plot this function?" is not a mathematical one; neither are "Why is it interesting?" or "Why should I consider it?" The proper mathematical question is "Why is this statement true?"

Quote:

Originally Posted by supermario88