Math Help - How many points are enough to determine a probability distribution?

  1. #1
    Newbie
    Joined
    Nov 2009
    From
    Beijing
    Posts
    12

    Question How many points are enough to determine a probability distribution?

    How many data points are enough to determine a probability distribution?

    The estimated distribution should be more accurate if my dataset contains 200 points than if it contains 20.

    So the answer may be the more the better. But is there any conventional rule of thumb?

    Thanks

  2. #2
    Master Of Puppets
    pickslides
    Joined
    Sep 2008
    From
    Melbourne
    Posts
    5,236
    Thanks
    28
    Quote Originally Posted by zhangty View Post
    So the answer may be the more the better.
    I would go with this thought as well.

    I would suggest that 30 points will give some shape to a probability distribution.
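One way to put numbers on "the more the better" is the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which bounds how far the empirical CDF of n independent draws can sit from the true CDF: P(sup|F_n - F| > eps) <= 2 exp(-2 n eps^2). The Python sketch below is only an illustration and not something proposed in the thread. The function names are hypothetical, and it assumes i.i.d. samples and that "determined" means the largest vertical gap between the two CDFs is small.

```python
import math


def dkw_epsilon(n, confidence=0.95):
    """Largest CDF gap eps guaranteed, with the given confidence, for n i.i.d. points.

    Solves 2 * exp(-2 * n * eps**2) = 1 - confidence for eps.
    """
    alpha = 1.0 - confidence
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def dkw_sample_size(eps, confidence=0.95):
    """Smallest n for which the DKW bound guarantees a CDF gap of at most eps."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(2.0 / alpha) / (2.0 * eps ** 2))


# Compare the sample sizes mentioned in the thread (20, 30, 200 points).
for n in (20, 30, 200):
    print(f"n={n:4d}  CDF known to within about {dkw_epsilon(n):.3f}")
```

Under these assumptions, 30 points pin the CDF down to within roughly 0.25 at 95% confidence, and 200 points to within roughly 0.10, so the required sample size depends on how fine a shape you need to resolve.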

  3. #3
    MHF Contributor matheagle
    Joined
    Feb 2009
    Posts
    2,763
    Thanks
    5
    all?

    But you can use the Kolmogorov-Smirnov test to see what the underlying distribution might be.

  4. #4
    Newbie
    Joined
    Nov 2009
    From
    Beijing
    Posts
    12
    Quote Originally Posted by matheagle View Post
    all?

    But you can use the Kolmogorov-Smirnov test to see what the underlying distribution might be.

    Well, yes, I can use the K-S test. But the K-S test can give different results depending on whether the sample size is small or large. For example:
    Example 1:
    >> x=1:1:30; % generate 30 points for x
    >> x=x';
    >> y=raylpdf(x,1); % Rayleigh pdf values (B = 1) at x, treated as the data y
    >> alam=gamfit(y); % but I fit y with a gamma distribution
    >> [h,p]=kstest(y,[y gamcdf(y,alam(1),alam(2))],0.05) % K-S test for y
    h =
    0

    p =
    0.9343 % pretty good: y is consistent with a gamma distribution
    ------------------------
    Example 2:
    >> x=1:.01:30; % generate 2901 points for x
    >> x=x';
    >> y=raylpdf(x,1); % Rayleigh pdf values (B = 1) at x, treated as the data y
    >> alam=gamfit(y); % fit y with a gamma distribution again
    >> [h,p]=kstest(y,[y gamcdf(y,alam(1),alam(2))],0.05) % K-S test for y
    h =
    1

    p =
    9.0404e-013 % rejected: y is not a gamma distribution
    -----------------------
    Of course, this is just an idealized example. With real data observed from an experiment, things will be even more complex. It makes me wonder whether a conclusion can be wrong simply because we did not choose the right probability distribution due to the limited sample size.
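The same sample-size effect can be sketched with random draws, here in Python using only the standard library rather than the MATLAB of the examples above. Note that raylpdf returns density values on a grid, not random draws, so this sketch instead generates Rayleigh variates by inverse-transform sampling and tests them against a normal distribution with the same mean and standard deviation. The choice of reference distribution, the helper names, the seed, and the asymptotic p-value formula (which is rough for small n) are all assumptions of this illustration.

```python
import math
import random


def ks_statistic(sample, cdf):
    """One-sample K-S statistic: largest gap between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (approximate for small n)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here and the p-value is essentially 1
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def rayleigh_sample(n, sigma, rng):
    """Draw n Rayleigh(sigma) variates by inverting the CDF 1 - exp(-x^2 / (2 sigma^2))."""
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


# Fixed (not fitted) reference: a normal with the mean and std of Rayleigh(1).
MU = math.sqrt(math.pi / 2.0)
SD = math.sqrt((4.0 - math.pi) / 2.0)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf((x - MU) / (SD * math.sqrt(2.0))))


rng = random.Random(0)  # arbitrary seed, only for repeatability
for n in (30, 3000):
    sample = rayleigh_sample(n, 1.0, rng)
    d = ks_statistic(sample, normal_cdf)
    p = kolmogorov_pvalue(d, n)
    print(f"n={n:5d}  D={d:.4f}  p={p:.3g}  reject at 0.05: {p < 0.05}")
```

With a fixed wrong reference distribution, the K-S statistic settles near the true gap between the two CDFs while the rejection threshold shrinks like 1/sqrt(n), so a large enough sample will eventually reject any model that is not exactly right.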

    So my question is whether there is a conventional idea regarding this.

    For instance, most people consider a difference between two datasets statistically significant if P<0.05. Why is the criterion not 0.01 or 0.1? Just because most people follow the rule? Your ideas are welcome!

  5. #5
    Junior Member
    Joined
    Nov 2009
    Posts
    54
    Quote Originally Posted by zhangty View Post

    So my question is whether there is a conventional idea regarding this.

    For instance, most people consider a difference between two datasets statistically significant if P<0.05. Why is the criterion not 0.01 or 0.1? Just because most people follow the rule? Your ideas are welcome!
    Conventions depend on who you're holding yourself accountable to. When you say that "most people think" p < .05 is significant, your "most people" are part of a particular audience. Sometimes the criterion will be p < .01 or p < .1. I've even read published results where a researcher reported that "[such-and-such] was marginally significant (p<.15)". I've only seen that one a few times, but it's not terribly uncommon to report a result significant at the .1 level as "marginally significant" where the convention in the field is to use .05 as the threshold.

    Ideally, someone running statistics on observed data ought to declare, a priori, the threshold they will use for significance, taking into account as many of the relevant circumstances surrounding the data as possible: sample size, the number and nature of the questions or tests being applied to the data set, etc. It seems that the .05 level is by far the most common convention because, if the theories from which you derive your questions are reasonable, then even with a modest sample size it is fairly safe to assume that whatever results you obtain (barring major design flaws, data loss, massive violations of the assumptions, etc.) are, in fact, valid.
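The point about the number of tests can be made concrete. If m independent tests are each run at level alpha and every null hypothesis is true, the chance of at least one false positive is 1 - (1 - alpha)^m. The Python sketch below, with hypothetical function names, computes that rate along with the Bonferroni-adjusted per-test threshold alpha / m. Independence of the tests is an assumption here, and Bonferroni is only one common correction, not a convention stated in this thread.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


# How the usual 0.05 threshold behaves as more tests are applied to one data set.
for m in (1, 5, 10, 20):
    fwer = familywise_error_rate(0.05, m)
    per_test = bonferroni_threshold(0.05, m)
    print(f"m={m:2d}  family-wise rate at 0.05 each: {fwer:.3f}  "
          f"Bonferroni per-test threshold: {per_test:.4f}")
```

Seen this way, declaring the threshold in advance also means deciding in advance how many questions will be asked of the data.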

    I'm probably just re-hashing things you've heard or been advised on or thought to yourself already, but there it is.