How many data points are enough to determine a probability distribution?
A probability distribution fitted to a dataset of 200 points is more reliable than one fitted to 20 points.
So the answer may be "the more, the better." But is there a conventional rule of thumb?
Of course, I can use the Kolmogorov-Smirnov (K-S) test. But the K-S test can give different results depending on whether the sample size is small or large. For example:
>> x = 1:1:30;          % generate 30 points for x
>> y = raylpdf(x, 1);   % Rayleigh-shaped y (raylpdf evaluates the density at x; raylrnd would draw random samples)
>> alam = gamfit(y);    % but fit y with a gamma distribution
>> [h, p] = kstest(y, 'CDF', [y' gamcdf(y', alam(1), alam(2))], 'Alpha', 0.05)   % K-S test for y; the CDF argument must be a two-column matrix, hence y'
p = 0.9343              % pretty good: y passes as a gamma distribution
>> x = 1:0.01:30;       % generate 2901 points for x
>> y = raylpdf(x, 1);   % same Rayleigh-shaped y, now with 2901 points
>> [h, p] = kstest(y, 'CDF', [y' gamcdf(y', alam(1), alam(2))], 'Alpha', 0.05)   % reuses alam from the 30-point fit
p = 9.0404e-013         % y is not a gamma distribution
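A rough calculation shows why the two runs disagree. For large n, the one-sample K-S test at level α rejects when the largest gap D_n between the empirical CDF and the model CDF exceeds approximately

$$
D_{n,\alpha} \approx \frac{c(\alpha)}{\sqrt{n}}, \qquad c(\alpha) = \sqrt{-\tfrac{1}{2}\ln\!\left(\tfrac{\alpha}{2}\right)}, \qquad c(0.05) \approx 1.358
$$

which gives

$$
D_{30,\,0.05} \approx \frac{1.358}{\sqrt{30}} \approx 0.248, \qquad D_{2901,\,0.05} \approx \frac{1.358}{\sqrt{2901}} \approx 0.025
$$

So with 30 points a fitted model can be off by nearly 0.25 in cumulative probability and still pass, while with 2901 points a gap of about 0.025 is already enough to reject. This is the asymptotic Kolmogorov approximation for a fully specified distribution: it is only rough at n = 30, and when the parameters are estimated from the same data (as with `gamfit` here) the correct critical values are smaller, so the standard test is conservative.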
Of course, this is just an idealized example. With real data observed in an experiment, things will be even more complex. It makes me wonder whether a conclusion can be wrong simply because a limited sample size let us choose the wrong probability distribution.
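The worry can be reproduced in a few lines. The sketch below is a hedged illustration in plain Python (standard library only), not a port of the MATLAB session. To avoid needing a gamma CDF, it fits a normal distribution by its sample mean and standard deviation as the deliberately wrong model. Instead of random draws, it uses the exact Rayleigh quantiles at (i - 0.5)/n as an idealized, noise-free sample. The helper names (`rayleigh_quantiles`, `ks_statistic`, `ks_pvalue`, `fit_and_test`) are made up for this example. The p-value uses the asymptotic Kolmogorov series, which is only approximate for small n and conservative when parameters are fitted to the same data.

```python
import math
import statistics


def rayleigh_quantiles(n, sigma=1.0):
    """Idealized Rayleigh 'sample': the exact quantiles at (i - 0.5) / n."""
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - (i - 0.5) / n))
            for i in range(1, n + 1)]


def normal_cdf(x, mu, sd):
    """CDF of a normal distribution with mean mu and standard deviation sd."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample K-S statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x_i.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value Q(sqrt(n) * d); approximate for small n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # Q(lambda) is indistinguishable from 1 here
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


def fit_and_test(n):
    """Fit a normal (the deliberately wrong model) to Rayleigh data, then K-S test it."""
    y = rayleigh_quantiles(n)
    mu, sd = statistics.fmean(y), statistics.stdev(y)
    d = ks_statistic(y, lambda x: normal_cdf(x, mu, sd))
    return d, ks_pvalue(d, n)


for n in (30, 2901):
    d, p = fit_and_test(n)
    print(f"n = {n:5d}  D = {d:.4f}  p = {p:.3g}")
```

Under these assumptions the wrong normal model passes comfortably at n = 30 and is rejected at n = 2901, mirroring the flip in the MATLAB session: the mismatch between model and data is the same in both cases; only the test's ability to see it changes.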
So my question is: is there a conventional guideline for this?
For comparison, most people consider the difference between two datasets statistically significant if p < 0.05. Why is the criterion not 0.01 or 0.1? Just because most people follow that rule. I welcome your ideas!
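One way to see what the 0.05 threshold actually buys: when the null hypothesis is true and the test statistic is continuous, the p-value is uniformly distributed, so the threshold α is exactly the long-run rate of false positives you agree to accept. The hypothetical simulation below (plain Python; a two-sided one-sample z-test with known σ = 1 on data drawn from N(0, 1), so the null is true by construction) estimates that rate for α = 0.01, 0.05 and 0.1. The function names are illustrative only.

```python
import math
import random


def z_test_pvalue(data, sigma=1.0):
    """Two-sided p-value of a one-sample z-test of mean 0 with known sigma."""
    n = len(data)
    z = (sum(data) / n) / (sigma / math.sqrt(n))
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(alpha, reps=10000, n=10, seed=1):
    """Fraction of null-true datasets (drawn from N(0, 1)) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_pvalue(data) < alpha:
            hits += 1
    return hits / reps


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}  false-positive rate ~ {false_positive_rate(alpha):.3f}")
```

Nothing in the simulation singles out 0.05. Picking a stricter threshold lowers the false-positive rate but makes real effects harder to detect with the same sample size, so the choice is a trade-off settled by convention rather than derived.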
In practice, and ideally, someone running statistics on observed data ought to declare, a priori, what they will use as their threshold for significance, taking into account as many of the relevant circumstances surrounding the data as possible: sample size, the number and nature of the questions or tests being thrown at the data set, etc. The .05 level seems to be by far the convention because, if the theories from which you derive your questions are reasonable, even with a modest sample size it is fairly safe to assume that whatever results you obtain (barring major design flaws, data loss, massive violations of the assumptions, etc.) are, in fact, valid.
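To make "number of tests" concrete: if m independent tests are each run at level α while every null hypothesis is true, the chance of at least one false positive is 1 - (1 - α)^m. One common (and deliberately conservative) adjustment is the Bonferroni correction, which tests each hypothesis at α/m. The sketch below, with hypothetical helper names, just does that arithmetic. The family-wise formula assumes independent tests, and Bonferroni is only one of several possible corrections.

```python
def familywise_error(alpha, m):
    """P(at least one false positive) for m independent tests at level alpha, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


m = 20
print(f"{m} tests at 0.05 each: P(>=1 false positive) = {familywise_error(0.05, m):.3f}")
per_test = bonferroni_threshold(0.05, m)
print(f"Bonferroni per-test threshold = {per_test:.4f}, "
      f"family-wise rate = {familywise_error(per_test, m):.3f}")
```

With 20 independent tests at 0.05 each, the chance of at least one spurious "significant" result is about 64%, which is why the threshold should be declared with the whole set of planned tests in mind.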
I'm probably just re-hashing things you've heard or been advised on or thought to yourself already, but there it is.