# Thread: Help with identification of probability distribution?

1. ## Help with identification of probability distribution?

Hi there,

I've been struggling to identify the probability distribution of my sample. Any insights would be appreciated.

I have a sample of size 450 and want to know which probability distribution it follows. However, the data fail every test I have tried (e.g. Jarque-Bera, Kolmogorov-Smirnov).

So what is the first step I should consider if the data do not fit any probability distribution? Or can I transform my data into a form that fits some probability distribution?

Let me know if the question is unclear; English is not my native language.
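A note on the tests named above: the Jarque-Bera test only checks for normality. It compares the sample skewness and kurtosis with the normal values 0 and 3, so by itself it cannot confirm or rule out a Rayleigh or chi-square fit. Below is a minimal Python sketch of the statistic, assuming the usual asymptotic chi-square(2) reference distribution. MATLAB's `jbtest` may instead use tabulated critical values for small samples, so its p-values can differ somewhat.

```python
import math


def jarque_bera(x):
    """Jarque-Bera statistic JB = n/6 * (S^2 + (K - 3)^2 / 4),
    where S and K are the sample skewness and kurtosis (biased moments)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def jb_asymptotic_pvalue(jb):
    """Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    whose upper-tail probability is exactly exp(-JB / 2)."""
    return math.exp(-jb / 2.0)
```

With this approximation, a JB value above roughly 5.99 (that is, -2 ln 0.05) rejects normality at the 5% level.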

2. Originally Posted by zhangty
I have a sample of size 450 and want to know which probability distribution it follows. However, the data fail every test I have tried (e.g. Jarque-Bera, Kolmogorov-Smirnov).

So what is the first step I should consider if the data do not fit any probability distribution? Or can I transform my data into a form that fits some probability distribution?
1. What distributions have you tested it against?

2. What process produces this sample? (If we know the process, we have a better idea of what kind of distribution we may be dealing with.)

3. Do you have any reason to expect that the distribution is some standard distribution?

CB

3. Thanks for the prompt reply.

1. I tested the normal, Rayleigh, and chi-square distributions with the Jarque-Bera and Kolmogorov-Smirnov tests in MATLAB (`jbtest` and `kstest`). The data look Rayleigh-distributed, but they still do not pass the test.

2. My sample was not generated by a computer; it was observed. I explain in more detail below.

3. I want to know, in terms of probability, how well a climate model can simulate rainfall. So I calculated the percentage error of the simulation, error = (Rain_simulated - Rain_observed)/Rain_observed * 100%, which gives a sample of 450 errors. If the errors follow some standard distribution, I can quantify the accuracy of the simulation, e.g. the probability that the error lies between 10% and 20%, P(10% < error < 20%).

Thanks again!
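As an aside, the probability asked about here can also be estimated without assuming any parametric distribution. If the errors are independent, the fraction of the 450 errors that fall in the interval is an unbiased estimate of P(10% < error < 20%), with approximate standard error sqrt(p(1 - p)/n). A minimal Python sketch; `percent_error`, `empirical_prob`, and `binomial_se` are hypothetical helper names, and skipping observations equal to zero (where the percentage error is undefined) is an assumption.

```python
import math


def percent_error(simulated, observed):
    """Percentage error (sim - obs) / obs * 100 for each pair.
    Pairs with obs == 0 are skipped (assumption) because the ratio is undefined."""
    return [(s - o) / o * 100.0 for s, o in zip(simulated, observed) if o != 0]


def empirical_prob(errors, lo, hi):
    """Fraction of errors strictly between lo and hi: an estimate of P(lo < error < hi)."""
    return sum(1 for e in errors if lo < e < hi) / len(errors)


def binomial_se(p, n):
    """Approximate standard error of a proportion p estimated from n independent values."""
    return math.sqrt(p * (1.0 - p) / n)


# Usage with the 450 percentage errors from the post:
# p = empirical_prob(errors, 10.0, 20.0)
# print(p, binomial_se(p, len(errors)))
```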

4. Originally Posted by zhangty

1. I tested the normal, Rayleigh, and chi-square distributions with the Jarque-Bera and Kolmogorov-Smirnov tests in MATLAB (`jbtest` and `kstest`). The data look Rayleigh-distributed, but they still do not pass the test.
Are you using the absolute value of the percentage error? I ask because the Rayleigh distribution is defined (or rather, non-zero) only on the positive half-line.

CB
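To illustrate the point about the support: the Rayleigh density with scale sigma is f(x) = (x/sigma^2) exp(-x^2/(2 sigma^2)) for x > 0 and zero otherwise, so a single negative percentage error (simulated rainfall below the observed value) already makes the Rayleigh likelihood zero for every sigma. A minimal Python sketch; the function names are illustrative only.

```python
import math


def rayleigh_logpdf(x, sigma):
    """Log-density of Rayleigh(sigma): log(x / sigma^2) - x^2 / (2 sigma^2) for x > 0.
    The density is zero for x <= 0, so the log-density is -inf there."""
    if x <= 0:
        return float("-inf")
    return math.log(x / sigma ** 2) - x ** 2 / (2 * sigma ** 2)


def rayleigh_loglik(sample, sigma):
    """Total log-likelihood; one non-positive value makes it -inf (likelihood 0)."""
    return sum(rayleigh_logpdf(x, sigma) for x in sample)


# A negative percentage error rules out any sigma:
print(rayleigh_loglik([12.0, 35.0, -8.0], 20.0))  # -inf
```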

5. Originally Posted by CaptainBlack
Are you using the absolute value of the percentage error? I ask because the Rayleigh distribution is defined (or rather, non-zero) only on the positive half-line.

CB
When I tested, I first shifted the errors to positive values with `error_trans = error + abs(min(error)) + 1`, so that `error_trans` is never less than 1.
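Where the `+ 1` sits inside the shift matters. `error + abs(min(error) + 1)` does not guarantee a minimum of 1 (with min(error) = -50 it adds 49, leaving a minimum of -1), whereas `error + abs(min(error)) + 1` does whenever min(error) <= 0. A small Python comparison; it uses the general form `e - min(error) + 1`, which equals the corrected formula when the minimum is non-positive, and the function names are illustrative.

```python
def shift_as_typed(error):
    # error + abs(min(error) + 1): the +1 is inside abs()
    m = min(error)
    return [e + abs(m + 1) for e in error]


def shift_to_min_one(error):
    # error - min(error) + 1: the smallest value maps to exactly 1;
    # equals error + abs(min(error)) + 1 whenever min(error) <= 0
    m = min(error)
    return [e - m + 1 for e in error]


sample = [-50.0, 0.0, 30.0]
print(min(shift_as_typed(sample)))    # -1.0
print(min(shift_to_min_one(sample)))  # 1.0
```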

The picture was as follows. The Rayleigh parameter (51.874) was obtained with `raylfit(error_trans)` in MATLAB. I then tested the fit with `[h,s]=kstest(error_trans,[error_trans raylcdf(error_trans,51.874)],0.05)`, which returned h = 1, rejecting the null hypothesis.
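For readers without MATLAB, here is roughly what those two steps compute. `raylfit` returns the maximum-likelihood estimate sigma_hat = sqrt(sum(x_i^2) / (2n)), and the Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDF and the fitted CDF. A minimal Python sketch, with illustrative function names. The cutoff sqrt(-ln(alpha/2)/2)/sqrt(n) (about 1.358/sqrt(n) at alpha = 0.05) is the asymptotic critical value for a fully specified distribution. Because sigma is estimated from the same data, the test is conservative, so a rejection like h = 1 here remains meaningful.

```python
import math


def rayleigh_mle(x):
    """Maximum-likelihood Rayleigh scale: sqrt(sum(x_i^2) / (2n))."""
    return math.sqrt(sum(v * v for v in x) / (2 * len(x)))


def rayleigh_cdf(x, sigma):
    """Rayleigh CDF: 1 - exp(-x^2 / (2 sigma^2)) for x > 0, else 0."""
    return 1.0 - math.exp(-x * x / (2 * sigma * sigma)) if x > 0 else 0.0


def ks_statistic(x, cdf):
    """One-sample KS statistic D = sup |F_n(t) - F(t)|.
    F_n jumps at each sorted point, so both sides of every jump are checked."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = cdf(v)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical(n, alpha=0.05):
    """Asymptotic critical value for a fully specified CDF."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


# Usage (error_trans would be the shifted sample from the post):
# sigma = rayleigh_mle(error_trans)
# d = ks_statistic(error_trans, lambda t: rayleigh_cdf(t, sigma))
# reject = d > ks_critical(len(error_trans))
```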

6. Originally Posted by zhangty
The Rayleigh parameter (51.874) was obtained with `raylfit(error_trans)` in MATLAB. I then tested the fit with `[h,s]=kstest(error_trans,[error_trans raylcdf(error_trans,51.874)],0.05)`, which returned h = 1, rejecting the null hypothesis.

I would suggest fitting `error` with a weighted sum of three Gaussian components, i.e. a three-component Gaussian mixture.

CB
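A rough sketch of the mixture suggestion: a three-component Gaussian mixture has density w1 N(mu1, s1) + w2 N(mu2, s2) + w3 N(mu3, s3) with weights summing to 1, and it is commonly fitted by expectation-maximization (EM). Once fitted, P(10% < error < 20%) is the weighted sum of the component normal CDF differences. The Python below is a plain EM under stated assumptions (quantile-based initialization, a fixed 200 iterations, a small floor on each standard deviation) and is not a tuned implementation. In MATLAB, `fitgmdist` (or `gmdistribution.fit` in older releases) fits the same kind of model.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sd):
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def fit_gmm(x, k=3, iters=200, min_sd=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Assumptions: means start at evenly spaced sample quantiles, weights start
    equal, every sd starts at the overall sd, and sds are floored at min_sd."""
    n = len(x)
    xs = sorted(x)
    mean = sum(x) / n
    sd0 = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [sd0] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability that each point came from each component
        resp = []
        for v in x:
            p = [w * normal_pdf(v, m, s) for w, m, s in zip(ws, mus, sds)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: weighted re-estimates of each component's weight, mean and sd
        for j in range(k):
            rj = [r[j] for r in resp]
            nj = sum(rj) or 1e-300
            ws[j] = nj / n
            mus[j] = sum(r * v for r, v in zip(rj, x)) / nj
            var = sum(r * (v - mus[j]) ** 2 for r, v in zip(rj, x)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return ws, mus, sds


def mixture_prob(lo, hi, ws, mus, sds):
    """P(lo < X < hi) under the fitted mixture: weighted sum of normal CDF differences."""
    return sum(w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
               for w, m, s in zip(ws, mus, sds))


# Usage with the raw (unshifted) percentage errors:
# ws, mus, sds = fit_gmm(error, k=3)
# print(mixture_prob(10.0, 20.0, ws, mus, sds))
```

Three components follows the suggestion above; the number of components is a modelling choice, and with 450 points it is worth checking that each component ends up with a non-negligible weight.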