What would be the expression for the cumulative distribution function (CDF) of the generalized Gaussian distribution (GGD)? The PDF of the distribution is given by:
where
and
Thanks
After some googling, I have not found anything better than that.
You can write down the expression in terms of the generalized error function (see this), but in the end you still have the same integral at the heart of it. The fact is that, for the general case, you cannot find an expression that doesn't contain either an integral or an infinite series. For certain values of c (like 1), you could obviously write down a closed-form expression (though I think it might have to be broken up into 2 expressions, one for x < 0 and one for x >= 0).
Even if you look at c=2, which is Gaussian (see this), they write the CDF in terms of the error function, which is itself an integral expression.
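For what it's worth, the same integral can be written with the regularized lower incomplete gamma function, which MATLAB provides as gammainc. The following is only a sketch: the formulas in the question did not survive, so it assumes the common zero-mean parameterization pdf(x) = a*exp(-(b*|x|)^c) with b = sqrt(gamma(3/c)/gamma(1/c))/sigma, and the values of sigma and c are purely illustrative.

```matlab
% Assumed GGD parameters (illustrative values, not from the thread)
sigma = 1.0;   % standard deviation
c     = 1.5;   % shape parameter (c = 2 is Gaussian, c = 1 is Laplacian)

% Scale constant of the assumed zero-mean parameterization
b = sqrt(gamma(3/c) / gamma(1/c)) / sigma;

% CDF via the regularized lower incomplete gamma function P(s, z):
%   F(x) = 1/2 + sign(x)/2 * P(1/c, (b*|x|)^c)
% Note the argument order: gammainc(Z, S) returns P(S, Z).
ggd_cdf = @(x) 0.5 + 0.5 * sign(x) .* gammainc((b * abs(x)).^c, 1/c);

x = [-3 -1 0 1 3];
F = ggd_cdf(x);
disp([x; F]);   % first row: x, second row: F(x)
```

Under this assumption, c = 2 reduces to the familiar 0.5*(1 + erf(x/(sigma*sqrt(2)))), so this is still "an integral at heart", just one that is already implemented as a special function.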
Thanks for the reply.
I also got the same integral:
but didn't know (and still don't) what to do with it.
Actually, I'm trying to measure the goodness of fit of empirical data to GGDs with different shape parameters c.
The Kolmogorov-Smirnov test needs both the empirical CDF and the distribution's CDF.
In Matlab (and in general) it is easy to find the empirical CDF of the given data and evaluate it at each sample, but how do I get the value of the GGD CDF?
What you need to do is numerically evaluate the integral. There are a few ways of doing this in Matlab. There is the "quad" family of functions. I have had problems with them when you want infinite bounds, and they won't necessarily produce a CDF that is monotonically increasing (since they approximate the function and then integrate).
So probably the best way (?) is to just sample the function numerically over reasonable bounds with a small spacing, and then do a trapezoidal numeric integration.
So first figure out some reasonable bounds. From your formulation, you seem to know a priori what the standard deviation σ is.
So let B = 10σ.
By Chebyshev's inequality, you are guaranteed to miss at most 1% of the total area under the curve by using this as a bound (the mass more than k standard deviations from the mean is at most 1/k^2, so 1/100 for k = 10). For most distributions, it is substantially better than that. You may want to crank that sucker down to 4 or 5, say, rather than 10.
Code:
x = linspace(-B, B, 10000);
% abs() keeps the exponent real and the pdf symmetric for x < 0
pdf = a*exp(-(b*abs(x)).^c);
% perform trapezoidal cumulative integration
cdf = cumtrapz(x, pdf);

You will be able to tell how well you did by looking at 1-cdf(end). If that is very small, then chances are you have a good sampling of the pdf.

If you don't want your cdf to be quite that big, you still need to calculate the cdf over a big range with a small spacing (as I have done), and then you can downsample. For example:

Code:
x_small = -B:0.05:B;
cdf_small = interp1(x, cdf, x_small, 'linear', 'extrap');
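The same interp1 call can evaluate the tabulated cdf at the data samples themselves, which is what the Kolmogorov-Smirnov statistic needs. A minimal sketch, assuming the standard zero-mean GGD constants for a and b (the original formulas are not shown in the thread) and using a synthetic sample Y as a stand-in for the real data:

```matlab
% Assumed GGD parameters (illustrative; c = 2 makes the GGD Gaussian)
sigma = 1.0;  c = 2.0;
b = sqrt(gamma(3/c) / gamma(1/c)) / sigma;   % assumed standard GGD scale
a = b * c / (2 * gamma(1/c));                % normalizing constant
B = 10 * sigma;

% Tabulate the model CDF on a fine grid
x   = linspace(-B, B, 10000);
pdf = a * exp(-(b * abs(x)).^c);
cdf = cumtrapz(x, pdf);

% Hypothetical data sample; replace with the real measurements
Y  = sigma * randn(500, 1);
Ys = sort(Y(:));
n  = numel(Ys);

% Model CDF at each sorted sample, clamped to [0, 1]
F = interp1(x, cdf, Ys, 'linear', 'extrap');
F = min(max(F, 0), 1);

% KS statistic: largest gap between the empirical step function and F,
% checked just after and just before each jump of the empirical CDF
D = max(max((1:n)'/n - F), max(F - (0:n-1)'/n));
fprintf('KS statistic D = %.4f\n', D);
```

Repeating this for each candidate shape parameter c gives one D per candidate, and the smallest D indicates the closest fit in the KS sense.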
Great, the first code snippet was exactly what I needed!
As for the 1-cdf(end) part, I'm not sure you are correct. The KS test searches for the maximum distance between the empirical CDF and the distribution CDF, max |F_emp(x) - F(x)|, which is probably somewhere near the middle of the 0-0.5 or 0.5-1 ranges of cdf values.
In general, if my cdf is anything even close to Gaussian, it should have no problem coming very close to 1 at cdf(end), and I expect 1-cdf(end) to always be very close to 0 for reasonable parameters of the GGD. Please correct me if I'm wrong.
One last question on the topic (or a bit off topic): the test needs the distributions' pdfs. I suppose it should be fine to use
Code:
[pdf,x] = ksdensity(Y);
to estimate the pdf of the values in Y?
Sorry, this was meant to be a check just on how good the numerical integration approximation was. We are discretely sampling the PDF, and then doing Riemann Sums as the approximation to the integral to get the CDF. If we undersampled the PDF then cdf(end) may not be very close to 1. I wasn't referencing the kstest.
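To illustrate that check, here is a small sketch that repeats the integration with different numbers of samples and prints 1-cdf(end) each time. It assumes the standard zero-mean GGD constants for a and b, and the parameter values and grid sizes are arbitrary choices for illustration:

```matlab
% Assumed GGD parameters (illustrative; c = 1 is the Laplacian case)
sigma = 1.0;  c = 1.0;
b = sqrt(gamma(3/c) / gamma(1/c)) / sigma;   % assumed standard GGD scale
a = b * c / (2 * gamma(1/c));                % normalizing constant
B = 10 * sigma;

% Odd sample counts so that the peak at x = 0 is always a grid point
Ns  = [11 101 1001 10001];
err = zeros(size(Ns));

for k = 1:numel(Ns)
    x   = linspace(-B, B, Ns(k));
    pdf = a * exp(-(b * abs(x)).^c);
    cdf = cumtrapz(x, pdf);
    err(k) = 1 - cdf(end);   % total area missed by the approximation
    fprintf('N = %6d   1 - cdf(end) = %+.3e\n', Ns(k), err(k));
end
```

The printed value can be negative on a coarse grid, since the trapezoidal rule overestimates the area under a sharply peaked pdf. What matters is that its magnitude becomes small as the spacing shrinks.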
The last part of my post was referring to the fact that maybe you didn't want to have such a finely resolved CDF. If that was the case, then I was showing how you might downsample it.
I assume your Y is the data? For the tests that I am familiar with, you don't need to do any kernel smoothing of the data. You would just bin the data and "bin" the PDF (take the difference of the CDF at the endpoints of each bin), and do the test. It doesn't look like ksdensity would be necessary.
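The binning described here might look like the sketch below. The bin edges, the synthetic sample Y, and the standard zero-mean GGD constants for a and b are all assumptions made for illustration. The final line computes a Pearson chi-square style statistic as one possible way to compare the counts.

```matlab
% Assumed GGD parameters (illustrative; c = 2 makes the GGD Gaussian)
sigma = 1.0;  c = 2.0;
b = sqrt(gamma(3/c) / gamma(1/c)) / sigma;   % assumed standard GGD scale
a = b * c / (2 * gamma(1/c));                % normalizing constant
B = 10 * sigma;

% Tabulated model CDF
x   = linspace(-B, B, 10000);
pdf = a * exp(-(b * abs(x)).^c);
cdf = cumtrapz(x, pdf);

% Hypothetical data sample; replace with the real measurements
Y = sigma * randn(1000, 1);

% Hypothetical bin edges: 8 inner bins plus two wide tail bins
edges = [-B, linspace(-2*sigma, 2*sigma, 9), B];

% "Bin" the model: probability of a bin = CDF difference at its edges
p_bin    = diff(interp1(x, cdf, edges, 'linear'));
expected = numel(Y) * p_bin;

% Bin the data: count samples with edges(k) <= Y < edges(k+1)
counts = zeros(1, numel(edges) - 1);
for k = 1:numel(counts)
    counts(k) = sum(Y >= edges(k) & Y < edges(k+1));
end

% Pearson chi-square statistic comparing observed and expected counts
chi2 = sum((counts - expected).^2 ./ expected);
fprintf('chi-square statistic = %.3f over %d bins\n', chi2, numel(counts));
```

The statistic would then be compared against a chi-square quantile, with degrees of freedom that depend on the number of bins and on how many distribution parameters were estimated from the data.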
If Y is the CDF that we just calculated, I'm not sure why any kernel smoothing would be necessary either.
But I should add the disclaimer that I have not done much on this part of statistics. I have used both kstest and , but I have never done any kernel smoothing. I would think kernel smoothing would be useful for visualization, but not really for trying to perform hypothesis tests comparing empirical data to a given distribution.
"Sorry, this was meant to be a check just on how good the numerical integration approximation was..."
I misunderstood you. Now it makes sense.
For the test, you are, of course, right again. Binned data is what is used in the test so the cdf differences will do it.
Thanks for the help, I think I've finally got things straightened out