# Thread: CDF of generalized gaussian distribution

1. ## CDF of generalized gaussian distribution

What would be the expression for the cumulative distribution function of the generalized Gaussian distribution? The PDF of the distribution is given by:
$
f(x)=a e^{-|bx|^c},
$

where
$
a=\frac{bc}{2\Gamma(\frac{1}{c})}
$

and
$
b=\frac{1}{\sigma_x} \sqrt{\frac{\Gamma(\frac{3}{c})}{\Gamma(\frac{1}{c }) }}
$

Thanks

2. $\int^{x}_{-\infty}a\exp(-|b t|^c)\, dt$

You can write down the expression in terms of the generalized error function (see this), but in the end you still have the same integral at the heart of it. The fact is that, for the general case, you cannot find an expression that doesn't contain either an integral or an infinite series. For certain values of c (like 1), you can write down a closed-form expression (though I think it has to be broken up into two expressions, one for $x<0$ and one for $x\geq 0$).

Even if you look at c=2, which is Gaussian (see this), they write the CDF in terms of the error function, which is itself an integral expression.
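
For reference, the substitution $u=(bt)^c$ turns that integral into a lower incomplete gamma function, which most numerical packages provide but which is, like erf, still defined by an integral. A sketch of the steps for $x\geq 0$, using $a$ and $b$ from the first post ($\gamma$ and $P$ are standard notation, not symbols used elsewhere in this thread):
$
\int_0^{x} a e^{-(bt)^c}\,dt=\frac{a}{bc}\int_0^{(bx)^c}u^{\frac{1}{c}-1}e^{-u}\,du=\frac{\gamma(\frac{1}{c},(bx)^c)}{2\Gamma(\frac{1}{c})}
$

Since the PDF is symmetric about zero, this would give
$
F(x)=\frac{1}{2}+\frac{\mathrm{sgn}(x)}{2}\,P\left(\frac{1}{c},|bx|^c\right),\qquad P(s,z)=\frac{\gamma(s,z)}{\Gamma(s)}
$

For c=2 it reduces to the usual erf form, because $P(\frac{1}{2},z)=\mathrm{erf}(\sqrt{z})$.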

3. I also got the same integral:
$
F(x)=\int^{x}_{-\infty}f(t)\,dt=\int^x_{-\infty}a\exp(-|b t|^c)\, dt
$

but didn't know (and still don't) what to do with it.

Actually, I'm trying to assess the goodness of fit of the empirical data to GGDs with different shape parameters c.
The Kolmogorov-Smirnov test needs the empirical CDF $F_x(t)$ and the distribution CDF $F(t)$.
In Matlab (and in general) it is easy to find the empirical CDF of the given data and evaluate it at each sample, but how do I get the value of the GGD CDF?

4. What you need to do is numerically evaluate the integral. There are a few ways of doing this in Matlab. There is the "quad" family of functions. I have had problems with them when you want infinite bounds, and they won't necessarily produce a CDF that is monotonically increasing (since they approximate the function and then integrate).

So probably the best way (?) is to just numerically sample the function over reasonable bounds with a small spacing, and then do a trapezoidal numerical integration.

So first figure out some reasonable bounds. From your formulation, you seem to know a priori what the standard deviation $\sigma_x$ is.

So let

$B = 10\, \sigma_x$

By Chebyshev's inequality, you are guaranteed to miss at most 1% of the total area under the curve by using this as a bound. For most distributions, it is substantially better than that. You may want to crank that sucker down to 4 or 5, say, rather than 10.
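
Spelled out, with $B=k\,\sigma_x$ and a zero-mean distribution, the bound being used is
$
P(|X|\geq k\,\sigma_x)\leq\frac{1}{k^2}
$

so $k=10$ guarantees at most 1% outside the range, while $k=5$ and $k=4$ only guarantee 4% and 6.25%. These are worst-case figures over all distributions with that variance. For the Gaussian case (c=2), for instance, the area outside $\pm 4\sigma_x$ is only about $6\times 10^{-5}$.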

Code:
x = linspace(-B,B,10000);
pdf = a*exp(-abs(b*x).^c);  % abs() matches |bx|^c and keeps the result real for x < 0

% perform trapezoidal cumulative integration
cdf = cumtrapz(x,pdf);
You will be able to tell how well you did by looking at 1-cdf(end). If that is very small, then chances are you have a good sampling of the pdf. If you don't want your cdf array to be quite that big, you still need to calculate the cdf over a big range with a small spacing (as I have done), and then you can downsample.

For example:
Code:
x_small = -B:0.05:B;
cdf_small = interp1(x,cdf,x_small,'linear','extrap');
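
As a rough end-to-end sketch of the recipe above, the following computes $a$ and $b$ from the formulas in the first post, builds the CDF with cumtrapz, and cross-checks it against Matlab's gammainc (the regularized lower incomplete gamma function, which the substitution $u=(bt)^c$ leads to). The values of sigma_x and c are made-up example values, and the gammainc check is an added sanity test rather than part of the method described in this post.

```matlab
% Assumed example values (not from the thread)
sigma_x = 1.5;   % standard deviation
c       = 1.3;   % shape parameter

% Scale and normalisation constants from the first post
b = sqrt(gamma(3/c) / gamma(1/c)) / sigma_x;
a = b * c / (2 * gamma(1/c));

% Sample the PDF on [-B, B]; abs() keeps the power real for x < 0
B   = 10 * sigma_x;
x   = linspace(-B, B, 10000);
pdf = a * exp(-abs(b * x).^c);

% Cumulative trapezoidal integration
cdf_num = cumtrapz(x, pdf);

% Independent reference; note the argument order gammainc(X, A)
cdf_ref = 0.5 + 0.5 * sign(x) .* gammainc(abs(b * x).^c, 1/c);

tail_err = 1 - cdf_num(end);              % area missed by truncation/sampling
max_err  = max(abs(cdf_num - cdf_ref));   % largest pointwise difference
fprintf('1 - cdf(end) = %.3g, max |difference| = %.3g\n', tail_err, max_err);
```

If both printed numbers are tiny, the grid and the bound B are probably adequate for that shape parameter. Small c (heavier tails) may need a larger B.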

5. Great, the first code snippet was exactly what I needed!

As for the 1-cdf(end) part, I'm not sure you are correct. The KS test searches for the
$
\max_t |F_x(t) - F(t)|,
$

which is probably somewhere near the middle of the 0-0.5 or 0.5-1 ranges of cdf values.
In general, if my cdf is anything even close to Gaussian, it should have no problem coming very close to 1 at cdf(end), and I expect 1-cdf(end) to always be (for reasonable parameters of the GGD) very close to 0. Please correct me if I'm wrong.
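
For what it's worth, here is a minimal sketch of how that maximum could be computed for several candidate shapes, reusing the cumtrapz CDF and interp1 from the previous post. The data Y, the list of shapes, and the use of std(Y) for $\sigma_x$ are illustrative assumptions, not something given in this thread.

```matlab
% Assumed stand-in data: zero-mean Gaussian samples (a GGD with c = 2)
n = 2000;
Y = 1.5 * randn(n, 1);

sigma_x = std(Y);               % sample estimate of the standard deviation
B       = 10 * sigma_x;
x       = linspace(-B, B, 10000);
shapes  = [1 1.5 2 3];          % candidate shape parameters c (illustrative)
D       = zeros(size(shapes));  % KS statistic for each candidate

Ys  = sort(Y);                  % the empirical CDF jumps at the sorted samples
idx = (1:n)';
for k = 1:numel(shapes)
    c = shapes(k);
    b = sqrt(gamma(3/c) / gamma(1/c)) / sigma_x;
    a = b * c / (2 * gamma(1/c));
    % GGD CDF on the grid, then evaluated at the samples
    cdf = cumtrapz(x, a * exp(-abs(b * x).^c));
    F   = interp1(x, cdf, Ys, 'linear', 'extrap');
    % The empirical CDF steps from (i-1)/n to i/n at the i-th sorted sample,
    % so the gap is checked on both sides of every step
    D(k) = max(max(idx/n - F), max(F - (idx - 1)/n));
end
disp([shapes; D]);              % first row: c, second row: max |F_x - F|
```

The smallest entry in the second row points to the closest candidate shape. Because $\sigma_x$ is estimated from the same data, the standard KS critical values would only be approximate here.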

One last request for advice on the topic (or a bit off topic): the $\chi^2$ test needs the distributions' pdfs. I suppose it should be fine to use
Code:
[pdf,x]=ksdensity(Y);
to estimate the pdf of the values in Y?

6. Originally Posted by dreamer1
As for the 1-cdf(end) part, I'm not sure you are correct.
Sorry, this was meant to be a check just on how good the numerical integration approximation was. We are discretely sampling the PDF, and then using trapezoidal (Riemann-type) sums as the approximation to the integral to get the CDF. If we undersampled the PDF, or chose bounds that are too narrow, then cdf(end) may not be very close to 1. I wasn't referencing the kstest.

The last part of my post was referring to the fact that maybe you didn't want to have such a finely resolved CDF. If that was the case, then I was showing how you might downsample it.

I assume your Y is the data? For the $\chi^2$ tests that I am familiar with, you don't need to do any kernel smoothing of the data. You would just bin the data and "bin" the PDF (take the difference of the CDF values at the endpoints of each bin), and do the $\chi^2$ test. It doesn't look like ksdensity would be necessary.

If Y is the CDF that we just calculated, I'm not sure why any kernel smoothing would be necessary either.

But I should add the disclaimer that I have not done much on this part of statistics. I have used both kstest and $\chi^2$, but I have never done any kernel smoothing. I would think kernel smoothing would be useful for visualization, but not really for trying to perform hypothesis tests comparing empirical data to a given distribution.
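
A minimal sketch of the binning idea, assuming Y holds the data and using made-up values for the shape c and the bin edges (none of these come from the thread):

```matlab
% Assumed stand-in data and shape (not from the thread)
n = 2000;
Y = 1.5 * randn(n, 1);
c = 2;
sigma_x = std(Y);
b = sqrt(gamma(3/c) / gamma(1/c)) / sigma_x;
a = b * c / (2 * gamma(1/c));

% Model CDF on a fine grid, as in the cumtrapz snippet from post 4
B   = 10 * sigma_x;
x   = linspace(-B, B, 10000);
cdf = cumtrapz(x, a * exp(-abs(b * x).^c));

% Interior bin edges; the two outer bins are open-ended
inner    = sigma_x * (-2:0.5:2);
F_in     = interp1(x, cdf, inner, 'linear');
p        = diff([0, F_in, 1]);       % model probability of each bin
expected = n * p;

% Observed counts in the same bins
edges    = [-Inf, inner, Inf];
observed = zeros(size(p));
for k = 1:numel(p)
    observed(k) = sum(Y >= edges(k) & Y < edges(k + 1));
end

chi2 = sum((observed - expected).^2 ./ expected);
fprintf('chi^2 = %.3f with %d bins\n', chi2, numel(p));
```

The statistic would then be compared against a $\chi^2$ distribution whose degrees of freedom are the number of bins minus one, minus the number of parameters estimated from the data.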

7. Sorry, this was meant to be a check just on how good the numerical integration approximation was...
I misunderstood you. Now it makes sense.

For the $\chi^2$ test, you are, of course, right again. Binned data is what is used in the test so the cdf differences will do it.

Thanks for the help, I think I've finally got things straightened out.

8. Excellent