# Thread: chi-square, df and constraints...

1. ## chi-square, df and constraints...

Hi, I'm trying to understand a detail about the chi-square distribution, degrees of freedom (df) and constraints.

I saw that the chi-square with N degrees of freedom is simply the distribution of the sum of the squares of N independent standard normal variables (i.e. normal variables with zero mean and unit variance).

To find its expression, you first calculate the distribution of the square of a single standard normal variable, then you calculate its Fourier/Laplace transform, raise it to the Nth power (since the distribution of a sum of independent variables is the convolution of the individual distributions, i.e. the product of the transforms), and then calculate the inverse transform.

But what if there are constraints, i.e. these variables are not completely independent?

Suppose we have a vector of N independent standard normal variables (x_i), and that we later impose k independent constraints, which can be expressed as k different equations:

g_1(x_1,x_2,...,x_N)=0
g_2(x_1,x_2,...,x_N)=0
...
g_k(x_1,x_2,...,x_N)=0

I have seen that in these cases, instead of calculating the chi^2 statistic as:

chi^2=SUM[(x_i^2)]

it is calculated instead with "weights" w_i:

chi^2=SUM[w_i*(x_i^2)]

The w_i are chosen subject to these conditions:

w_i>0;

SUM[w_i]=N-k

This automatically implies that:

E[chi^2]=N-k

(But how are these single w_i chosen?)

Then, the chi^2 statistic will follow the chi-square distribution with N-k degrees of freedom.

Where can I find an explanation of this?

And what changes if these N standard normal variables are not independent?

Thanks!
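Here is a quick numerical sanity check of the definition above (my own sketch, in Python with numpy; N and the sample size are arbitrary): the sum of squares of N independent standard normals should have mean N and variance 2N, matching the chi-square distribution with N degrees of freedom.

```python
import numpy as np

# Monte Carlo check: SUM[x_i^2] over N independent standard normals
# should have mean N and variance 2N (chi-square with N d.f.).
rng = np.random.default_rng(0)

N = 5             # degrees of freedom (illustrative choice)
trials = 200_000  # number of simulated chi^2 draws

x = rng.standard_normal((trials, N))
chi2_samples = (x ** 2).sum(axis=1)   # one chi^2 draw per trial

mean_est = chi2_samples.mean()  # should be close to N = 5
var_est = chi2_samples.var()    # should be close to 2N = 10
```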

2. Originally Posted by rargh
Hi, I'm trying to understand a detail about the chi-square distribution, degrees of freedom (df) and constraints.

I saw that the chi-square with N degrees of freedom is simply the distribution of the sum of the squares of N independent standard normal variables (i.e. normal variables with zero mean and unit variance).

To find its expression, you first calculate the distribution of the square of a single standard normal variable, then you calculate its Fourier/Laplace transform, raise it to the Nth power (since the distribution of a sum of independent variables is the convolution of the individual distributions, i.e. the product of the transforms), and then calculate the inverse transform.

But what if there are constraints, i.e. these variables are not completely independent?

Suppose we have a vector of N independent standard normal variables (x_i), and that we later impose k independent constraints, which can be expressed as k different equations:

g_1(x_1,x_2,...,x_N)=0
g_2(x_1,x_2,...,x_N)=0
...
g_k(x_1,x_2,...,x_N)=0

I have seen that in these cases, instead of calculating the chi^2 statistic as:

chi^2=SUM[(x_i^2)]

it is calculated instead with "weights" w_i:

chi^2=SUM[w_i*(x_i^2)]

The w_i are chosen subject to these conditions:

w_i>0;

SUM[w_i]=N-k

This automatically implies that:

E[chi^2]=N-k

(But how are these single w_i chosen?)

Then, the chi^2 statistic will follow the chi-square distribution with N-k degrees of freedom.

Where can I find an explanation of this?

And what changes if these N standard normal variables are not independent?

Thanks!
Hi. I cannot make sense of what you are saying about constraints.

Suppose you have two independent standard normal variables X1 and X2. You later impose a single constraint g(X1,X2) = X1 + X2 = 0. Since X1 and X2 are still standard normal and you haven't talked about any transformation of them, the constraint may or may not be satisfied for any particular realization of (X1,X2). In fact, with probability 1 the constraint will not be satisfied.

So how is the distribution of w1*X1^2 + w2*X2^2 calculated? Is it conditional on the zero-probability event that the constraint is satisfied? Hmm, that doesn't sound right.

I've seen constraints imposed in statistical procedures where the chi^2 statistic is modified, e.g., ordinary least squares subject to constraints on the coefficients. But the constraints are not applied as you describe.

Please give a specific, concrete example of a statistical procedure that does what you describe.

3. Originally Posted by JakeD
Hi. I cannot make sense of what you are saying about constraints.

Suppose you have two independent standard normal variables X1 and X2. You later impose a single constraint g(X1,X2) = X1 + X2 = 0.
They are not supposed to be independent (that must be a typo); the constraint(s) effectively allow you to replace n normal RV's by n-k independent normal RV's (at least linear constraints should; the thought of nonlinear constraints makes my brain hurt, but then most things do these days, I blame my ***** *****).

I've not looked at this in detail, so of course all of what I have said may be
nonsense.

RonL

4. Originally Posted by JakeD
Hi. I cannot make sense of what you are saying about constraints.

Suppose you have two independent standard normal variables X1 and X2. You later impose a single constraint g(X1,X2) = X1 + X2 = 0. Since X1 and X2 are still standard normal and you haven't talked about any transformation of them, the constraint may or may not be satisfied for any particular realization of (X1,X2). In fact, with probability 1 the constraint will not be satisfied.

So how is the distribution of w1*X1^2 + w2*X2^2 calculated? Is it conditional on the zero-probability event that the constraint is satisfied? Hmm, that doesn't sound right.

I've seen constraints imposed in statistical procedures where the chi^2 statistic is modified, e.g., ordinary least squares subject to constraints on the coefficients. But the constraints are not applied as you describe.

Please give a specific, concrete example of a statistical procedure that does what you describe.
Originally Posted by CaptainBlack
They are not supposed to be independent (that must be a typo); the constraint(s) effectively allow you to replace n normal RV's by n-k independent normal RV's (at least linear constraints should; the thought of nonlinear constraints makes my brain hurt, but then most things do these days, I blame my ***** *****).

I've not looked at this in detail, so of course all of what I have said may be
nonsense.

RonL
The two-variable example was just what rargh described. There was nothing about variables being dependent or being replaced.

So going with my example: there are two standard normal variables X1 and X2 obeying the constraint X1 + X2 = 0. Then they are obviously dependent because X1 = -X2. The joint density of dependent normal variables is determined by the vector of means and the covariance matrix. For (X1,X2), the means are (0,0) and the covariance matrix is
Code:
 1 -1
-1  1
How would this be generalized? What would the n-vector of means and the nxn covariance matrix be of n dependent normal variables that satisfy k linear constraints?

Now if the goal were to give the means and covariance matrix of k linear functions of n independent variables ~ N(mu_i,sigma), that is well-known. Let B be the k x n transformation matrix. Then the k-vector of means is B*mu and the kxk covariance matrix is sigma^2*BB'. But I haven't seen something like this in terms of constraints.

Can you give a statistical procedure where linear constraints rather than linear functions are used in the analysis? Thanks.
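As a quick check of the linear-functions result (my own numpy sketch; B, mu and sigma below are arbitrary illustrative values): for Y = B X with X a vector of independent N(mu_i, sigma^2) variables, the mean vector is B*mu and the covariance matrix is sigma^2*BB'.

```python
import numpy as np

# Simulation check: for Y = B X with independent X_i ~ N(mu_i, sigma^2),
# E[Y] = B mu and Cov(Y) = sigma^2 * B B'.  All numbers are illustrative.
rng = np.random.default_rng(1)

n, k, sigma = 4, 2, 1.5
mu = np.array([1.0, -2.0, 0.5, 3.0])
B = np.array([[1.0, 1.0,  0.0, 0.0],
              [0.0, 1.0, -1.0, 2.0]])   # arbitrary k x n transformation

X = mu + sigma * rng.standard_normal((500_000, n))
Y = X @ B.T                              # each row is one realization of B X

mean_emp = Y.mean(axis=0)                # should approximate B mu
cov_emp = np.cov(Y.T)                    # should approximate sigma^2 B B'
cov_theory = sigma ** 2 * B @ B.T
```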

5. Ok here is the specific example that started my doubts:

the chi-square based goodness-of-fit test.

We have a sample of N observations of the same r.v., we divide the real line into M adjacent intervals, and we compare the number of observations for each interval O_i with the expected number E_i, given by:

E_i=N*p_i, where p_i is calculated by integrating the theoretical distribution in that interval.

The test statistic used is:

[1a] chi^2=SUM[(O_i-E_i)^2/E_i], which, writing x_i=O_i-E_i as in my first post, can be rewritten as:

[1b] chi^2=SUM[w_i*(x_i^2)/var_i]

where:

[2] var_i=N*p_i*(1-p_i)=E_i*(1-p_i)

and

[3] w_i=(1-p_i)

I think the correct variance var_i should be given by [2], because each single r.v. O_i follows a binomial distribution with parameters N and p_i.

Now the chi^2 statistic should be approximately distributed as a chi-square with M-1 degrees of freedom (supposing that we know the exact p_i, i.e. they are not estimated).

The only constraint here is:

[4] SUM(x_i)=0

which is in fact linear.

I checked after you pointed out that the x_i aren't independent, and in fact the covariance should be:

[5] COV(x_i,x_j)=-N*p_i*p_j

for i!=j

This makes sense, since adding the covariances and the variances gives a total of 0.

OK, now, how do I apply this to our example to find the w_i?

If I understood well, you are saying that if we have a set of M normally distributed random variables satisfying a linear constraint, and therefore not independent, we can find a linear transformation that yields M-1 independent normally distributed random variables?

Thanks
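To see the M-1 come out numerically, here is a small sketch (my own, with arbitrary p_i) that simulates the goodness-of-fit statistic and checks that its mean is M-1. In fact the mean is exactly M-1, since E[(O_i-N*p_i)^2]=N*p_i*(1-p_i), so each term contributes (1-p_i) and SUM[(1-p_i)]=M-1.

```python
import numpy as np

# Simulate the goodness-of-fit statistic chi^2 = SUM[(O_i - E_i)^2 / E_i]
# for a known multinomial and check that its mean is M - 1.
rng = np.random.default_rng(2)

p = np.array([0.1, 0.2, 0.3, 0.4])  # known cell probabilities (M = 4 cells)
N = 200                             # observations per sample
trials = 100_000

O = rng.multinomial(N, p, size=trials)   # observed counts, one row per trial
E = N * p                                # expected counts E_i = N p_i
chi2_stats = ((O - E) ** 2 / E).sum(axis=1)

M = len(p)
mean_est = chi2_stats.mean()             # should be close to M - 1 = 3
```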

6. Originally Posted by JakeD
The two-variable example was just what rargh described. There was nothing about variables being dependent or being replaced.
There may not have been mention of them being dependent, but I'm
pretty sure they cannot be independent (in fact that is what you are
pointing out).

Then replacing them by a reduced number of independent RV's is just a
trick to get the given result (I'm guessing here of course as I have not
worked through the detail).

RonL

7. Originally Posted by rargh
Ok here is the specific example that started my doubts:

the chi-square based goodness-of-fit test.

We have a sample of N observations of the same r.v., we divide the real line into M adjacent intervals, and we compare the number of observations for each interval O_i with the expected number E_i, given by:

E_i=N*p_i, where p_i is calculated by integrating the theoretical distribution in that interval.

The test statistic used is:

[1a] chi^2=SUM[(O_i-E_i)^2/E_i], which, writing x_i=O_i-E_i as in my first post, can be rewritten as:

[1b] chi^2=SUM[w_i*(x_i^2)/var_i]

where:

[2] var_i=N*p_i*(1-p_i)=E_i*(1-p_i)

and

[3] w_i=(1-p_i)

I think the correct variance var_i should be given by [2], because each single r.v. O_i follows a binomial distribution with parameters N and p_i.

Now the chi^2 statistic should be approximately distributed as a chi-square with M-1 degrees of freedom (supposing that we know the exact p_i, i.e. they are not estimated).

The only constraint here is:

[4] SUM(x_i)=0

which is in fact linear.

I checked after you pointed out that the x_i aren't independent, and in fact the covariance should be:

[5] COV(x_i,x_j)=-N*p_i*p_j

for i!=j

This makes sense, since adding the covariances and the variances gives a total of 0.

OK, now, how do I apply this to our example to find the w_i?
Here is a summary of the derivation of the Asymptotic Distribution (A.D.) for the chi^2 goodness-of-fit test. It's from C. R. Rao's Linear Statistical Inference and Its Applications, pp. 382-91. This text is considered definitive by statisticians.

Let there be k cells and N observed events. Let Ni be the number of events landing in cell i and Pi be the probability an event lands in cell i. Define the column k-vector v with elements (Ni - NPi)/sqrt(NPi) and the column k-vector f with elements sqrt(Pi). Let I(k) be the k x k identity matrix. Let b' denote the transpose of a vector b.

The statistic v'v is the chi^2 statistic and it is to be proved that v'v has an asymptotic chi^2 distribution with D.F. k-1.

i) Let b be a fixed column k-vector. The A.D. of b'v is N(0,b'(I(k)-ff')b).

This is proved using the central limit theorem.

ii) Let B be a k x p matrix of rank p. The A.D. of B'v is multivariate normal Np(0,B'(I(k)-ff')B).

Take B = I(k). Then (ii) says the A.D. of v is multivariate normal Nk(0,I(k) - ff'). The dependency among the elements of v shows up in the covariance matrix I(k) - ff', which has rank k-1.

iii) Let A be a k x (k-1) matrix such that the partitioned matrix [f|A] is orthonormal. Then the A.D. of the (k-1)-vector g(v) = A'v is N(k-1)(0,I(k-1)), that is, a (k-1)-vector of independent standard normal variables.

iv) Write v'v = g(v)'A'Ag(v). Since asymptotically g(v) is a (k-1)-vector of independent standard normal variables, the quadratic form g(v)'A'Ag(v) has an asymptotic chi^2 distribution with D.F. equal to the rank of A'A = I(k-1), which is k-1. Q.E.D.

There is no mention in any of this of linear constraints or replacing variables by a reduced set using constraints. But there is considerable use of linear functions and covariance matrices and the ranks of those matrices.

Originally Posted by rargh
If I understood well, you are saying that if we have a set of M normally distributed random variables satisfying a linear constraint, and therefore not independent, we can find a linear transformation that yields M-1 independent normally distributed random variables?
I didn't say that. I was asking CaptainBlack how we would find M dependent normal variables satisfying k constraints. I haven't seen any use of constraints like you and CaptainBlack describe.
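Step (iii) of the summary can be checked numerically. The sketch below (my own; the cell probabilities are arbitrary) completes f into an orthonormal basis via a QR factorization, takes the remaining columns as A, and verifies that A'f = 0 and A'(I(k)-ff')A = I(k-1).

```python
import numpy as np

# Numerical check of step (iii): with f = (sqrt(P_1),...,sqrt(P_k)),
# complete f to an orthonormal basis of R^k; the last k-1 columns form A,
# and A'(I(k) - ff')A reduces to the (k-1) x (k-1) identity.
P = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary cell probabilities
k = len(P)
f = np.sqrt(P)                       # unit vector, since SUM[P_i] = 1

# QR of [f | I(k)]: the first column of Q spans f; the remaining columns
# are orthonormal and orthogonal to f.
Q, _ = np.linalg.qr(np.column_stack([f, np.eye(k)]))
A = Q[:, 1:]                          # k x (k-1)

ortho = A.T @ f                       # should be the zero vector
cov_reduced = A.T @ (np.eye(k) - np.outer(f, f)) @ A   # should be I(k-1)
```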

8. I studied this problem the whole afternoon. I understood some interesting things.

Start with a generic n-vector x of zero-mean random variables, whose variance-covariance matrix is C, with rank(C)=n-k.

The following can be proved. Let U be the intersection, over all possible realizations x, of ker(xx'). (It is easy to prove that U is a vector space.)

We want to prove that

[1] U=Ker(C)

[1a] Let v be a vector in U. Then v is in Ker(C) as well. (This is obvious.)
[1b] Let v be a vector in Ker(C). Then v is in U as well.

Proof of [1b]:

Let v be a vector in Ker(C), and let y=v'x=x'v. Then:

E[y^2]=E[(v'x)^2]=E[v'xx'v]=v'E[xx']v=v'Cv=0

If E[y^2]=0, then y=0 with probability 1. Therefore x'v=0 for (almost) every x, which implies that v is in U.

Q.E.D.

Hence dim(U)=k, since rank(C)=n-k.

Another fact that is easy to prove:

[2] v is in U if and only if v'x=x'v=0 for every x.

Proof: if v is in U, then xx'v=0, i.e. x(v'x)'=0, i.e. xy'=0 where y=x'v. If x is not a null vector, then the matrix xy' is null if and only if y=0.

Q.E.D.

So we see that if rank(C)<n, there is multicollinearity in these random variables (equivalent to the "constraints" I tried to impose previously).
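Here is a small numerical illustration of [1] and [2] (my own sketch; the constraint is the linear one from the goodness-of-fit example): with n = 3 variables forced to satisfy x_1+x_2+x_3=0, the empirical C has rank 2, the constraint vector v=(1,1,1) lies in Ker(C), and v'x=0 for every realization.

```python
import numpy as np

# n = 3 random variables with the built-in constraint x1 + x2 + x3 = 0:
# rank(C) = n - 1 = 2, v = (1,1,1) is in Ker(C), and v'x = 0 for every x.
rng = np.random.default_rng(3)

z = rng.standard_normal((100_000, 2))
x = np.column_stack([z[:, 0], z[:, 1], -z.sum(axis=1)])  # rows sum to 0

C = np.cov(x.T)                                 # empirical 3 x 3 covariance
rank_C = np.linalg.matrix_rank(C, tol=1e-8)     # should be 2

v = np.array([1.0, 1.0, 1.0])
Cv = C @ v     # should be (numerically) the zero vector: v in Ker(C)
vx = x @ v     # should be 0 for every realization
```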

9. Another doubt I had, which I couldn't resolve.

We have this variance-covariance matrix C, which is by definition positive semidefinite.

Sylvester's law of inertia tells us that we can find a nonsingular matrix M such that:

[1] M'CM=S(i_+,i_-,i_0)

i.e. S is a diagonal matrix with i_+ elements = +1, i_- elements = -1, and i_0 elements=0

(i_+)+(i_-)+(i_0)=N

If rank(C)=N-k, then:

i_+=N-k
i_-=0
i_0=k

We can also use the spectral theorem, which states that every real symmetric matrix C can be diagonalized in a basis of orthogonal eigenvectors (orthogonal with respect to the Euclidean scalar product, that is).

So there is an orthogonal (hence nonsingular) matrix P such that:

[2] P^(-1)CP=P'CP=D

where D is a diagonal matrix.

D has (n-k) positive eigenvalues and k null eigenvalues.

Now, since in our goodness-of-fit test example we had:

chi^2=x'Ex

where E is a diagonal matrix,

can we use all what we have found to see if:

[3] M'M=E

?

Thanks...
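For the Sylvester part, a numerical sketch (my own; C below is an arbitrary rank-2 positive semidefinite matrix, not the one from the goodness-of-fit example) shows how such an M can be built from the spectral decomposition: rescale each eigenvector with positive eigenvalue lambda_i by 1/sqrt(lambda_i), and M'CM becomes a diagonal matrix of (n-k) ones and k zeros.

```python
import numpy as np

# Build M from the spectral theorem: C = Q D Q' with orthonormal Q;
# scaling each eigenvector with eigenvalue lambda_i > 0 by 1/sqrt(lambda_i)
# gives a nonsingular M with M'CM = S = diag of (n-k) ones and k zeros.
C = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])   # positive semidefinite, rank 2

eigvals, Q = np.linalg.eigh(C)       # ascending eigenvalues, orthonormal Q

scale = np.ones_like(eigvals)
pos = eigvals > 1e-10
scale[pos] = 1.0 / np.sqrt(eigvals[pos])

M = Q * scale                        # rescales each column of Q; nonsingular
S = M.T @ C @ M                      # diagonal: one 0, then two 1s
```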