# Thread: chi-square, df and constraints...

1. ## chi-square, df and constraints...

Hi, I'm trying to understand a detail about the chi-square distribution, degrees of freedom (df) and constraints.

I saw that the chi-square with N degrees of freedom is simply the distribution of the sum of the squares of N independent standard normal variables (i.e. normal variables with zero mean and unit variance).

To find its expression, you first calculate the distribution of the square of a single standard normal variable, then you calculate its Fourier/Laplace transform, raise it to the Nth power (since the distribution of a sum of independent variables is the convolution of the individual distributions, i.e. the product of the transforms), and then calculate the inverse transform.

But what if there are constraints, i.e. these variables are not completely independent?

Suppose we have a vector of N independent standard normal variables (x_i), and that we later impose k independent constraints, which can be expressed as k different equations:

g_1(x_1,x_2,...,x_N)=0
g_2(x_1,x_2,...,x_N)=0
...
g_k(x_1,x_2,...,x_N)=0

I have seen that in these cases, instead of calculating the chi^2 statistic as:

chi^2=SUM[(x_i^2)]

it is calculated instead with "weights" w_i:

chi^2=SUM[w_i*(x_i^2)]

The w_i are chosen subject to these conditions:

w_i>0;

SUM[w_i]=N-k

This automatically implies that:

E[chi^2]=N-k

(But how are these single w_i chosen?)

Then, the chi^2 statistic will follow the chi-square distribution with N-k degrees of freedom.

Where can I find an explanation of this?

And what changes if these N standard normal variables are not independent?

Thanks!
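Here is a quick numerical sanity check of the definition above (my own sketch, in Python with numpy; N and the sample size are arbitrary): the sum of squares of N independent standard normals should have mean N and variance 2N, matching the chi-square distribution with N degrees of freedom.

```python
import numpy as np

# Monte Carlo check: SUM[x_i^2] over N independent standard normals
# should have mean N and variance 2N (chi-square with N d.f.).
rng = np.random.default_rng(0)

N = 5             # degrees of freedom (illustrative choice)
trials = 200_000  # number of simulated chi^2 draws

x = rng.standard_normal((trials, N))
chi2_samples = (x ** 2).sum(axis=1)   # one chi^2 draw per trial

mean_est = chi2_samples.mean()  # should be close to N = 5
var_est = chi2_samples.var()    # should be close to 2N = 10
```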

2. Originally Posted by rargh
Hi, I'm trying to understand a detail about the chi-square distribution, degrees of freedom (df) and constraints.

I saw that the chi-square with N degrees of freedom is simply the distribution of the sum of the squares of N independent standard normal variables (i.e. normal variables with zero mean and unit variance).

To find its expression, you first calculate the distribution of the square of a single standard normal variable, then you calculate its Fourier/Laplace transform, raise it to the Nth power (since the distribution of a sum of independent variables is the convolution of the individual distributions, i.e. the product of the transforms), and then calculate the inverse transform.

But what if there are constraints, i.e. these variables are not completely independent?

Suppose we have a vector of N independent standard normal variables (x_i), and that we later impose k independent constraints, which can be expressed as k different equations:

g_1(x_1,x_2,...,x_N)=0
g_2(x_1,x_2,...,x_N)=0
...
g_k(x_1,x_2,...,x_N)=0

I have seen that in these cases, instead of calculating the chi^2 statistic as:

chi^2=SUM[(x_i^2)]

it is calculated instead with "weights" w_i:

chi^2=SUM[w_i*(x_i^2)]

The w_i are chosen subject to these conditions:

w_i>0;

SUM[w_i]=N-k

This automatically implies that:

E[chi^2]=N-k

(But how are these single w_i chosen?)

Then, the chi^2 statistic will follow the chi-square distribution with N-k degrees of freedom.

Where can I find an explanation of this?

And what changes if these N standard normal variables are not independent?

Thanks!
Hi. I cannot make sense of what you are saying about constraints.

Suppose you have two independent standard normal variables X1 and X2. You later impose a single constraint g(X1,X2) = X1 + X2 = 0. Since X1 and X2 are still standard normal and you haven't talked about any transformation of them, the constraint may or may not be satisfied for any particular realization of (X1,X2). In fact, with probability 1 the constraint will not be satisfied.

So how is the distribution of w1*X1^2 + w2*X2^2 calculated? Is it conditional on the zero-probability event that the constraint is satisfied? Hmm, that doesn't sound right.

I've seen constraints imposed in statistical procedures where the chi^2 statistic is modified, e.g., ordinary least squares subject to constraints on the coefficients. But the constraints are not applied as you describe.

Please give a specific, concrete example of a statistical procedure that does what you describe.

3. Originally Posted by JakeD
Hi. I cannot make sense of what you are saying about constraints.

Suppose you have two independent standard normal variables X1 and X2. You later impose a single constraint g(X1,X2) = X1 + X2 = 0.
They are not supposed to be independent (that must be a typo); the constraint(s) effectively allow you to replace n normal RV's by n-k independent normal RV's (at least linear constraints should; the thought of nonlinear constraints makes my brain hurt, but then most things do these days, I blame my ***** *****).

I've not looked at this in detail, so of course all of what I have said may be
nonsense.

RonL

4. Originally Posted by JakeD
Hi. I cannot make sense of what you are saying about constraints.

Suppose you have two independent standard normal variables X1 and X2. You later impose a single constraint g(X1,X2) = X1 + X2 = 0. Since X1 and X2 are still standard normal and you haven't talked about any transformation of them, the constraint may or may not be satisfied for any particular realization of (X1,X2). In fact, with probability 1 the constraint will not be satisfied.

So how is the distribution of w1*X1^2 + w2*X2^2 calculated? Is it conditional on the zero-probability event that the constraint is satisfied? Hmm, that doesn't sound right.

I've seen constraints imposed in statistical procedures where the chi^2 statistic is modified, e.g., ordinary least squares subject to constraints on the coefficients. But the constraints are not applied as you describe.

Please give a specific, concrete example of a statistical procedure that does what you describe.
Originally Posted by CaptainBlack
They are not supposed to be independent (that must be a typo); the constraint(s) effectively allow you to replace n normal RV's by n-k independent normal RV's (at least linear constraints should; the thought of nonlinear constraints makes my brain hurt, but then most things do these days, I blame my ***** *****).

I've not looked at this in detail, so of course all of what I have said may be
nonsense.

RonL
The two-variable example was just what rargh described. There was nothing about variables being dependent or being replaced.

So going with my example: there are two standard normal variables X1 and X2 obeying the constraint X1 + X2 = 0. Then they are obviously dependent because X1 = -X2. The joint density of dependent normal variables is determined by the vector of means and the covariance matrix. For (X1,X2), the means are (0,0) and the covariance matrix is
Code:
 1 -1
-1  1
How would this be generalized? What would the n-vector of means and the nxn covariance matrix be of n dependent normal variables that satisfy k linear constraints?

Now if the goal were to give the means and covariance matrix of k linear functions of n independent variables ~ N(mu_i,sigma), that is well-known. Let B be the k x n transformation matrix. Then the k-vector of means is B*mu and the kxk covariance matrix is sigma^2*BB'. But I haven't seen something like this in terms of constraints.

Can you give a statistical procedure where linear constraints rather than linear functions are used in the analysis? Thanks.
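As a quick check of the linear-functions result (my own numpy sketch; B, mu and sigma below are arbitrary illustrative values): for Y = B X with X a vector of independent N(mu_i, sigma^2) variables, the mean vector is B*mu and the covariance matrix is sigma^2*BB'.

```python
import numpy as np

# Simulation check: for Y = B X with independent X_i ~ N(mu_i, sigma^2),
# E[Y] = B mu and Cov(Y) = sigma^2 * B B'.  All numbers are illustrative.
rng = np.random.default_rng(1)

n, k, sigma = 4, 2, 1.5
mu = np.array([1.0, -2.0, 0.5, 3.0])
B = np.array([[1.0, 1.0,  0.0, 0.0],
              [0.0, 1.0, -1.0, 2.0]])   # arbitrary k x n transformation

X = mu + sigma * rng.standard_normal((500_000, n))
Y = X @ B.T                              # each row is one realization of B X

mean_emp = Y.mean(axis=0)                # should approximate B mu
cov_emp = np.cov(Y.T)                    # should approximate sigma^2 B B'
cov_theory = sigma ** 2 * B @ B.T
```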

5. Ok here is the specific example that started my doubts:

the chi-square based goodness-of-fit test.

We have a sample of N observations of the same r.v., we divide the real line into M adjacent intervals, and we compare the number of observations for each interval O_i with the expected number E_i, given by:

E_i=N*p_i, where p_i is calculated by integrating the theoretical distribution in that interval.

The test statistic used is:

[1a] chi^2=SUM[(O_i-E_i)^2/E_i], which, writing x_i=O_i-E_i as in my first post, can be rewritten as:

[1b] chi^2=SUM[w_i*(x_i^2)/var_i]

where:

[2] var_i=N*p_i*(1-p_i)=E_i*(1-p_i)

and

[3] w_i=(1-p_i)

I think the correct variance var_i should be given by [2], because each single r.v. O_i follows a binomial distribution with parameters N and p_i.

Now the chi^2 statistic should be approximately distributed as a chi-square with M-1 degrees of freedom (supposing that we know the exact p_i, i.e. they are not estimated).

The only constraint here is:

[4] SUM(x_i)=0

which is in fact linear.

I checked after you pointed out that the x_i aren't independent, and in fact the covariance should be:

[5] COV(x_i,x_j)=-N*p_i*p_j

for i!=j

This makes sense, since adding the covariances and the variances gives a total of 0.

OK, now, how do I apply this to our example to find the w_i?

If I understood well, you are saying that if we have a set of M normally distributed random variables satisfying a linear constraint, and therefore not independent, we can find a linear transformation that yields M-1 independent normally distributed random variables?

Thanks
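To see the M-1 come out numerically, here is a small sketch (my own, with arbitrary p_i) that simulates the goodness-of-fit statistic and checks that its mean is M-1. In fact the mean is exactly M-1, since E[(O_i-N*p_i)^2]=N*p_i*(1-p_i), so each term contributes (1-p_i) and SUM[(1-p_i)]=M-1.

```python
import numpy as np

# Simulate the goodness-of-fit statistic chi^2 = SUM[(O_i - E_i)^2 / E_i]
# for a known multinomial and check that its mean is M - 1.
rng = np.random.default_rng(2)

p = np.array([0.1, 0.2, 0.3, 0.4])  # known cell probabilities (M = 4 cells)
N = 200                             # observations per sample
trials = 100_000

O = rng.multinomial(N, p, size=trials)   # observed counts, one row per trial
E = N * p                                # expected counts E_i = N p_i
chi2_stats = ((O - E) ** 2 / E).sum(axis=1)

M = len(p)
mean_est = chi2_stats.mean()             # should be close to M - 1 = 3
```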

6. Originally Posted by JakeD
The two-variable example was just what rargh described. There was nothing about variables being dependent or being replaced.
There may not have been mention of them being dependent, but I'm
pretty sure they cannot be independent (in fact that is what you are
pointing out).

Then replacing them by a reduced number of independent RV's is just a
trick to get the given result (I'm guessing here of course as I have not
worked through the detail).

RonL

7. Originally Posted by rargh
Ok here is the specific example that started my doubts:

the chi-square based goodness-of-fit test.

We have a sample of N observations of the same r.v., we divide the real line into M adjacent intervals, and we compare the number of observations for each interval O_i with the expected number E_i, given by:

E_i=N*p_i, where p_i is calculated by integrating the theoretical distribution in that interval.

The test statistic used is:

[1a] chi^2=SUM[(O_i-E_i)^2/E_i], which, writing x_i=O_i-E_i as in my first post, can be rewritten as:

[1b] chi^2=SUM[w_i*(x_i^2)/var_i]

where:

[2] var_i=N*p_i*(1-p_i)=E_i*(1-p_i)

and

[3] w_i=(1-p_i)

I think the correct variance var_i should be given by [2], because each single r.v. O_i follows a binomial distribution with parameters N and p_i.

Now the chi^2 statistic should be approximately distributed as a chi-square with M-1 degrees of freedom (supposing that we know the exact p_i, i.e. they are not estimated).

The only constraint here is:

[4] SUM(x_i)=0

which is in fact linear.

I checked after you pointed out that the x_i aren't independent, and in fact the covariance should be:

[5] COV(x_i,x_j)=-N*p_i*p_j

for i!=j

This makes sense, since adding the covariances and the variances gives a total of 0.

OK, now, how do I apply this to our example to find the w_i?
Here is a summary of the derivation of the Asymptotic Distribution (A.D.) for the chi^2 goodness-of-fit test. It's from C. R. Rao's Linear Statistical Inference and Its Applications, pp. 382-91. This text is considered definitive by statisticians.

Let there be k cells and N observed events. Let Ni be the number of events landing in cell i and Pi be the probability an event lands in cell i. Define the column k-vector v with elements (Ni - NPi)/sqrt(NPi) and the column k-vector f with elements sqrt(Pi). Let I(k) be the k x k identity matrix. Let b' denote the transpose of a vector b.

The statistic v'v is the chi^2 statistic and it is to be proved that v'v has an asymptotic chi^2 distribution with D.F. k-1.

i) Let b be a fixed column k-vector. The A.D. of b'v is N(0,b'(I(k)-ff')b).

This is proved using the central limit theorem.

ii) Let B be a k x p matrix of rank p. The A.D. of B'v is multivariate normal Np(0,B'(I(k)-ff')B).

Take B = I(k). Then (ii) says the A.D. of v is multivariate normal Nk(0,I(k) - ff'). The dependency among the elements of v shows up in the covariance matrix I(k) - ff', which has rank k-1.

iii) Let A be a k x (k-1) matrix such that the partitioned matrix [f|A] is orthonormal. Then the A.D. of the (k-1)-vector g(v) = A'v is N(k-1)(0,I(k-1)), that is, a (k-1)-vector of independent standard normal variables.

iv) Write v'v = g(v)'A'Ag(v). Since asymptotically g(v) is a (k-1)-vector of independent standard normal variables, the quadratic form g(v)'A'Ag(v) has an asymptotic chi^2 distribution with D.F. equal to the rank of A'A = I(k-1), which is k-1. Q.E.D.

There is no mention in any of this of linear constraints or replacing variables by a reduced set using constraints. But there is considerable use of linear functions and covariance matrices and the ranks of those matrices.

Originally Posted by rargh
If I understood well, you are saying that if we have a set of M normally distributed random variables satisfying a linear constraint, and therefore not independent, we can find a linear transformation that yields M-1 independent normally distributed random variables?
I didn't say that. I was asking CaptainBlack how we would find M dependent normal variables satisfying k constraints. I haven't seen any use of constraints like you and CaptainBlack describe.
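Step (iii) of the summary can be checked numerically. The sketch below (my own; the cell probabilities are arbitrary) completes f into an orthonormal basis via a QR factorization, takes the remaining columns as A, and verifies that A'f = 0 and A'(I(k)-ff')A = I(k-1).

```python
import numpy as np

# Numerical check of step (iii): with f = (sqrt(P_1),...,sqrt(P_k)),
# complete f to an orthonormal basis of R^k; the last k-1 columns form A,
# and A'(I(k) - ff')A reduces to the (k-1) x (k-1) identity.
P = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary cell probabilities
k = len(P)
f = np.sqrt(P)                       # unit vector, since SUM[P_i] = 1

# QR of [f | I(k)]: the first column of Q spans f; the remaining columns
# are orthonormal and orthogonal to f.
Q, _ = np.linalg.qr(np.column_stack([f, np.eye(k)]))
A = Q[:, 1:]                          # k x (k-1)

ortho = A.T @ f                       # should be the zero vector
cov_reduced = A.T @ (np.eye(k) - np.outer(f, f)) @ A   # should be I(k-1)
```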

8. I studied this problem the whole afternoon. I understood some interesting things.

Start with a generic n-vector x of zero-mean random variables, whose variance-covariance matrix is C, with rank(C)=n-k.

The following can be proved. Let U be the intersection, over all possible realizations x, of ker(xx'). (It is easy to prove that U is a vector space.)

We want to prove that

[1] U=Ker(C)

[1a] Let v be a vector in U. Then v is in Ker(C) as well. (This is obvious.)
[1b] Let v be a vector in Ker(C). Then v is in U as well.

Proof of [1b]:

Let v be a vector in Ker(C), and let y=v'x=x'v. Then:

E[y^2]=E[(v'x)^2]=E[v'xx'v]=v'E[xx']v=v'Cv=0

If E[y^2]=0, then y=0 with probability 1. Therefore x'v=0 for (almost) every x, which implies that v is in U.

Q.E.D.

Hence dim(U)=k, since rank(C)=n-k.

Another fact that is easy to prove:

[2] v is in U if and only if v'x=x'v=0 for every x.

Proof: if v is in U, then xx'v=0, i.e. x(v'x)'=0, i.e. xy'=0 where y=x'v. If x is not a null vector, then the matrix xy' is null if and only if y=0.

Q.E.D.

So we see that if rank(C)<n, there is multicollinearity in these random variables (equivalent to the "constraints" I tried to impose previously).
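Here is a small numerical illustration of [1] and [2] (my own sketch; the constraint is the linear one from the goodness-of-fit example): with n = 3 variables forced to satisfy x_1+x_2+x_3=0, the empirical C has rank 2, the constraint vector v=(1,1,1) lies in Ker(C), and v'x=0 for every realization.

```python
import numpy as np

# n = 3 random variables with the built-in constraint x1 + x2 + x3 = 0:
# rank(C) = n - 1 = 2, v = (1,1,1) is in Ker(C), and v'x = 0 for every x.
rng = np.random.default_rng(3)

z = rng.standard_normal((100_000, 2))
x = np.column_stack([z[:, 0], z[:, 1], -z.sum(axis=1)])  # rows sum to 0

C = np.cov(x.T)                                 # empirical 3 x 3 covariance
rank_C = np.linalg.matrix_rank(C, tol=1e-8)     # should be 2

v = np.array([1.0, 1.0, 1.0])
Cv = C @ v     # should be (numerically) the zero vector: v in Ker(C)
vx = x @ v     # should be 0 for every realization
```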

9. Another doubt I had, which I couldn't resolve.

We have this variance-covariance matrix C, which is by definition positive semidefinite.

Sylvester's law of inertia tells us that we can find a nonsingular matrix M such that:

[1] M'CM=S(i_+,i_-,i_0)

i.e. S is a diagonal matrix with i_+ elements = +1, i_- elements = -1, and i_0 elements=0

(i_+)+(i_-)+(i_0)=N

If rank(C)=N-k, then:

i_+=N-k
i_-=0
i_0=k

We can also use the spectral theorem, which states that every real symmetric matrix C can be diagonalized in a basis of orthogonal eigenvectors (orthogonal with respect to the Euclidean scalar product, that is).

So there is an orthogonal (hence nonsingular) matrix P such that:

[2] P^(-1)CP=P'CP=D

where D is a diagonal matrix.

D has (n-k) positive eigenvalues and k null eigenvalues.

Now, since in our goodness-of-fit test example we had:

chi^2=x'Ex

where E is a diagonal matrix,

can we use all what we have found to see if:

[3] M'M=E

?

Thanks...
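For the Sylvester part, a numerical sketch (my own; C below is an arbitrary rank-2 positive semidefinite matrix, not the one from the goodness-of-fit example) shows how such an M can be built from the spectral decomposition: rescale each eigenvector with positive eigenvalue lambda_i by 1/sqrt(lambda_i), and M'CM becomes a diagonal matrix of (n-k) ones and k zeros.

```python
import numpy as np

# Build M from the spectral theorem: C = Q D Q' with orthonormal Q;
# scaling each eigenvector with eigenvalue lambda_i > 0 by 1/sqrt(lambda_i)
# gives a nonsingular M with M'CM = S = diag of (n-k) ones and k zeros.
C = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])   # positive semidefinite, rank 2

eigvals, Q = np.linalg.eigh(C)       # ascending eigenvalues, orthonormal Q

scale = np.ones_like(eigvals)
pos = eigvals > 1e-10
scale[pos] = 1.0 / np.sqrt(eigvals[pos])

M = Q * scale                        # rescales each column of Q; nonsingular
S = M.T @ C @ M                      # diagonal: one 0, then two 1s
```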