# Why is there perfect multicollinearity in this regression?

• Oct 3rd 2012, 04:10 PM
salohcin
Why is there perfect multicollinearity in this regression?
Hi Everybody,
I will simplify my problem to make this all more clear. Suppose I have three countries (1, 2 and 3) and I have data on migration between these countries for a single year. Let Mij be my dependent variable which is the migration from country i (the origin) to country j (the destination). So I would have
M12 M13 M21 M23 M31 M32
I also have the gross domestic product of each country (call this GDP). I want to run the regression
Mij = a + B1 GDPi + B2 GDPj + e
This is fine. But I also want to include country specific intercepts (each origin country will have its own intercept). So I want to estimate
Mij = ai + B1 GDPi + B2 GDPj + e
I will include dummy variables for all 3 countries, d1, d2 and d3 (I drop the constant so there is no dummy variable trap). So my regression is
Mij = d1 + d2 + d3 + B1 GDPi + B2 GDPj + e
Even after getting rid of the constant, I still have a problem of perfect multicollinearity. Can anybody explain why this is? If I drop GDPi, the perfect multicollinearity goes away. I have a very weak understanding of matrix algebra (so if it can be dumbed down at all that would be great).
Any help would be enormously appreciated
• Oct 3rd 2012, 08:35 PM
chiro
Re: Why is there perfect multicollinearity in this regression?
Hey salohcin.

If you want to introduce categorical intercepts, then it's best if you use dummy variables but not in the way you are doing it.

For example if you have three countries you use two dummy variables that have the following properties:

B0 - Intercept for the first category, where both dummy variables are zero
B1 - Shift in the intercept for the second category, where the first dummy is 1 and the second is 0
B2 - Shift in the intercept for the third category, where the second dummy is 1 and the first is 0.

Basically the way it works is this:

The model component for this qualitative variable is B0 + B1A1 + B2A2

First category - B0 is the intercept component, since A1 and A2 are both zero
Second category - B0 + B1 is the intercept component, since A1 = 1 but A2 = 0
Third category - B0 + B2 is the intercept component, since A2 = 1 but A1 = 0.
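
The coding above can be sketched in a few lines of Python. The coefficient values are made up purely for illustration, and the helper names `indicators` and `intercept` are hypothetical, not part of any library:

```python
# Hypothetical intercept coefficients, chosen only for illustration.
B0, B1, B2 = 5.0, 1.5, -2.0


def indicators(origin):
    """Return the dummy pair (A1, A2) for an origin country coded 1, 2 or 3."""
    return int(origin == 2), int(origin == 3)


def intercept(origin):
    """Intercept component B0 + B1*A1 + B2*A2 for the given origin country."""
    a1, a2 = indicators(origin)
    return B0 + B1 * a1 + B2 * a2


for origin in (1, 2, 3):
    print(origin, indicators(origin), intercept(origin))
```

Country 1 acts as the reference category here, so B1 and B2 are read as differences from its intercept rather than as intercepts in their own right.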

So try changing your second model into the above (replace ai with B0 + B1A1 + B2A2, where first category means A1 = 0, A2 = 0; second category means A1 = 1, A2 = 0; third category means A1 = 0, A2 = 1) and see what happens.

If the dummy variables are linearly independent of the other regressors, it should go well.
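
Whether that condition holds for the data in this thread can be checked numerically. The following is a minimal sketch in plain Python, assuming made-up GDP values (only the structure of the data matters) and using hypothetical helpers `rank` and `design` written for this example. It compares the number of columns with the rank of the design matrix for the six flows M12, M13, M21, M23, M31, M32:

```python
from fractions import Fraction

# Hypothetical GDP figures; any three distinct values give the same structure.
gdp = {1: 10, 2: 20, 3: 40}

# The six origin-destination pairs: (1,2), (1,3), (2,1), (2,3), (3,1), (3,2).
flows = [(i, j) for i in (1, 2, 3) for j in (1, 2, 3) if i != j]


def rank(rows):
    """Rank of a matrix via exact Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, len(m)):
            factor = m[k][col] / m[r][col]
            m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r


def design(columns):
    """One row per flow; each column is a function of (origin, destination)."""
    return [[f(i, j) for f in columns] for i, j in flows]


def dummy(k):
    """Indicator column for origin country k."""
    return lambda i, j: int(i == k)


def const(i, j):
    return 1


def gdp_origin(i, j):
    return gdp[i]


def gdp_dest(i, j):
    return gdp[j]


# d1 + d2 + d3 + GDPi + GDPj (no constant), as in the original question.
no_constant = design([dummy(1), dummy(2), dummy(3), gdp_origin, gdp_dest])
# Constant + two dummies + GDPi + GDPj (reference-category coding).
reference = design([const, dummy(2), dummy(3), gdp_origin, gdp_dest])
# d1 + d2 + d3 + GDPj, i.e. the question's model with GDPi dropped.
without_gdp_i = design([dummy(1), dummy(2), dummy(3), gdp_dest])

rank_no_constant = rank(no_constant)
rank_reference = rank(reference)
rank_without_gdp_i = rank(without_gdp_i)

for name, x in [("no constant", no_constant),
                ("reference coding", reference),
                ("without GDPi", without_gdp_i)]:
    print(name, "columns:", len(x[0]), "rank:", rank(x))
```

In this sketch both five-column designs have rank 4, while the four-column design without GDPi has full rank 4. The reason is that, with a single year of data, GDPi takes exactly one value per origin country, so GDPi = GDP1·d1 + GDP2·d2 + GDP3·d3 holds for every row. The origin dummies already span GDPi whichever dummy coding is used, which matches the observation that dropping GDPi removes the problem.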