Regression Models problem =_=

May 2010
Hi guys! I need your help here.
I have raw sugar prices (the dependent variable), and as independent variables I have white sugar, white sugar premium, crude oil, ethanol and corn. All of these series are in USD per metric tonne.
Basically I want to build the best regression model to predict future prices. I took the data from 2007 until now (May 2010).
The problem is this: when I check the correlation between each independent variable and the dependent variable, the correlation between ethanol and raw sugar is quite high in some individual years (0.7), but when I use all the data from 2007-2010 together the correlation is very low (0.04).
In certain years ethanol and crude oil affect the raw sugar price quite strongly, but at other times they don't seem to affect it at all. Should I include these variables in my model?
If what I did was wrong, could you suggest what I should do to build the best model to predict the raw sugar price? Thanks a lot.
 

Prove It

Aug 2008
Have you tried drawing a scatterplot of the data to see if the relationship looks linear or if the correlation looks strong? It's quite possible (especially since you have not posted the data) that the best regression model may not be linear...
 
May 2010
Keep in mind that when you do regression, you are fitting a model (or a theory) to data. The results are more reliable and more descriptive if you have some a priori reason to assume that a certain model is appropriate and others are not.

For example, certain chemical reactions are known to follow a certain form. Autocatalytic reactions follow a logistic model when reaction concentration is plotted against completion. So if you have measurements from such a reaction, you can justifiably regress that data onto a logistic model. But your data may fit some other model better even though that model isn't a good one. For example, it is possible to get an almost perfect linear fit to data that is not linear. Your measurements from the autocatalytic reaction could fit a linear model better if, say, the data points were chosen poorly. That does not mean that linear regression is a good model.
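To make the "poorly chosen data points" remark concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the standard logistic curve stands in for the reaction data, and the two sampling ranges and the helper names (`linear_r2`, `logistic`, `grid`) are made up. It fits a straight line to points sampled only from the middle of the curve and to points sampled across the whole S-shape, then compares R^2.

```python
import math

def linear_r2(xs, ys):
    """Fit y = a + b*x by least squares and return R^2 of that fit."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / syy

def logistic(x):
    """Standard logistic curve (stand-in for the true relationship)."""
    return 1.0 / (1.0 + math.exp(-x))

def grid(lo, hi, n):
    """n evenly spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

# Assumed sampling ranges: only the near-linear middle vs. the whole curve.
narrow_x = grid(-1.0, 1.0, 25)
wide_x = grid(-6.0, 6.0, 25)

r2_narrow = linear_r2(narrow_x, [logistic(x) for x in narrow_x])
r2_wide = linear_r2(wide_x, [logistic(x) for x in wide_x])

print(f"R^2 of a straight line, narrow sample: {r2_narrow:.4f}")
print(f"R^2 of a straight line, wide sample:   {r2_wide:.4f}")
```

The narrow sample gives a near-perfect linear fit even though the underlying curve is logistic, which is the trap described above: a high R^2 on a limited range says little about whether the model form is right.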

When it comes to something like economics, I wonder how reliable any regression model will be, since conditions keep changing (new technology is invented, industrial methods are adapted, and so on). It might be more interesting to use segmented regression (or quantile regression), especially if there is some third factor in the background (like an economic crisis) that you think is breaking an otherwise nicely working model.
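As a rough illustration of the segmented idea, the sketch below fits one straight line to all the data and then one line per regime. The data is synthetic and the setup is an assumption: two regimes with a known breakpoint, where the response slope flips from +2 to -2. The helper names `ols` and `sse` are made up for this example. With real prices the breakpoints would have to be chosen or estimated.

```python
import random

def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sse(xs, ys, a, b):
    """Sum of squared residuals of the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

random.seed(0)
xs = [0.2 * i for i in range(51)]  # same x grid (0..10) in both regimes

# Synthetic regimes (assumption): the response slope flips from +2 to -2.
regimes = {}
for name, slope in (("A", 2.0), ("B", -2.0)):
    ys = [20.0 + slope * (x - 5.0) + random.gauss(0.0, 0.5) for x in xs]
    regimes[name] = (xs, ys)

# One line fitted to everything pooled together.
all_x = [x for rx, _ in regimes.values() for x in rx]
all_y = [y for _, ry in regimes.values() for y in ry]
pooled_a, pooled_b = ols(all_x, all_y)
pooled_sse = sse(all_x, all_y, pooled_a, pooled_b)

# One line per regime (the segmented fit, breakpoint assumed known).
fits = {name: ols(rx, ry) for name, (rx, ry) in regimes.items()}
segmented_sse = sum(sse(rx, ry, *fits[name]) for name, (rx, ry) in regimes.items())

print(f"pooled slope: {pooled_b:.2f}")
for name, (a, b) in fits.items():
    print(f"regime {name} slope: {b:.2f}")
print(f"SSE pooled: {pooled_sse:.1f}, segmented: {segmented_sse:.1f}")
```

The pooled line comes out nearly flat because the two opposite slopes cancel, while each per-regime line recovers its own slope with a much smaller residual sum of squares.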

But, I'm no economist so don't take my word for it.
 
May 2010
Thanks for your replies, people.
You were right! My boss also mentioned things like FOREX (foreign exchange) and some other economic factors that affect this raw sugar price model. I am doing my practical at a trading company, so she gave me a price modelling project, and she expects the best model I can build to help her predict future prices.
I have about a week left before my practical ends.
By the way, I've computed the correlations between all the variables.
Here they are:

2007
                 Raw       White     Premium       Crude     Ethanol        Corn
Raw        1.0000000   0.8461601  -0.3196739 -0.22178618   0.5097894  0.48126959
White      0.8461601   1.0000000   0.2321169 -0.49986194   0.7470529  0.37547130
Premium   -0.3196739   0.2321169   1.0000000 -0.48590312   0.3984615 -0.21089315
Crude     -0.2217862  -0.4998619  -0.4859031  1.00000000  -0.4600681  0.03608671
Ethanol    0.5097894   0.7470529   0.3984615 -0.46006812   1.0000000  0.46975670
Corn       0.4812696   0.3754713  -0.2108931  0.03608671   0.4697567  1.00000000

2008

                 Raw       White     Premium       Crude     Ethanol        Corn
Raw        1.0000000   0.9420799  0.15083184   0.7762984   0.2357560  0.76728658
White      0.9420799   1.0000000  0.44970297   0.7412372   0.2413440  0.70244140
Premium    0.1508318   0.4497030  1.00000000   0.1590921   0.1113462  0.06374843
Crude      0.7762984   0.7412372  0.15909206   1.0000000   0.4625081  0.94258203
Ethanol    0.2357560   0.2413440  0.11134621   0.4625081   1.0000000  0.48498266
Corn       0.7672866   0.7024414  0.06374843   0.9425820   0.4849827  1.00000000


2009

                 Raw       White     Premium       Crude     Ethanol        Corn
Raw        1.0000000   0.9675492  0.66665892  0.82591253   0.4187288 -0.33589663
White      0.9675492   1.0000000  0.73421697  0.81740221   0.4561246 -0.30412616
Premium    0.6666589   0.7342170  1.00000000  0.65892132   0.6390076 -0.03200543
Crude      0.8259125   0.8174022  0.65892132  1.00000000   0.5864452 -0.07910499
Ethanol    0.4187288   0.4561246  0.63900763  0.58644523   1.0000000  0.55699075
Corn      -0.3358966  -0.3041262 -0.03200543 -0.07910499   0.5569907  1.00000000

2010

                 Raw       White     Premium       Crude     Ethanol        Corn
Raw       1.00000000  0.97489480   0.3324361 -0.06971093  0.76364697  0.34613958
White     0.97489480  1.00000000   0.4992044 -0.06856777  0.76219179  0.34579977
Premium   0.33243608  0.49920445   1.0000000 -0.14189749  0.30091752  0.13223748
Crude    -0.06971093 -0.06856777  -0.1418975  1.00000000 -0.08957436 -0.04879594
Ethanol   0.76364697  0.76219179   0.3009175 -0.08957436  1.00000000  0.68587790
Corn      0.34613958  0.34579977   0.1322375 -0.04879594  0.68587790  1.00000000

2007-2010
                  Raw        White      Premium        Crude      Ethanol         Corn
Raw       1.000000000   0.98917194    0.6393104 -0.009198419   -0.1890490  -0.05301321
White     0.989171937   1.00000000    0.7071875 -0.022153949   -0.1951784  -0.07912836
Premium   0.639310424   0.70718750    1.0000000 -0.108612838   -0.1783351  -0.23443143
Crude    -0.009198419  -0.02215395   -0.1086128  1.000000000    0.6097897   0.81862554
Ethanol  -0.189048969  -0.19517845   -0.1783351  0.609789676    1.0000000   0.71359339
Corn     -0.053013213  -0.07912836   -0.2344314  0.818625540    0.7135934   1.00000000

As you can all see, once I merge all the data, crude oil seems not to affect the raw sugar price (raw sugar is the dependent variable): the 2007-2010 correlation is about -0.01.
But when I look at individual years, the correlation is quite high in some of them (0.78 in 2008 and 0.83 in 2009)!
So what should I do to give her the best price model?
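
One possible explanation for a high correlation within years and a low one overall is a shift in price levels between years. The sketch below uses synthetic numbers only, not the sugar data. The yearly means, the within-year slope of 0.8, the noise levels and the helper name `pearson` are all assumptions for illustration. Within each year y tracks x closely, but the yearly averages move independently of each other, so the pooled correlation ends up near zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)

# Synthetic (assumed) yearly price levels: year -> (mean of x, mean of y).
levels = {
    2007: (40.0, 50.0),
    2008: (80.0, 30.0),
    2009: (50.0, 20.0),
    2010: (70.0, 60.0),
}

data = {}
for year, (x_mean, y_mean) in levels.items():
    # Within a year, y follows x with slope 0.8 plus a little noise.
    xs = [x_mean + random.gauss(0.0, 3.0) for _ in range(60)]
    ys = [y_mean + 0.8 * (x - x_mean) + random.gauss(0.0, 1.0) for x in xs]
    data[year] = (xs, ys)

per_year = {year: pearson(xs, ys) for year, (xs, ys) in data.items()}
pooled = pearson(
    [x for xs, _ in data.values() for x in xs],
    [y for _, ys in data.values() for y in ys],
)

for year, r in per_year.items():
    print(f"{year}: r = {r:.2f}")
print(f"2007-2010 pooled: r = {pooled:.2f}")
```

If something like this is going on in the real data, a single pooled correlation is not a good guide to whether a variable belongs in the model. Fitting per period, or including a term for the period, would be one way to check.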