Hi all

I have a set of observed data with three columns: year, x and y. x is available from 1980 to 2010; y is available only from 1980 to 1990. My aim is to predict y for 2010.

I have used two methods. The first is to fit a simple linear regression of y on x using the 1980 to 1990 data, and then estimate y for 2010 from the 2010 x value. As x and y are highly correlated and the residual plot looks OK, I think this method is fine.
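To make the first method concrete, here is a minimal sketch in Python. The data are synthetic stand-ins generated only for illustration (they are not my real observations), and the variable names are my own.

```python
import numpy as np

# Synthetic stand-in data (NOT the real observations):
# x covers 1980-2010, y covers only 1980-1990.
rng = np.random.default_rng(0)
years = np.arange(1980, 2011)                      # 31 years
x = 10.0 + 0.5 * (years - 1980) + rng.normal(0.0, 0.3, size=years.size)
y_obs = 2.0 + 1.5 * x[:11] + rng.normal(0.0, 0.2, size=11)

# Method 1: fit y = a + b * x on the 1980-1990 overlap.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
b, a = np.polyfit(x[:11], y_obs, deg=1)

# Estimate y for 2010 from the observed 2010 x value.
y_2010_levels = a + b * x[-1]
print(f"Method 1 estimate of y in 2010: {y_2010_levels:.3f}")
```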

The problem is with the second method. First, I compute the year-to-year changes in x and y; for example, d_{x,81} is the x value in 1981 minus the x value in 1980. Second, I fit a simple linear regression of d_y on d_x using these differences. The relationship between d_x and d_y is still very strong for 1980 to 1990: the intercept is not statistically significant, but the slope is strongly significant. Third, to estimate y for 2010, I add the estimated d_{y,91} to the observed y in 1990 to get the estimated y for 1991. I then add the estimated d_{y,92} to the estimated y for 1991 to get the estimated y for 1992, and continue in this way until 2010.
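The three steps of the second method can be sketched as follows. Again, this uses synthetic stand-in data rather than my real observations, and the names (`dx`, `dy`, `dy_hat`, `y_path`) are just my own labels.

```python
import numpy as np

# Synthetic stand-in data (NOT the real observations), as in method 1.
rng = np.random.default_rng(0)
years = np.arange(1980, 2011)
x = 10.0 + 0.5 * (years - 1980) + rng.normal(0.0, 0.3, size=years.size)
y_obs = 2.0 + 1.5 * x[:11] + rng.normal(0.0, 0.2, size=11)

# Step 1: year-to-year changes. dx[0] is x(1981) - x(1980), and so on.
dx = np.diff(x)        # 30 changes, for 1981..2010
dy = np.diff(y_obs)    # 10 changes, for 1981..1990

# Step 2: regress d_y on d_x over the years where both are observed.
slope, intercept = np.polyfit(dx[:10], dy, deg=1)

# Step 3: predict d_y for 1991..2010 and accumulate from the observed y in 1990.
dy_hat = intercept + slope * dx[10:]       # 20 predicted changes
y_path = y_obs[-1] + np.cumsum(dy_hat)     # estimated y for 1991..2010
y_2010_diff = y_path[-1]
print(f"Method 2 estimate of y in 2010: {y_2010_diff:.3f}")
```

Because the predicted changes are simply added up, the year-by-year loop and the cumulative sum give the same 2010 value.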

The results from the two methods are similar.

I would like to know:

(1) Is the second method a valid way to estimate y for 2010?

(2) How do I calculate the confidence interval for the estimated y for 2010 under the second method? The estimated y for 2010 is the sum of the observed y in 1990 and all the estimated d_y from 1991 to 2010.
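In symbols, writing the fitted difference model as \hat{d}_{y,t} = \hat{a} + \hat{b} d_{x,t} (the hats and the names a and b are my own notation), the sum telescopes in x:

$$
\hat{y}_{2010} = y_{1990} + \sum_{t=1991}^{2010} \left( \hat{a} + \hat{b}\, d_{x,t} \right) = y_{1990} + 20\,\hat{a} + \hat{b}\,\left( x_{2010} - x_{1990} \right)
$$

So the 2010 estimate depends on the fitted coefficients only through 20\hat{a} and \hat{b}(x_{2010} - x_{1990}), which is the quantity I would need an interval for.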

Any advice?