# Thread: Statistics problem, can any one help?

1. ## Statistics problem, can any one help?

Use the sales data given below to determine

(a) the least squares trend line.
(b) the predicted value for 2002 sales.

| Year | Sales (units) | Year | Sales (units) |
|------|---------------|------|---------------|
| 1995 | 130 | 1999 | 169 |
| 1996 | 140 | 2000 | 182 |
| 1997 | 152 | 2001 | 194 |
| 1998 | 160 | 2002 | ? |

2. This is what I would do, and I believe it is the correct way to approach this problem.

First, let's rename the years such that 1995 = 1, 1996 = 2, etc.

Thus we have the following seven data points: (1, 130), (2, 140), (3, 152), (4, 160), (5, 169), (6, 182), (7, 194).

We wish to find a line $y = \beta_1 + \beta_2x$ that "best" fits the seven data points above. We will define "best" in a moment. For now, just note that we want to solve for $\beta_1$ and $\beta_2$ in the following linear system:

$\beta_1 + 1\beta_2 = 130$
$\beta_1 + 2\beta_2 = 140$
$\beta_1 + 3\beta_2 = 152$
$\beta_1 + 4\beta_2 = 160$
$\beta_1 + 5\beta_2 = 169$
$\beta_1 + 6\beta_2 = 182$
$\beta_1 + 7\beta_2 = 194$.

Notice what we did here: for each point $(x_i, y_i)$ we included the equation $\beta_1 + x_i\beta_2 = y_i$ in our linear system.
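
Seven equations in two unknowns usually have no exact solution. A quick Python check (a sketch; the names `diffs` and `exact_line_exists` are just illustrative) makes this concrete: since consecutive $x$ values differ by 1, a line through every point would need every year-over-year change in sales to equal the same slope $\beta_2$, and the data do not satisfy that.

```python
# Sales for the recoded years x = 1 (1995), ..., 7 (2001)
ys = [130, 140, 152, 160, 169, 182, 194]

# Because consecutive x values differ by 1, an exact line would need every
# year-over-year change in sales to equal the same slope beta_2.
diffs = [b - a for a, b in zip(ys, ys[1:])]
exact_line_exists = len(set(diffs)) == 1

print(diffs)              # [10, 12, 8, 9, 13, 12]
print(exact_line_exists)  # False
```

Because no single line passes through all seven points, we need a criterion for the "best" approximate line.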

Now, back to what we meant by "best."
"Best," in our case, means minimizing the sum of squares of the "errors," i.e., the least squares approach. That is, we wish to find a line $y = \beta_1 + \beta_2x$ that fits our seven data points such that the sum of squared "errors" is minimized. We define "error" as follows: for each equation $\beta_1 + x_i\beta_2 = y_i$ in our linear system, the error is $\epsilon_i = y_i - (\beta_1 + x_i\beta_2)$. Each individual error can be positive or negative, which is why the errors are squared: otherwise positive and negative errors could cancel each other out.

So, we want to find $\beta_1$ and $\beta_2$ that come as close as possible to solving the equations in our linear system, in the sense that $\sum_{i=1}^7 \epsilon_i^2$ is as small as possible, i.e., minimized. This amounts to the following procedure:

First, we set up the sum-of-squared-errors function, which is simply

$S(\beta_1, \beta_2) = \sum_{i=1}^7 \left(y_i - (\beta_1 + x_i\beta_2)\right)^2 = (130-(\beta_1 + 1\beta_2))^2 + (140-(\beta_1 + 2\beta_2))^2 + (152-(\beta_1 + 3\beta_2))^2 + (160-(\beta_1 + 4\beta_2))^2 + (169-(\beta_1 + 5\beta_2))^2 + (182-(\beta_1 + 6\beta_2))^2 + (194-(\beta_1 + 7\beta_2))^2$.
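
To make $S$ concrete, here is a small Python sketch (the function name `sum_sq_errors` is just illustrative) that evaluates $S(\beta_1, \beta_2)$ for any candidate pair. Two hand-picked lines are compared; the one with the smaller $S$ fits the data better, and the calculus step that follows finds the pair that makes $S$ as small as possible.

```python
xs = [1, 2, 3, 4, 5, 6, 7]  # 1995 -> 1, ..., 2001 -> 7
ys = [130, 140, 152, 160, 169, 182, 194]


def sum_sq_errors(b1, b2):
    """S(b1, b2): sum of squared errors eps_i = y_i - (b1 + x_i * b2)."""
    return sum((y - (b1 + x * b2)) ** 2 for x, y in zip(xs, ys))


# Two hand-picked candidate lines; the one with the smaller S fits better.
print(sum_sq_errors(120, 10))    # 25
print(sum_sq_errors(119, 10.5))  # 12.0
```

Neither candidate is the minimizer; guessing is only useful for building intuition about what $S$ measures.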

Next, we want to find the minimum of $S(\beta_1,\beta_2)$, which is done by taking the partial derivatives of $S$ with respect to $\beta_1$ and $\beta_2$ and setting them equal to zero. Finally, we solve the resulting pair of equations for $\beta_1$ and $\beta_2$ and plug those values back into the line $y = \beta_1 + \beta_2x$.
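
Carrying out those steps for this data set gives the following worked sketch. The sums $\sum x_i = 28$, $\sum x_i^2 = 140$, $\sum y_i = 1127$ and $\sum x_i y_i = 4801$ are computed from the seven points above, and 2002 corresponds to $x = 8$ in this coding.

```latex
\begin{aligned}
\frac{\partial S}{\partial \beta_1} &= -2\sum_{i=1}^{7}\bigl(y_i - \beta_1 - x_i\beta_2\bigr) = 0
  &&\Longrightarrow\quad 7\beta_1 + 28\beta_2 = 1127\\
\frac{\partial S}{\partial \beta_2} &= -2\sum_{i=1}^{7}x_i\bigl(y_i - \beta_1 - x_i\beta_2\bigr) = 0
  &&\Longrightarrow\quad 28\beta_1 + 140\beta_2 = 4801\\[4pt]
\beta_2 &= \frac{293}{28} \approx 10.464, \qquad \beta_1 = \frac{834}{7} \approx 119.143\\
\hat{y}(8) &= \frac{834}{7} + 8\cdot\frac{293}{28} = \frac{1420}{7} \approx 202.86
\end{aligned}
```

So under this coding the trend line is approximately $y = 119.143 + 10.464x$, and the predicted 2002 sales are about 203 units.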

3. Originally Posted by stoorrey
Use the sales data given below to determine

(a) the least squares trend line.
(b) the predicted value for 2002 sales.

| Year | Sales (units) | Year | Sales (units) |
|------|---------------|------|---------------|
| 1995 | 130 | 1999 | 169 |
| 1996 | 140 | 2000 | 182 |
| 1997 | 152 | 2001 | 194 |
| 1998 | 160 | 2002 | ? |

Make a table. For the time series, code the years as $X = 0$ for 1995, $X = 1$ for 1996, ..., $X = 7$ for 2002.

Use Y for sales units.

$\begin{array}{cccc}X&X^2&Y&XY\\
0&0&130&0\\
1&1&140&140\\
2&4&152&304\\
\vdots&\vdots&\vdots&\vdots\\
\Sigma X&\Sigma X^2&\Sigma Y&\Sigma XY\end{array}$
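
The column totals can be checked with a few lines of Python (a sketch; the variable names are illustrative). Only the seven years with known sales, $X = 0$ through $6$, enter the sums.

```python
# X = 0 for 1995, ..., 6 for 2001; 2002 (X = 7) has no observed sales, so it is excluded
X = list(range(7))
Y = [130, 140, 152, 160, 169, 182, 194]

sum_x = sum(X)
sum_x2 = sum(x * x for x in X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

print(sum_x, sum_x2, sum_y, sum_xy)  # 21 91 1127 3674
```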

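Set up the following equations:

$\Sigma Y = a_0N + a_1\Sigma X \qquad (1)$
$\Sigma XY = a_0\Sigma X + a_1\Sigma X^2 \qquad (2)$

where $N = 7$ is the number of years with observed sales (1995 through 2001; 2002 is the year being predicted, so it does not enter the sums).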

Solve for $a_0$ and $a_1$ in eq(1) and eq(2).
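
Equations (1) and (2) are two linear equations in $a_0$ and $a_1$, so one way to solve them is Cramer's rule. The sketch below (the helper name `solve_normal_equations` is hypothetical) plugs in $N = 7$ and the column totals $\Sigma X = 21$, $\Sigma X^2 = 91$, $\Sigma Y = 1127$, $\Sigma XY = 3674$ computed from the data.

```python
def solve_normal_equations(n, sum_x, sum_x2, sum_y, sum_xy):
    """Solve eq(1) and eq(2) for (a0, a1) with Cramer's rule.

    eq(1): sum_y  = a0*n     + a1*sum_x
    eq(2): sum_xy = a0*sum_x + a1*sum_x2
    """
    det = n * sum_x2 - sum_x ** 2
    a0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    a1 = (n * sum_xy - sum_x * sum_y) / det
    return a0, a1


# Column totals for X = 0..6 (1995-2001) and N = 7 observed years
a0, a1 = solve_normal_equations(7, 21, 91, 1127, 3674)
print(round(a0, 4), round(a1, 4))  # 129.6071 10.4643
```

The slope $a_1$ (about 10.46 units per year) matches $\beta_2$ from the previous post; only the intercept changes, because this post starts counting at $X = 0$ instead of $x = 1$.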

Once you have $a_0$ and $a_1$, you can find the predicted values $Y$ simply by substituting values of $X$ into equation (3):

$Y = a_0 + a_1X \qquad (3)$

Equation (3) is the trend line.
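
As a cross-check, the sketch below fits the line with the centered form of the normal equations ($a_1 = S_{xy}/S_{xx}$, $a_0 = \bar{Y} - a_1\bar{X}$, a standard algebraic rearrangement of eq(1) and eq(2); the helper name `fit_line` is hypothetical) and predicts 2002 under both year codings used in this thread.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope via a1 = Sxy / Sxx, a0 = ybar - a1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1


sales = [130, 140, 152, 160, 169, 182, 194]  # 1995-2001

# Coding used in this post: 1995 -> X = 0, so 2002 -> X = 7
a0, a1 = fit_line(list(range(0, 7)), sales)
pred_2002 = a0 + a1 * 7

# Coding used in the previous post: 1995 -> x = 1, so 2002 -> x = 8
b1, b2 = fit_line(list(range(1, 8)), sales)
pred_2002_alt = b1 + b2 * 8

print(round(pred_2002, 2), round(pred_2002_alt, 2))  # 202.86 202.86
```

The intercepts differ between the two codings, but the slope and the 2002 prediction (about 203 units) do not, since shifting the year index only moves the origin.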

Since I am not a student in a school of business, I don't know the abbreviations MAD or MSE, but you should not have any trouble doing the rest.

4. 1 It would be good to know what your model is.
2 I don't know MAD either, but I can only guess at it with a bad joke.
3 MSE is the mean squared error: the SSE divided by its degrees of freedom ($n-2$ for a straight-line fit, since two parameters are estimated).
If the errors are normal, $\mathrm{SSE}/\sigma^2$ is a chi-square random variable with those $n-2$ degrees of freedom.
(SSE is the sum of squares due to error. It's what your model cannot explain.)
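
For this data set, SSE and MSE can be sketched in Python as follows (variable names are illustrative). This follows the regression convention described above, dividing SSE by $n - 2$; some forecasting texts instead define MSE as SSE divided by $n$, so check which definition your course uses.

```python
X = list(range(7))  # 1995 -> 0, ..., 2001 -> 6
Y = [130, 140, 152, 160, 169, 182, 194]
n = len(Y)

# Least-squares line for this coding (centered form of the normal equations)
xbar, ybar = sum(X) / n, sum(Y) / n
a1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
a0 = ybar - a1 * xbar

residuals = [y - (a0 + a1 * x) for x, y in zip(X, Y)]
sse = sum(r * r for r in residuals)  # sum of squares due to error
df = n - 2                           # two estimated parameters, a0 and a1
mse = sse / df

print(round(sse, 4), round(mse, 4))  # 11.9643 2.3929
```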

5. Originally Posted by matheagle
1 It would be good to know what your model is

$\Sigma Y = a_0N + a_1\Sigma X \qquad (1)$
$\Sigma XY = a_0\Sigma X + a_1\Sigma X^2 \qquad (2)$

$Y = a_0 + a_1X \qquad (3)$

Equations 1 and 2 are called the Least-Squares Normal Equations.
Equation 3 is called the Least-Squares Linear Regression Equation.

This line has been removed. I realized it wasn't polite.

Originally Posted by matheagle
3 MSE is Mean Squared Error, it's the SSE divided by its degrees of freedom.
Number 3 is not in my book. Which book do you recommend?