# Calculating a Weighted Average and its Variance

• Aug 27th 2007, 04:15 AM
BlowFish
Calculating a Weighted Average and its Variance
Hi I am not sure this constitutes an advanced query or a basic one. It has me beaten! I am trying to find a formula and so construct a computer algorithm for calculating the weighted average and variance (taking into account the weights) for a series of data. The application is a series of stock data (weighted by volume) and for computational efficiency I'd like to use an algorithm that does not need to re-iterate the whole series when a new piece of data is added to the sample.

For a non weighted series its straight forward for each point (n) in the series the variance and mean are

for each piece of data in the series
n=n+1
Mean = Sum/n
Variance = (SumSqrs - Square(Sum)/n)/(n-1)
end

When new data arrives the loop can be run once more for the new data. For a weighted series I need to scale each point by

Weight(n)/Weight(total)

The problem I have is that each time a new piece of data is added the Total Weight increases. As far as I can see this will require each point in the series to be re-weighed using the new total weighting?

My maths isn't good enough to do it by raw brain power (I tried). I also used a trial and error approach ...adding weighting in different places...keeping a sum of the weights etc. But while close it dose not appear to be correct.

There are a number of clever algorithms Algorithms for calculating variance - Wikipedia, the free encyclopedia for calculating variance in no weighted series but I have not found anything for weighted?

Any suggestions would be welcome
Nick.
• Aug 27th 2007, 06:00 AM
TKHunny
One thing that may help you is a small generalization. The standard or familiar algorithm for average (and its related algorithm for variance) can be seen as a Weighted Average that just happens to have all weights equivalent and equal to 1/n.

It is a VERY interesting exercise. See if you can get it with that one hint.
• Aug 27th 2007, 09:08 PM
JakeD
Quote:

Originally Posted by TKHunny
One thing that may help you is a small generalization. The standard or familiar algorithm for average (and its related algorithm for variance) can be seen as a Weighted Average that just happens to have all weights equivalent and equal to 1/n.

It is a VERY interesting exercise. See if you can get it with that one hint.

Unfortunately this formula from the algorithms page does not work just by replacing the n and n+1 with weights. It's not clear to me how it is generalized. But see my next post for an algorithm that does not use this formula.

• Aug 27th 2007, 09:28 PM
JakeD
Quote:

Originally Posted by BlowFish
Hi I am not sure this constitutes an advanced query or a basic one. It has me beaten! I am trying to find a formula and so construct a computer algorithm for calculating the weighted average and variance (taking into account the weights) for a series of data. The application is a series of stock data (weighted by volume) and for computational efficiency I'd like to use an algorithm that does not need to re-iterate the whole series when a new piece of data is added to the sample.

For a non weighted series its straight forward for each point (n) in the series the variance and mean are

for each piece of data in the series
n=n+1
Mean = Sum/n
Variance = (SumSqrs - Square(Sum)/n)/(n-1)
end

When new data arrives the loop can be run once more for the new data. For a weighted series I need to scale each point by

Weight(n)/Weight(total)

The problem I have is that each time a new piece of data is added the Total Weight increases. As far as I can see this will require each point in the series to be re-weighed using the new total weighting?

My maths isn't good enough to do it by raw brain power (I tried). I also used a trial and error approach ...adding weighting in different places...keeping a sum of the weights etc. But while close it dose not appear to be correct.

There are a number of clever algorithms Algorithms for calculating variance - Wikipedia, the free encyclopedia for calculating variance in no weighted series but I have not found anything for weighted?

Any suggestions would be welcome
Nick.

Here is an algorithm adapted from Algorithm III from the Wikipedia page. You can use it to add data points one at a time. But it is not optimized and does not minimize rounding error like Algorithm III.

Code:

```mean_old = 0; mean_new = 0; sum = 0; wgt_old = 0; wgt_new = 0; foreach x in data and w in weights:  wgt_new = wgt_old + w;  mean_new = (wgt_old*mean_old + w*x)/wgt_new;  sum = sum + wgt_old*mean_old^2 + w*x^2 - wgt_new*mean_new^2;  wgt_old = wgt_new;  mean_old = mean_new; end for; variance = sum/wgt_new;```
• Aug 28th 2007, 01:37 AM
BlowFish
Thanks for the replies. Interesting way to think of it TKHunny, when I couldn't 'get it' from straight brain power (my maths was never particularly advanced and is very rusty). I did try to come up with a solution empirically (OK trial and error). I think even in light of your way to view it, the empirical approach would have got me there if I was going to get there.

JakeD thanks I'll give that a try. It looks pretty similar to one of the things I came up with (trial and error). I'll give it a try, see how it looks, and report back.

Cheers,
Nick.
• Aug 28th 2007, 05:09 AM
BlowFish
Well coded the new algorithm and it looks great on the variance (well actually the standard deviation but it looks great).

One last thing is confusing me. The mean looks identical to an older algorithm I was using and for the life of me I can't understand why. Essentially it looks the same as simply doing :-

n = 1 to sample size

totalweight = totalweight + w(n)
mean = mean + value(n) * w(n) / totalweight

I thought that when a new piece of data arrives all the preceding weights would need to be changed as the universe of totalweight has changed. This does not seem to be the case. The algorithm above uses the total weight up to the data point of calculation not the total weight of the universe of data.

Anyway I'm very please with how the results are looking thanks ever so much. Please forgive my 'algorithmic' approach rather than pure maths. For me it is easier to think in this sort of fashion.

Cheers,
Nick.
• Aug 28th 2007, 10:54 AM
JakeD
Quote:

Originally Posted by JakeD
Here is an algorithm adapted from Algorithm III from the Wikipedia page. You can use it to add data points one at a time. But it is not optimized and does not minimize rounding error like Algorithm III.

Code:

```mean_old = 0; mean_new = 0; sum = 0; wgt_old = 0; wgt_new = 0; foreach x in data and w in weights:  wgt_new = wgt_old + w;  mean_new = (wgt_old*mean_old + w*x)/wgt_new;  sum = sum + wgt_old*mean_old^2 + w*x^2 - wgt_new*mean_new^2;  wgt_old = wgt_new;  mean_old = mean_new; end for; variance = sum/wgt_new;```

Quote:

Originally Posted by BlowFish
Well coded the new algorithm and it looks great on the variance (well actually the standard deviation but it looks great).

One last thing is confusing me. The mean looks identical to an older algorithm I was using and for the life of me I can't understand why. Essentially it looks the same as simply doing :-

n = 1 to sample size

totalweight = totalweight + w(n)
mean = mean + value(n) * w(n) / totalweight

I thought that when a new piece of data arrives all the preceding weights would need to be changed as the universe of totalweight has changed. This does not seem to be the case. The algorithm above uses the total weight up to the data point of calculation not the total weight of the universe of data.

Anyway I'm very please with how the results are looking thanks ever so much. Please forgive my 'algorithmic' approach rather than pure maths. For me it is easier to think in this sort of fashion.

Cheers,
Nick.

The code I gave for the mean has several differences with your code: both old and new weights are used; the old mean is multiplied by the old weights; parentheses are used so that the entire sum is divided by the new weights.

Code:

```wgt_new = wgt_old + w; mean_new = (wgt_old*mean_old + w*x)/wgt_new;```
I wonder how your testing could have produced a good variance using an incorrect mean when the variance calculation uses the mean.

I glad this approach works for you.
• Aug 29th 2007, 03:27 AM
BlowFish
Actually its the means that look very similar. I couldn't get the variance to match using any of my methods. Yours looks spot on without normalising or any other pre-processing. I will perhaps mess around a little further but I now have what I need!

Thanks again,
cheers,
Nick.