• Mar 29th 2007, 07:43 PM
stupidpoeticjustice
Hello,

When evaluating a set of items for their popularity, I pull popularity

metrics from two different sources, aggregating each score into an average.

While I am satisfied with this metric (Let's call it the 'popularity

score'), I'd like to create a simplified, more meaningful score.

My initial though was to create an index using the following formula:

(D3/MAX(\$D:\$D))*100)

Where D3 is the individual popularity score and D:D represents all

popularity scores.

I discovered, however, that outliers would have a disruptive effect on the

data. For instance, if there where several VERY popular items, other items

with acceptable popularity scores would appear very low in relative terms.

Any ideas on how to account for the Outliers while still retaining the

meaning in an easy to understand way?
• Mar 29th 2007, 08:01 PM
AfterShock
Quote:

Originally Posted by stupidpoeticjustice
Hello,

When evaluating a set of items for their popularity, I pull popularity

metrics from two different sources, aggregating each score into an average.

While I am satisfied with this metric (Let's call it the 'popularity

score'), I'd like to create a simplified, more meaningful score.

My initial though was to create an index using the following formula:

(D3/MAX(\$D:\$D))*100)

Where D3 is the individual popularity score and D:D represents all

popularity scores.

I discovered, however, that outliers would have a disruptive effect on the

data. For instance, if there where several VERY popular items, other items

with acceptable popularity scores would appear very low in relative terms.

Any ideas on how to account for the Outliers while still retaining the

meaning in an easy to understand way?

Why not take the lognormal distribution and then take the antilog- this will take care of outliers.
• Mar 30th 2007, 05:32 AM
stupidpoeticjustice
Thanks for the reply. First off, I believe that this would imply that there was a normal distribution. I believe that this may not be the case.

Regarding implementation, I'm having difficulty getting the formula. Here is my data (D3:D29):

0.00
0.00
0.00
42.56
0.00
680.99
0.00
0.00
60.80
0.00
18.24
0.00
85.12
170.24
0.00
0.00
297.92
103.36
0.00
0.00
0.00
30.40
18.24
30.40
72.96
0.00