Originally Posted by

**ebaines** No, it's not true. For case 2 the amount of discrepancy as a percentage of R is less than in case 1. Hence D is less.

Looking back at your post #1, were you happy with using a normalizer of max(|P|,|R|), at least for the cases where both P>0 and R>0, or P<0 and R<0? Do you want to keep using those rules? If you were happy with the results for those two cases, then you were happy generating scores close to 1 whenever the lesser of P or R approaches 0. For example:

P = 100, R = 1000 --> score = 0.9

P = 1, R = 1000 --> score = 0.999

Even a small R yields a high score:

P = 1, R = 10 --> score = 0.9
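The three scores above are consistent with score = (R - P)/max(|P|, |R|). Here is a minimal Python sketch under that assumption. The numerator R - P is inferred from the examples rather than quoted from post #1, and the function name `normalized_score` is made up for illustration:

```python
def normalized_score(p: float, r: float) -> float:
    """Assumed formula: (R - P) / max(|P|, |R|); inferred from the examples."""
    normalizer = max(abs(p), abs(r))
    if normalizer == 0:
        # P = R = 0 is not discussed in the thread, so refuse rather than guess.
        raise ValueError("score is undefined when both P and R are 0")
    return (r - p) / normalizer


# The three (P, R) pairs from the examples above.
for p, r in [(100, 1000), (1, 1000), (1, 10)]:
    print(f"P = {p}, R = {r} --> score = {normalized_score(p, r):.3f}")
```

As P shrinks toward 0 with R fixed, the result climbs toward 1, which is the behavior the examples point out.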

By this reasoning, if P = 0 then the score should equal 1, the highest possible score that you ever want assigned. And if P is negative while R is positive, then the formula would give a score greater than 1, but you don't want any score greater than 1. Consequently I suggest the following:

case a: if P>0 and R>0 then score = (R-P)/R

case b: if P<0 and R<0 then score = (R-P)/|P|

case c: if P = 0 and R not equal 0 then score = 1

case d: if R = 0 and P not equal 0 then score = 1

case e: if P<0 and R>0 then score = 1

Think about it. You lose all sensitivity to changes in R and P if they are of different signs, but that's a consequence of being consistent with the criteria given for cases a and b and the rule that score must be less than or equal to 1.
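A rough Python sketch of cases a through e, to make the loss of sensitivity concrete. The function name `score` is hypothetical, and raising an error for inputs the list does not cover (P>0 with R<0, or P = R = 0) is my assumption, not part of the rules:

```python
def score(p: float, r: float) -> float:
    """Piecewise score following cases a-e as listed."""
    if p > 0 and r > 0:
        return (r - p) / r        # case a
    if p < 0 and r < 0:
        return (r - p) / abs(p)   # case b
    if p == 0 and r != 0:
        return 1.0                # case c
    if r == 0 and p != 0:
        return 1.0                # case d
    if p < 0 and r > 0:
        return 1.0                # case e
    # Assumption: the listed cases do not say what to do here.
    raise ValueError(f"no case covers P = {p}, R = {r}")


print(score(100, 1000))   # case a: 0.9
print(score(-5, 1000))    # case e: 1.0
print(score(-500, 10))    # case e: 1.0, very different inputs, same score
```

The last two calls return the same value even though P and R differ widely, which is the insensitivity for mixed signs described above.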

By the way: on that scatter plot, how did you calculate the scores for the 2000 households? From their R and P data? What formulas did you use?