Then use the larger of the "real" I and the "reported" I.
I=10,000 E=1000 P=9,000 R=20,000
I=10,000 E=9,000 P=1,000 R=2,000
I=5,000 E=10,000 P=-5,000 R=2,000
I=5,000 E=10,000 P=-5,000 R=-2,000
I=5,000 E=5,000 P=0 R=2,000
I=-5,000 E=10,000 P=-5,000 R=0 (very rare)
At this point the only thing I can offer is this: if |(R-P)/I| is greater than 1, report it as 1, and don't worry about its value relative to other scores that are greater than 1. Any score > 1 requires a significant amount of untruthfulness. Alternatively, use the maximum of I, E and |R| as the denominator, so that score = |R-P| / max(I, E, |R|). For the data you gave previously this would yield:
I=10,000 E=1000 P=9,000 R=20,000: score = 11000/20000 = 55%
I=10,000 E=9,000 P=1,000 R=2,000: score = 1000/10000 = 10%
I=5,000 E=10,000 P=-5,000 R=2,000: score = 7000/10000 = 70%
I=5,000 E=10,000 P=-5,000 R=-2,000: score = 3000/10000 = 30%
I=5,000 E=5,000 P=0 R=2,000: score = 2000/5000 = 40%
I=-5,000 E=10,000 P=-5,000 R=0 (very rare): How can I be negative? If that is not a typo then score = 5000/10000 = 50%
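If it helps to check the arithmetic, here is a rough Python sketch of both suggestions (Python and the function names `score_capped` and `score_max_denominator` are only my choice for illustration, not something from your system). The handling of a zero or negative denominator is my own assumption, since we haven't discussed those cases.

```python
def score_capped(I, P, R):
    """|(R-P)/I|, reported as 1 whenever it exceeds 1."""
    if I == 0:
        # Assumption: with no income to compare against, any gap counts as 1.
        return 1.0 if R != P else 0.0
    return min(abs((R - P) / I), 1.0)


def score_max_denominator(I, E, P, R):
    """|R-P| divided by the largest of I, E and |R|."""
    denom = max(I, E, abs(R))
    if denom <= 0:
        # Assumption: nothing to normalise by, so treat the score as 0.
        return 0.0
    return abs(R - P) / denom


# The six rows from this thread, as (I, E, P, R).
rows = [
    (10000, 1000, 9000, 20000),
    (10000, 9000, 1000, 2000),
    (5000, 10000, -5000, 2000),
    (5000, 10000, -5000, -2000),
    (5000, 5000, 0, 2000),
    (-5000, 10000, -5000, 0),
]

for I, E, P, R in rows:
    s = score_max_denominator(I, E, P, R)
    print(f"I={I} E={E} P={P} R={R}: score = {s:.0%}")
```

The second function always stays between 0 and 1 for these rows because |R-P| never exceeds the largest of I, E and |R| here; I have not shown that this holds for every possible input.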
I think you have an impossible task. Given the lack of data (neither I_real nor E_real), you don't have enough information to consistently derive a score between 0 and 1.
You haven't answered one of my previous questions - why do you say that R>=P? Can't you have the case P = +1000 and R = -1000?
P=0, R=2000
P=-1000, R=-500
In the last case the household declares a loss of $1,000, but our company finds that its loss is smaller ($500), although its net budget is still not positive. So in all cases R>=P. As a result, you think (R-P)/I is the better choice in all cases, and its only problem is that it has no upper bound. Is this true?
Thank you again.
OK, so if they find R<P they make R=P; got it.
Next question - in your view which score should be higher (closer to 1):
(a) P=1, R=2, or
(b) P=10000, R= 20000
or should they be the same, and why?
The other approach is that your two examples (a) and (b) carry the same weight, because (a) hides 50% relative to its specific values (1 and 2) and (b) also hides 50% relative to its specific values of P=10,000 and R=20,000, each judged against the household size.
Finally, I should say that once the formula is designed I will put it into my system and train the system with it. After that we can have more conversations about better criteria, but for now we can create and improve these formulas.
Put more weight on situation (b) and give it a higher hiding score, because of the larger amount of money being hidden.
If you're willing to live with my above example having the same score, then this could do it:
1. First determine the relative difference 'D' between R and P, using D=(R-P)/|R|, except that if R = 0 the denominator is set to 1 (I'm assuming that values for R are typically not single digit). Because R>=P, this value for D is always positive or zero, and can range from 0 to infinity.
2. Apply the following transformation: score = D/(D+1). This "compresses" the D value to a range between 0 and 1.
3. You may want to consider a slight change - a "tuning" variable K so that score = KD/(KD+1). You might try K = 0.5, or 0.1 and see how it behaves.
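The three steps above could be sketched roughly as follows in Python. The function names `relative_difference` and `compressed_score` are made up for illustration. Clamping a negative D to 0 is my assumption, based on your statement that R is never allowed to fall below P.

```python
def relative_difference(P, R):
    """Step 1: D = (R-P)/|R|, with the denominator set to 1 when R is 0."""
    denom = abs(R) if R != 0 else 1
    D = (R - P) / denom
    # Assumption: R >= P always holds, so D should never be negative;
    # clamp anyway so that bad input cannot produce a negative score.
    return max(D, 0.0)


def compressed_score(P, R, K=1.0):
    """Steps 2 and 3: score = K*D / (K*D + 1), with K=1 giving D/(D+1)."""
    D = relative_difference(P, R)
    return K * D / (K * D + 1)


# Examples (a) and (b) from earlier get the same score, as intended.
print(compressed_score(1, 2))
print(compressed_score(10000, 20000))

# Trying a few values of the tuning variable K on the same example.
for K in (1.0, 0.5, 0.1):
    print(K, compressed_score(1, 2, K=K))
```

For D >= 0 and K > 0 the score is 0 when R = P and gets closer to 1 as D grows, without ever reaching 1. A smaller K pulls every score toward 0.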
Hope this helps. You need to try a bunch of values and see if it meets your objectives.
1. For 'D' we have values like 1000 and values like 0.2, but most 'D' values lie between 0 and, for example, 20. I think these high values affect 'D' and ultimately the 'Score'. What is your opinion?
2. The lower bound of the score function is 0 but the upper bound is lower than 1. Is this true? D/(D+1) isn't a mapping onto [0, 1], because, as you know, in a mapping function we find the minimum and maximum of all the 'D's and set them to 0 and 1. Can you describe the score function in more detail?
3. In the second phase, what is the purpose of using 'K'?
4. When I checked the scatter chart of the 'Score' values (phase 2), values near 0.5 have a higher density. What do you think about this? This is the scatter chart:
Thanks again and again :-)