# Thread: Normalising results from Survey data (looks easy but isn't !)

1. ## Normalising results from Survey data (looks easy but isn't !)

I spent a long time in excel playing around with this but just can't get it.

Consider the following scenario:

1. A construction yard has a number of supervisors.
2. Each supervisor has a different number of workers
3. A supervisor may be responsible for more than one site

1. Supervisors are regularly inspected using a checklist
2. checklist results may be negative or positive depending on findings

The management want to output graphs to show which are their best/worst performing supervisors.

What is a fair way to work this out given that the number of workers is different and the number of sites can be different

e.g.
Code:
supervisor bob  :  10 workers  : 2 sites  : 9 points  : score=??how to calculate??

supervisor john  : 10 workers  : 1 site  : 9 points   : score=??how to calculate??
In the example above, "bob" should score higher because he has more worksites to look after.

kindly have a look at the attachment. i'm at a dead end.

2. Originally Posted by chumbawumba

...
1. Supervisors are regularly inspected using a checklist
2. checklist results may be negative or positive depending on findings
The management want to output graphs to show which are their best/worst performing supervisors.
....
Each supervisor should get a separate evalution for EACH construction sight. One yard may have 8 highly skilled professional workers with all the right equipment, whereas the other yard managed by the same person may have 20 unskilled, ignorant, ill equiped workers who care not about the results.
Often it is not the supervisor who does the actual hiring/firing; so each yard must be treated separately.

You will need to evaluate the cost of a yard to the productivity generated, and compare that the the checklist rating.

Without knowing the specific values of the checklist items that relate to cost/productivity it is difficult to
provide guidance.

3. Hi Aidan,

Thanks for looking at the problem and considering a fair way to score these results.

Is there anyway that the problem can be solved as i described? I need to meet a deadline on this one. Later the model can be improved by incorporating some other data like you describe.

4. Originally Posted by chumbawumba
Hi Aidan,

Thanks for looking at the problem and considering a fair way to score these results.

Is there anyway that the problem can be solved as i described? I need to meet a deadline on this one. Later the model can be improved by incorporating some other data like you describe.
With the limited infomation:

Here is what I would do:

Sum ALL the employees and get the percentage of employees handled by each super.
(10+10) = 20
Thus bob would get 0.5 for worker handling and john would get 0.5 for worker handling.

Sum ALL the work sites get the percentage of handled by each super.
(2+1) = 3
Thus bob would get 0.667 for number of sites and john would get 0.333 for number of sites handled.

Sum the percentages for & multiply by the point to get a modified point system:
bob: 0.5+0.67 = 1.166 : 1.166 * 9 = 10.5
john: 0.5 + 0.333 = 0.833 : 0.833 * 9 = 7.5

With more data you could "weigh" the value of the different aspects. That is, if a person handles 10 employees and another handles 100, then it seems more natural that it should not be a 10:1 evaluation ratio but perhaps square root

$\displaystyle \dfrac{ \sqrt{100}}{\sqrt{10}} = 3.162$

or

$\displaystyle \dfrac{ \log{100}}{\log{10}} = 2.000$

or something in between.

5. This is very good. Just one last issue...

how could be tweak this formaula so that when a supervisor receives a negative score, he isn't penalised by having extra worksites:

bob: 0.5+0.67 = 1.166 : 1.166 * -9 = -10.5
john: 0.5 + 0.333 = 0.833 : 0.833 * -9 = -7.5

in the case above, you could argue that john should score worse than bob, because he has fewer sites to manage

Originally Posted by aidan
With the limited infomation:

Here is what I would do:

Sum ALL the employees and get the percentage of employees handled by each super.
(10+10) = 20
Thus bob would get 0.5 for worker handling and john would get 0.5 for worker handling.

Sum ALL the work sites get the percentage of handled by each super.
(2+1) = 3
Thus bob would get 0.667 for number of sites and john would get 0.333 for number of sites handled.

Sum the percentages for & multiply by the point to get a modified point system:
bob: 0.5+0.67 = 1.166 : 1.166 * 9 = 10.5
john: 0.5 + 0.333 = 0.833 : 0.833 * 9 = 7.5

With more data you could "weigh" the value of the different aspects. That is, if a person handles 10 employees and another handles 100, then it seems more natural that it should not be a 10:1 evaluation ratio but perhaps square root

$\displaystyle \dfrac{ \sqrt{100}}{\sqrt{10}} = 3.162$

or

$\displaystyle \dfrac{ \log{100}}{\log{10}} = 2.000$

or something in between.

6. Originally Posted by chumbawumba
This is very good. Just one last issue...

how could be tweak this formaula so that when a supervisor receives a negative score, he isn't penalised by having extra worksites:

bob: 0.5+0.67 = 1.166 : 1.166 * -9 = -10.5
john: 0.5 + 0.333 = 0.833 : 0.833 * -9 = -7.5

in the case above, you could argue that john should score worse than bob, because he has fewer sites to manage
NOTE: There are some who deserve a much worse rating than actually given.

To dampen the negative impact:
Determine the range of evaluations.
bob = -9
john = -9
bill = 5
samson = 11

range -9 to 11 = 20
Translate that range to: 1 to 21
(avoid zero in the modified range & you may want to also avoid 1 as the bottom)
Make all of the evaluations and then translate back.

Another way:

Determine the average (or mean) evaluation and reset the raw evaluations:
bob = -9
john = -9
bill = 5
samson = 11

mean = -2

Revalued based on average:
bob = -7
john = -7
bill = 7
samson = 13

If this were real world data, you need to be extremely careful about how you manipulate the supplied raw information. You tend to skew the results with your bias which may have nothing to do with the performance evaluations.

However, if this is simulated data, I would generate "some small re-alignments/adjustments" to the source data to prove my thesis with HARD COLD STATISTICS.