We're working on a problem where we've rated some products using an algorithm. To check whether the algorithm is reliable enough at predicting product scores, we've created a sample set that has been independently scored by a few users and also scored by the algorithm.

The data looks like this:

              User A   User B   User C   ...   Algorithm Score
    Product1  45       30       38             42
    Product2  19       18       20             24

We need a technique to validate the accuracy of the algorithm as we go about tweaking it, so that it can predict the scores with minimal error.

Should we use the mean and standard deviation for this? Would plotting bell curves help us judge the performance of the algorithm?
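For context, here is a minimal sketch (plain Python, standard library only) of one way we could quantify the error: treat the average of the users' scores as the "ground truth" for each product and summarise the algorithm's deviation with MAE and RMSE. The assumption that the user average is the right target is ours, not something we've settled on; the numbers are the two example rows above.

```python
from statistics import mean
from math import sqrt

# Example rows from the sample set: independent user scores plus the
# algorithm's score for each product.
ratings = {
    "Product1": {"users": [45, 30, 38], "algorithm": 42},
    "Product2": {"users": [19, 18, 20], "algorithm": 24},
}

errors = []
for product, scores in ratings.items():
    target = mean(scores["users"])            # consensus user score (assumed ground truth)
    error = scores["algorithm"] - target      # signed prediction error
    errors.append(error)
    print(f"{product}: user avg {target:.1f}, "
          f"algorithm {scores['algorithm']}, error {error:+.1f}")

mae = mean(abs(e) for e in errors)            # mean absolute error
rmse = sqrt(mean(e * e for e in errors))      # root mean squared error
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Tracking MAE/RMSE on the sample set after each tweak would at least give us a single number to drive the tuning, whatever we end up deciding about the distributional question.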