Not sure if this is the appropriate forum, let me know...
I'm considering changing a component in a model I built. The model is used to predict performance of baseball players.
Let's use batting average as an example...
The goal of the weighting is to give more credit to younger players who 'break out'. For example, if the aging curve predicts the average player will improve by 15 points of batting average (.015) between ages 18 and 19, and Player X improved by 40 points (.040), I want this season weighted more aggressively in the model that predicts batting average. Basically, I'm using the degree of breakout to set the degree of weighting: weighting multiplier = (Improvement) / (Expected Improvement), with a maximum of 3x the previous season's weight.
The problem I run into is that slightly older players are only expected to make tiny improvements. For example, between the ages of 25 and 26, normal aging predicts a gain of only .002 in batting average (i.e., from .260 to .262). So a huge percentage of these players will appear to have 'broken out' under my current methodology, which is undesirable. Any idea how I could/should work around this?
Much appreciated, folks.
0.035 / 0.015 = 2.3333
0.006 / 0.002 = 3.0000
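The two ratios above follow the multiplier as described in the question. Here is a minimal Python sketch of that calculation, not the poster's actual code. The function name `breakout_multiplier` and the `cap` parameter are made up for illustration; the 3x cap comes from the question.

```python
def breakout_multiplier(improvement, expected_improvement, cap=3.0):
    """Ratio of actual to expected improvement, capped at `cap`.

    Hypothetical helper: the name and signature are assumptions,
    only the ratio and the 3x cap come from the thread.
    """
    if expected_improvement <= 0:
        # The ratio is meaningless (or undefined) without a positive expectation.
        raise ValueError("expected_improvement must be positive")
    return min(improvement / expected_improvement, cap)


# Young player: 35 points gained against 15 expected.
print(round(breakout_multiplier(0.035, 0.015), 4))  # 2.3333
# Older player: only 6 points gained, but just 2 were expected.
print(round(breakout_multiplier(0.006, 0.002), 4))  # 3.0
```

A 6-point gain at 25 hits the cap while a 35-point gain at 18 does not, which is the distortion the question describes.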
The problem seems to be that the expected value is so close to zero. This is a common problem. You must invent a new measure that gets away from zero. Ummm... maybe a sliding scale by age?
Base = max(0,Age - 20)
Base = max(0,19-20) = 0
(0 + 0.035) / (0 + 0.015) = 0.035 / 0.015 = 2.3333
Base = max(0,25-20) = 5
(5 + 0.006) / (5 + 0.002) = 5.006 / 5.002 = 1.0008
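The sliding-scale arithmetic above could be sketched in Python roughly as follows. This is only an illustration of the rough idea: the function name `shifted_multiplier` and the `pivot_age` parameter are made up, the pivot of 20 is taken from the Base formula, and the 3x cap is carried over from the question as an assumption.

```python
def shifted_multiplier(improvement, expected_improvement, age,
                       pivot_age=20, cap=3.0):
    """Add an age-based offset to both sides of the ratio.

    Hypothetical helper: Base = max(0, age - pivot_age) is from the
    thread; the names, the cap, and the error check are assumptions.
    """
    base = max(0, age - pivot_age)
    denominator = base + expected_improvement
    if denominator <= 0:
        raise ValueError("base + expected_improvement must be positive")
    return min((base + improvement) / denominator, cap)


# Age 19: Base = 0, so the ratio is unchanged.
print(round(shifted_multiplier(0.035, 0.015, age=19), 4))  # 2.3333
# Age 25: Base = 5 swamps the tiny expected gain.
print(round(shifted_multiplier(0.006, 0.002, age=25), 4))  # 1.0008
```

The offset moves the denominator away from zero, so a small absolute gain at an older age no longer looks like a breakout.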
It's just a rough idea. Hopefully, it will get you started thinking in some useful direction. You may also want to consider logarithms in some way.
The sky's the limit. It's your system. Feel free to use whatever you think is rational.
That's a great idea, thanks.
No, it's not. It might lead to one. Let's see what you come up with. A ratio might be better. I haven't thought much about the logarithms.