Where I work we have a database job that does some accounting for us. Because of the way the program is written, we suspect that the job will slow down as we get further into the year and then revert to normal speed at the start of the new year.
We have about 200 observations. When I started looking at the data, the first thing that jumped out at me was that the more work there was to do, the faster it went. So I made a graph of the number of records processed during a single execution versus the rate at which records were processed. There is a definite linear correlation between these two variables, and I can easily use a least-squares method to find the line that best fits the data.
My problem is that I need to compare the program executions based on when during the year they were run, not the amount of work processed at once.
What I would like to do is compute the least-squares line, use it to normalize the observations, and then make a new graph that plots the number of days into the year against the normalized speed. This should reveal any calendar-related slowdown.
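To make the procedure concrete, here is a minimal sketch of what I have in mind, using NumPy and made-up numbers (the batch sizes, days, and rates below are hypothetical, not my real data):

```python
import numpy as np

# Hypothetical observations: batch size and processing rate for five runs
batch_size = np.array([100.0, 500.0, 1000.0, 2000.0, 4000.0])   # records per run
day_of_year = np.array([15.0, 60.0, 120.0, 200.0, 300.0])       # when each run happened
rate = np.array([50.0, 70.0, 95.0, 145.0, 245.0])               # records/sec

# Fit the least-squares line: rate ~ slope * batch_size + intercept
slope, intercept = np.polyfit(batch_size, rate, 1)
predicted = slope * batch_size + intercept

# Normalize each observation by the rate the line predicts for its batch size.
# A value of 1.0 means the run was exactly as fast as expected; < 1.0 means slower.
normalized = rate / predicted

# The new graph would then plot day_of_year on the x-axis against
# normalized on the y-axis and look for a trend across the year.
```

In this toy data the points lie exactly on the line, so every normalized value is 1.0; with real data the scatter of `normalized` against `day_of_year` is what I'd inspect for a seasonal effect.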
My question is: Is this a valid thing to do? Can I produce an "apples to apples" comparison by factoring out the batch size from the observations?
A co-worker says I can't do this, because if there is an effect of the calendar date, then it is already distorting my first graph. I think that since batch size and calendar date are independent variables, I can do this and get a meaningful result.
Is this clear? Can anyone tell me what this kind of analysis is called?