What kind of model could I use to predict annual server loads?

Hi math help forums

I've been tasked with working out what kind of prediction algorithm could tell us when we need to buy new servers, based on a database of server load information.

I drew this picture and attached it (Attachment 23890). Basically, the same pattern of server load occurs every year, with some variations.

What do people use to predict stocks? What does UPS use to predict the number of people needed to deliver packages every Christmas?

Right now I'm reading up on moving averages, and another intern is working on learning machine learning. Is there a better approach?

Would a regression model fit our data well enough to make predictions?

The inputs would be things like when we added new servers, new clients, how many clients were added, and how many logins occurred on our servers on every date, and of course historical data on CPU load going back a few years.

Re: What kind of model could I use to predict annual server loads?

What you need is some sort of time series analysis that takes into account the dependence of each Y value on previous Y values.

A quicker and dirtier approach (because I know very little about time series analysis) might be simple polynomial regression. If it's the same basic temporal pattern each year (i.e. peaks in February and December, lows in May and August), try fitting a polynomial regression using the day of the year (1 to 365) as the predictor (i.e. Load ~ day + day^2 + day^3, etc.); this would capture the general swings through the year. You could then add additional predictors (e.g. year, client numbers, login numbers). This ignores the temporal dependence in the data, which generally means the model will look a lot more significant than it really is, but it could at least give you a feel for which predictors of server load matter most.
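A minimal sketch of the day-of-year polynomial fit described above, in plain Python. The function names, the choice of degree 4, and the load numbers are all made up for illustration; the day is rescaled to 0..1 because raw powers of 365 make the arithmetic badly conditioned.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (fine for low degrees)."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for a design matrix with columns x^0 .. x^degree.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(ata[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aty[row] - tail) / ata[row][row]
    return coeffs

def predict(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

# Made-up daily load: high around New Year, low in mid-year.
days = list(range(1, 366))
scaled = [d / 365 for d in days]          # rescale day of year to (0, 1]
load = [50 + 20 * math.cos(2 * math.pi * d / 365) for d in days]

coeffs = fit_polynomial(scaled, load, degree=4)
print(round(predict(coeffs, 0.5), 1), round(predict(coeffs, scaled[0]), 1))
```

Extra predictors such as client or login counts would become additional columns in the same least-squares problem; that part is not shown here.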

Re: What kind of model could I use to predict annual server loads?

I have tried polynomial regression models up to a power of 6 using the built-in Excel trendline tool. The results were less than satisfactory (often the regressions were almost a straight line). After reading up on the subject I've realized that selectively sampling the data won't help with polynomial regressions. Polynomial regressions only seem to work for curves that are more or less simple (a single hill, not a rollercoaster of hills). I could cut each year up into sections where each one is like one period of a sinusoid, but the data is too unstable to do that yet.

Perhaps other things like artificial neural networks (god no I'm not gonna write one) would be good for modeling.

Thanks for the response.

Re: What kind of model could I use to predict annual server loads?

If the server load is cyclical I'd suggest an ARMA model.

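y(t) = a*y(t-1) + b*e(t-1) + e(t) is the basic form, where e(t) is the error at time t and a and b are coefficients estimated from the data. An annual cycle would include the appropriate lag to pick up the previous year's data.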
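To make the recursion concrete, here is a rough sketch of one-step-ahead forecasts from the basic ARMA form above, with explicit coefficients on the lagged value and the lagged error. The names phi, theta and c and the numbers are hypothetical; in practice a statistics package would estimate the coefficients, and an annual cycle would need an extra term at a lag of one year, which is not shown.

```python
def arma11_forecasts(series, phi, theta, c=0.0):
    """One-step-ahead forecasts from y(t) = c + phi*y(t-1) + theta*e(t-1) + e(t).

    Assumes the unobserved error before the first data point is zero.
    Returns the in-sample forecasts for points 2..n and the forecast
    for the next, not yet observed point.
    """
    forecasts = []
    prev_error = 0.0
    for t in range(1, len(series)):
        forecast = c + phi * series[t - 1] + theta * prev_error
        forecasts.append(forecast)
        # The error term is whatever the forecast missed.
        prev_error = series[t] - forecast
    next_forecast = c + phi * series[-1] + theta * prev_error
    return forecasts, next_forecast

# Made-up load values and coefficients, purely for illustration.
in_sample, upcoming = arma11_forecasts([10, 12, 11], phi=0.5, theta=0.5, c=5.0)
print(in_sample, upcoming)
```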

Re: What kind of model could I use to predict annual server loads?

At first I was going to say that I'd investigated moving averages and that they didn't predict far enough into the future, but after a little more reading I'm not so sure that autoregressive moving-average models are the same thing. I think this works, and if it does then I love you! :)

I'm still trying ANNs, but I need more data. I need to model overall CPU load demand, not load on individual machines, and include predicted sales data, for example if the sales team decides to push out new software and expects to gain a lot of customers, or if the news says bad things about us and we need to lower our estimates.

Re: What kind of model could I use to predict annual server loads?

I have tried out exponential smoothing (Holt-Winters) using R. Here is a great tutorial: clicky.

I've found that exponential smoothing provides a fairly low bound for the 80/95% confidence intervals for logins, but when I do the same thing for registrations, I get some ridiculous data. I think this is because there was a massive spike in the variability and number of logins, either due to a change in the measurement process, an error, or something else. This is ridiculous.

I've been doing exponential smoothing on daily data though. I will try to do monthly smoothing later today.
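For anyone curious what Holt-Winters smoothing does under the hood, here is a bare-bones additive version in Python. It is only a sketch: the function name, the simple initialisation and the smoothing weights are my own assumptions, it does not produce the 80/95% intervals, and it is not a substitute for the R routines.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: smooth a level, a trend and one seasonal offset per slot."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first = series[:period]
    second = series[period:2 * period]
    # Simple initialisation (an assumption): level = mean of the first season,
    # trend = average per-step change between the first two seasons.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]
    for t in range(period, len(series)):
        y = series[t]
        s = seasonal[t % period]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s
    n = len(series)
    # Forecast h steps ahead: extend the trend and reuse the matching seasonal offset.
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]

# Made-up monthly logins over three years (period of 12), forecasting 6 months ahead.
monthly = [100 + 2 * t + (30 if t % 12 in (0, 11) else 0) for t in range(36)]
print(holt_winters_additive(monthly, period=12, alpha=0.3, beta=0.1, gamma=0.2, horizon=6))
```

With monthly totals the period is 12, which is much easier to handle than a period of 365 on daily data.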

Trying to fit ARIMA models results in an R error, something to the effect of "Arima can't use more than 350 seasonal points", because it and several other modeling methods run out of memory.

I have tried using Fourier transforms to model the data, and this has worked somewhat well, but I don't feel the projections are reasonable: the number of logins barely seems to increase over the next year. See: Research tips - Forecasting with long seasonal periods
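As a rough illustration of the Fourier idea, the sketch below fits a mean plus a few sine/cosine pairs at multiples of the seasonal frequency. The function names and data are made up, and it assumes the history covers a whole number of periods. Note that a model like this has no trend term, so its projections simply repeat the same seasonal shape at the same level; that may be part of why projected logins barely grow, though I can't confirm that from here.

```python
import math

def fit_harmonics(series, period, n_harmonics):
    """Fit mean + sum_k (a_k cos + b_k sin) at frequencies k/period by projection."""
    n = len(series)
    if n % period != 0:
        raise ValueError("use a whole number of periods so the projections are exact")
    if n_harmonics >= period / 2:
        raise ValueError("n_harmonics must be below period / 2")
    mean = sum(series) / n
    terms = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k / period
        a = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(series))
        b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(series))
        terms.append((w, a, b))
    return mean, terms

def evaluate(model, t):
    """Value of the fitted seasonal curve at time index t (may be in the future)."""
    mean, terms = model
    return mean + sum(a * math.cos(w * t) + b * math.sin(w * t) for w, a, b in terms)

# Made-up monthly series: two years of data with a 12-month cycle.
series = [10 + 3 * math.cos(2 * math.pi * t / 12) + 2 * math.sin(2 * math.pi * 2 * t / 12)
          for t in range(24)]
model = fit_harmonics(series, period=12, n_harmonics=2)
print([round(evaluate(model, t), 2) for t in range(24, 30)])  # next six months
```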

If anyone ever does something similar to what I'm doing (I know Riot is having server issues as well, so I'm sure other companies do this stuff), then I recommend learning R over MATLAB if you need to do automation. Documentation for R in the field of statistics dwarfs documentation for Octave, and it is impossible to legally fully automate MATLAB scripts over the internet.

Finally, to determine the correlation between the different load variables and load, I have decided to use neural networks. I believe that these correlations change with time, but are somewhat consistent. I also believe there is a linear relationship (1 client = x% CPU load). Hopefully I will learn how to use Neuroph or find a good neural network tutorial. If I don't get to this, though, logins and registrations can be used to estimate overall CPU load.
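If the relationship really is roughly linear (1 client = x% CPU load), a plain least-squares line is a cheap check before reaching for a neural network. The sketch below is only an illustration: the function name and the client/load numbers are invented, and real data would be noisier.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope*x, plus Pearson correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Made-up observations: number of clients vs. overall CPU load in percent.
clients = [100, 200, 300, 400]
cpu_load = [12.0, 22.0, 32.0, 42.0]

slope, intercept, r = fit_line(clients, cpu_load)
# slope is the estimated CPU load (in percentage points) added per client.
print(slope, intercept, r)
```

Refitting the line on different time windows would show whether the per-client cost drifts over time, which is the concern raised above.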