Modelling Application Statistics

Posted by mop Fri, 23 Jul 2004 19:28:00 GMT

I recall one of my stats profs once saying: ‘not all random phenomena can be modelled with a normal distribution, but if your measurements are not normal then you are measuring the wrong thing’. He said it in jest, methinks, but he had a point.

What happened was, I have been measuring response times for a busy web application. Having forgotten most of what I learned in school, I’ve been recording the mean and peak response times, i.e. the average time and the longest time it takes to service HTTP requests. (I use a fixed-size sample, not samples from a fixed period; see Calculating a moving average.)
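For the record, the fixed-size sample can be kept with something like the sketch below. It is not the code from the moving average post; the class name ResponseTimeWindow and its methods are made up for illustration, and it assumes response times arrive one at a time as numbers of seconds.

```python
from collections import deque


class ResponseTimeWindow:
    """Mean and peak over the most recent `size` response times.

    Hypothetical helper for illustration, not the application's real code.
    """

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        # A deque with maxlen silently drops the oldest sample when full.
        self._samples = deque(maxlen=size)

    def record(self, seconds):
        self._samples.append(seconds)

    def mean(self):
        if not self._samples:
            raise ValueError("no samples recorded yet")
        return sum(self._samples) / len(self._samples)

    def peak(self):
        if not self._samples:
            raise ValueError("no samples recorded yet")
        return max(self._samples)
```

Summing the window on every call is wasteful for big windows, but it avoids the rounding drift a running total picks up over millions of requests.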

Recognizing that peak and average values are not entirely representative of server behaviour, I started by examining the distribution of response times for an arbitrary time period (1000 requests, about 10 minutes’ worth). It’s obviously not a traditional normal distribution, eh? If it were a normal distribution, we could estimate the mean and variance and be confident that we’ve accurately characterized the distribution of response times. But it’s not normal, so the fun begins.
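If you would rather not trust your eyes, a rough numeric check is the sample skewness: a normal distribution is symmetric, so its skewness is close to zero, while response times with a long right tail come out clearly positive. The function below is a minimal sketch using only the standard library; the name sample_skewness is my own, and it assumes the sample is a plain list of numbers.

```python
import statistics


def sample_skewness(values):
    """Skewness as the mean cubed deviation divided by sigma cubed.

    Roughly 0 for symmetric data, positive for a long right tail.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical")
    return sum((v - mean) ** 3 for v in values) / (n * sigma ** 3)
```

A value near zero does not prove normality, but a large positive value is a quick way to confirm that the mean and variance alone will not describe the sample.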

Like any good engineer, I know just enough to bluff my way through a cocktail party, and enough to know where to look. NIST has a great online resource: the NIST/SEMATECH e-Handbook of Statistical Methods. Skip to the Tools and Aids section and you’ll find a gallery of probability distributions. If I squint a little, any one of Weibull, Lognormal or Power Lognormal could be an appropriate match for our distribution. The Power Lognormal curve is probably the closest, so on to the next step.
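A cheap way to try out the Lognormal candidate before squinting any harder: if the response times are lognormal, their logarithms are normally distributed, so the mean and standard deviation of ln(x) describe the whole thing. This is only a sketch of that idea, not a proper goodness-of-fit test; the function name lognormal_parameters is hypothetical and it assumes every response time is strictly positive.

```python
import math
import statistics


def lognormal_parameters(values):
    """Return (mu, sigma), the mean and sample standard deviation of ln(x)."""
    if any(v <= 0 for v in values):
        raise ValueError("response times must be strictly positive")
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.stdev(logs)
```

Under the lognormal assumption, exp(mu) estimates the median response time, which is often a more honest summary than the mean for a skewed sample.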

Remember, we’re trying to model our response times in a way that will allow us to characterize the distribution at any time with simple metrics like mean and variance. The characteristic formula for Power Lognormal distributions (standard form, as given in the NIST handbook) is

f(x; p, σ) = (p / (x σ)) φ(ln x / σ) [Φ(−ln x / σ)]^(p − 1),   for x, p, σ > 0

which is expressed in terms of the normal distribution: φ and Φ are the standard normal density and cumulative distribution functions. So, describing our distribution would involve mean and variance plus the power and shape parameters. Four metrics, hmmm.
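To get a feel for the curve, the standard-form density from the NIST handbook can be evaluated with nothing more than the standard library. This is a sketch under that assumption (standard form only, no location or scale parameters); the function name power_lognormal_pdf is mine, p is the power parameter and sigma the shape parameter.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_lognormal_pdf(x, p, sigma):
    """Standard-form power lognormal density (assumed NIST parameterization)."""
    if x <= 0 or p <= 0 or sigma <= 0:
        raise ValueError("x, p and sigma must all be positive")
    z = math.log(x) / sigma
    # (p / (x * sigma)) * phi(z) * Phi(-z) ** (p - 1)
    return (p / (x * sigma)) * _STD_NORMAL.pdf(z) * _STD_NORMAL.cdf(-z) ** (p - 1)
```

With p = 1 the last factor disappears and this reduces to the ordinary lognormal density, which makes a handy sanity check.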

Here’s where I realize that I’m probably measuring the wrong thing.
