Using Radial Plots to Visualize Time Series Data

Whenever performance graphs and time series data visualization come up, people tend to think of an X-Y graph. For system administrators, the image that most often comes to mind is an RRD/Cacti/MRTG graph like this one:

While this format has become commonplace, it may not fit every situation.

One format that I really like to use when I need to display time series data is the radial plot, particularly when I want to check for certain patterns or need some kind of scale transformation to compare series of different magnitudes (e.g. CPU usage versus hits/s).

Besides that, the resemblance to a wall clock provides an intuitive idea of continuity.

To create such a plot, I’ll use a slightly modified version of the script, which outputs:

Hour(24 hour format);mean service time;mean hits/hour

The script can be downloaded here.

After processing a full month of Apache log files this is the output:

11;1676,928965;116,250926
21;1639,624705;113,361250
5;1241,370684;17,595185
17;1703,250085;150,101250
4;1227,487078;17,148241
2;1417,202514;30,994352
22;1651,318738;104,037222
18;1675,477736;122,894306
3;1309,117393;22,372500
8;1640,770019;91,538333
16;1722,880051;170,834028
6;1360,592095;23,434630
23;1626,394824;82,701528
13;1662,728863;156,627130
1;1503,136839;41,077778
12;1667,192097;114,413519
14;1677,577372;153,831944
20;1626,475911;113,596111
15;1678,035555;147,567870
7;1547,754519;47,789167
0;1567,574930;59,452037
19;1648,602616;120,635694
10;1696,497813;128,270556
9;1683,481828;120,212315
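Note that the records come out in no particular order and use a decimal comma, so they need a little cleanup before they can be charted. Here is a minimal Python sketch of that cleanup. It is only an illustration, not part of the original script, and the function name `parse_records` is my own; the sample is the first three records from the output above.

```python
def parse_records(text):
    """Parse 'hour;service_time;hits' records that use a decimal comma."""
    rows = []
    for record in text.split():  # records are separated by whitespace
        hour, service_time, hits = record.split(";")
        rows.append((
            int(hour),
            float(service_time.replace(",", ".")),  # 1676,928965 -> 1676.928965
            float(hits.replace(",", ".")),
        ))
    # The script emits the hours unordered; sort them by hour for plotting.
    return sorted(rows)


# First three records from the output above.
sample = "11;1676,928965;116,250926 21;1639,624705;113,361250 5;1241,370684;17,595185"
rows = parse_records(sample)
for hour, service_time, hits in rows:
    print(f"{hour:02d}h  service time={service_time:.2f}  hits={hits:.2f}")
```

Sorting by hour matters for a radial chart because the points are joined in sequence around the circle.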

Keep in mind that before using the script you need to modify your Apache log files as explained in this previous post.

By default R can’t do radial plots, but you can use the plotrix package. I’ll keep things simple and use Excel instead of R in this example.

I’ll create two plots using the same data (the service time), one using a traditional X-Y graph and the other using a radial graph:
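If you would rather script the radial chart than build it in Excel, the idea is simple: each hour becomes an angle on a 24-hour dial and each value becomes a distance from the center. The sketch below is not how the charts in this post were made. It is a minimal illustration using only the Python standard library, with hypothetical helper names (`radial_points`, `radial_svg`) and the mean service times from the output above rounded to two decimals. It assumes a radial scale that starts at zero, with the largest value touching the outer circle.

```python
import math


def radial_points(values_by_hour, radius=100.0, center=(120.0, 120.0)):
    """Map hourly values to (x, y) points on a clock-like radial plot.

    Hour 0 sits at the top and hours advance clockwise, as on a 24-hour
    clock. The largest value is drawn at `radius`; the rest scale linearly.
    """
    peak = max(values_by_hour.values())
    cx, cy = center
    points = []
    for hour in sorted(values_by_hour):
        angle = 2 * math.pi * hour / 24           # fraction of a full turn
        r = radius * values_by_hour[hour] / peak  # distance from the center
        # SVG's y axis points down, so subtracting cos() puts hour 0 on top.
        points.append((cx + r * math.sin(angle), cy - r * math.cos(angle)))
    return points


def radial_svg(values_by_hour, size=240):
    """Return a minimal SVG document with the series as a closed polygon."""
    half = size / 2
    pts = radial_points(values_by_hour, radius=half - 20, center=(half, half))
    coords = " ".join(f"{x:.1f},{y:.1f}" for x, y in pts)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<circle cx="{half}" cy="{half}" r="{half - 20}" fill="none" stroke="#ccc"/>'
        f'<polygon points="{coords}" fill="none" stroke="steelblue"/>'
        "</svg>"
    )


# Mean service time per hour, rounded from the script output above.
service_time = {
    0: 1567.57, 1: 1503.14, 2: 1417.20, 3: 1309.12, 4: 1227.49, 5: 1241.37,
    6: 1360.59, 7: 1547.75, 8: 1640.77, 9: 1683.48, 10: 1696.50, 11: 1676.93,
    12: 1667.19, 13: 1662.73, 14: 1677.58, 15: 1678.04, 16: 1722.88,
    17: 1703.25, 18: 1675.48, 19: 1648.60, 20: 1626.48, 21: 1639.62,
    22: 1651.32, 23: 1626.39,
}
svg = radial_svg(service_time)
```

The resulting string can be saved to a `.svg` file and opened in a browser. Closing the polygon between hour 23 and hour 0 is what gives the radial chart its sense of continuity.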

I’ll leave the analysis and interpretation of the data to the reader, although you can see a very interesting pattern: the service time decreases slightly overnight.

The radial chart for the hits/s is as follows:

Now, to see if there’s a correlation between the service time and the number of hits per second, I’ll normalize these two values and plot another radial chart using the transformed data.

The method to normalize the data is pretty straightforward: choose a reasonable constant (I’ll use 100), multiply each value by this constant, and divide by the maximum value of its column. Here’s a screenshot of my Excel table that should help:
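The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula described above (value × 100 / column maximum), not the spreadsheet itself; the function name `normalize` is my own, and the sample values are hours 3, 9 and 16 from the script output, rounded to two decimals.

```python
def normalize(values, scale=100.0):
    """Rescale a column so that its largest value maps to `scale`."""
    peak = max(values)
    return [v * scale / peak for v in values]


# Hours 3, 9 and 16 from the script output (rounded).
# Hour 16 holds the maximum of both columns in the full data set.
service_time = [1309.12, 1683.48, 1722.88]
hits = [22.37, 120.21, 170.83]

for st, h in zip(normalize(service_time), normalize(hits)):
    print(f"service time={st:6.2f}  hits={h:6.2f}")
```

After this step both series share the same 0 to 100 range, so they can be drawn on one chart even though the raw magnitudes differ by roughly a factor of ten.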

The resulting chart, showing both the service time and the hits per second:


Radial plots can be useful to display time series data, particularly when the behavior of your system tends to be cyclic. The resemblance to a wall clock makes it easy to visualize the activity during different times of the day.

Planned Capacity

As presented at the CMG Brazil 2010 national conference.

Basic Library Booklist

People frequently ask me which books they should read to get a grip on capacity planning.

While this list is far from complete, I believe that anyone who reads and understands the concepts presented in these books will be able to start doing capacity planning.

If you want a truly hands-on approach and most of your environment is running on Linux, I highly recommend “The Art of Capacity Planning” by John Allspaw.

It’s a well-written, easy-to-read book that will give you a jump start in the capacity planning world.

My current favorite book on this subject is “Guerrilla Capacity Planning” by Neil J. Gunther. This is a must-read for anyone who is serious about modern capacity planning.

Another reference book on this subject is “Capacity Planning for Web Performance”, by Daniel A. Menascé and Virgilio A. F. Almeida.

Last but not least, I recommend another book by Neil J. Gunther, “Analyzing Computer System Performance with Perl::PDQ”. There’s a whole chapter about the Linux Load Average and processing queues that changed the way I dealt with Linux performance data.