Graphing Running Averages in RRDTool

Collected data is sometimes so erratic that it is difficult to identify any trends.  While a VDEF can provide a gross average of a data set, it doesn’t provide the utility of a true running average.  A running average provides a consistent calculation regardless of the time scale displayed in the graph.  Fortunately, RRDTool provides a flexible means of calculating a running average via its TREND operator.

To begin, set up a chart that graphs the desired data in a typical manner before adding the running average.  The following graph command creates a fairly typical visualization of the last day’s network traffic:

rrdtool graph example.png \
  --title "Network Traffic" \
  --width=500 \
  --slope-mode \
  --end=now \
  --start=end-24h \
  --base=1000 \
  DEF:bytesIn=network.rrd:bytesIn:AVERAGE \
  DEF:bytesOut=network.rrd:bytesOut:AVERAGE \
  CDEF:bpsIn=bytesIn,8,* \
  CDEF:bpsOut=bytesOut,8,* \
  CDEF:bpsOutNeg=bpsOut,-1,* \
  VDEF:bpsInTot=bpsIn,TOTAL \
  VDEF:bpsOutTot=bpsOut,TOTAL \
  COMMENT:"              current     min        max      total\n" \
  AREA:bpsIn#FF880044:"Bits/s In " \
  GPRINT:bpsIn:LAST:"%6.1lf%s\t" \
  GPRINT:bpsIn:MIN:"%6.1lf%s\t" \
  GPRINT:bpsIn:MAX:"%6.1lf%s\t" \
  GPRINT:bpsInTot:"%6.1lf%s\n" \
  LINE1:bpsIn#FF8800CC \
  AREA:bpsOutNeg#44C80044:"Bits/s Out" \
  GPRINT:bpsOut:LAST:"%6.1lf%s\t" \
  GPRINT:bpsOut:MIN:"%6.1lf%s\t" \
  GPRINT:bpsOut:MAX:"%6.1lf%s\t" \
  GPRINT:bpsOutTot:"%6.1lf%s\n" \
  LINE1:bpsOutNeg#44C800CC \
  HRULE:0#000000

With the basic graph set up, the elements necessary for the running average can now be added.  In this example, there appears to be a roughly hourly cycle (Time Machine backups) that defines the period, so a reasonable starting point is a running average calculated over at least 60 minutes.

First, additional CDEF statements need to be declared applying the TREND operator to the desired data.  The format for this operation is straightforward:

CDEF:label=data_source,time_span,TREND

In this case, two new data sources are defined and given the labels bpsInTrend and bpsOutTrendNeg.  Their data sources are the previously declared bpsIn and bpsOutNeg, and the time span for both is 1 hour (3600 seconds).

  CDEF:bpsInTrend=bpsIn,3600,TREND \
  CDEF:bpsOutTrendNeg=bpsOutNeg,3600,TREND \
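
To make the behavior of a running average concrete, the following is a minimal shell sketch of a trailing (sliding-window) mean.  It is an illustration only and not RRDTool’s implementation; the function name trailing_mean and the sample values are hypothetical.  It prints nan until a full window of samples is available, which is analogous to a running average that has no earlier data to draw on.

```shell
#!/bin/sh
# Illustration only: a trailing mean over the last N samples read from stdin.
# trailing_mean is a hypothetical helper, not part of RRDTool.
trailing_mean() {
  # $1 = window size in samples
  awk -v n="$1" '
    {
      if (NR > n) sum -= buf[NR % n]   # drop the sample leaving the window
      buf[NR % n] = $1                 # remember the newest sample
      sum += $1
      if (NR < n) { print "nan" }      # window not yet full
      else        { print sum / n }
    }'
}

# Four made-up samples with a window of three samples.
printf '3\n6\n9\n12\n' | trailing_mean 3
```

Assuming the RRD stores one sample every 300 seconds (an assumption about the data file, not something stated here), a 3600-second window would cover roughly 12 samples.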

With the running averages now defined, it is possible to graph the data.  For the purposes of this example, the running average is graphed as a “shadow” on top of the base data.

  LINE2:bpsInTrend#000000CC \
  LINE2:bpsOutTrendNeg#000000CC \

The graph shown below illustrates the new running averages.

The running averages show that, despite the hourly peaks, the overall trend is fairly flat, with a slight exception around 00:15.  However, the running averages also show a gap at the beginning of the graph.  The gap occurs because an hour of data is needed before the hourly running average can be computed.  While this may be acceptable, it doesn’t look polished.

A minor tweak of the original DEF statements can provide the “padding” necessary to eliminate the gap in the running average lines.  This is accomplished by moving the start time of the data sources back by at least the duration specified in the running averages.  Note that the display area is not affected by altering the start time of the DEF statements; that is controlled by the --start and --end options.  For the example data, the DEF statements need to extend the range of the data set by 1 hour so that it encompasses 25 hours instead of the display area’s 24 hours:

  DEF:bytesIn=network.rrd:bytesIn:AVERAGE:start=end-25h \
  DEF:bytesOut=network.rrd:bytesOut:AVERAGE:start=end-25h \
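
The padding arithmetic can be sketched in shell as follows.  This is only a convenience sketch under the assumptions of this example (a display span in whole hours and a TREND window in seconds); padded_start is a hypothetical helper name and not an RRDTool feature.

```shell
#!/bin/sh
# Sketch: derive the DEF start offset from the display span and TREND window.
# padded_start is a hypothetical helper name.
padded_start() {
  display_hours=$1     # hours shown on the graph (matches --start=end-24h)
  trend_seconds=$2     # window passed to TREND
  # Round the window up to whole hours so the padding is never too short.
  pad_hours=$(( (trend_seconds + 3599) / 3600 ))
  echo "end-$(( display_hours + pad_hours ))h"
}

# 24 hours displayed plus a 3600-second window gives start=end-25h.
echo "DEF:bytesIn=network.rrd:bytesIn:AVERAGE:start=$(padded_start 24 3600)"
```

Rounding up means a window that is not a whole number of hours (for example 5400 seconds) is padded by 2 hours rather than 1.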

Putting it together, the following command will pull in all the data necessary to calculate a running average for the extent of the display area, calculate the running averages, and display the results:

rrdtool graph example.png \
  --title "Network Traffic" \
  --width=500 \
  --slope-mode \
  --end=now \
  --start=end-24h \
  --base=1000 \
  DEF:bytesIn=network.rrd:bytesIn:AVERAGE:start=end-25h \
  DEF:bytesOut=network.rrd:bytesOut:AVERAGE:start=end-25h \
  CDEF:bpsIn=bytesIn,8,* \
  CDEF:bpsOut=bytesOut,8,* \
  CDEF:bpsOutNeg=bpsOut,-1,* \
  CDEF:bpsInTrend=bpsIn,3600,TREND \
  CDEF:bpsOutTrendNeg=bpsOutNeg,3600,TREND \
  VDEF:bpsInTot=bpsIn,TOTAL \
  VDEF:bpsOutTot=bpsOut,TOTAL \
  COMMENT:"              current     min        max      total\n" \
  AREA:bpsIn#FF880044:"Bits/s In " \
  GPRINT:bpsIn:LAST:"%6.1lf%s\t" \
  GPRINT:bpsIn:MIN:"%6.1lf%s\t" \
  GPRINT:bpsIn:MAX:"%6.1lf%s\t" \
  GPRINT:bpsInTot:"%6.1lf%s\n" \
  LINE1:bpsIn#FF8800CC \
  AREA:bpsOutNeg#44C80044:"Bits/s Out" \
  GPRINT:bpsOut:LAST:"%6.1lf%s\t" \
  GPRINT:bpsOut:MIN:"%6.1lf%s\t" \
  GPRINT:bpsOut:MAX:"%6.1lf%s\t" \
  GPRINT:bpsOutTot:"%6.1lf%s\n" \
  LINE1:bpsOutNeg#44C800CC \
  LINE2:bpsInTrend#000000CC \
  LINE2:bpsOutTrendNeg#000000CC \
  HRULE:0#000000

 

Graphing Time Shifted Data in RRDTool

While visualizing performance or activity data in a well-designed chart is useful, it is often desirable to compare recent performance against an earlier period of time.  Comparisons against historic data help identify changes in trends or other anomalous behavior.  In some cases, this can be done simply by extending the time span of the graph.  However, expanding the graphed time span can lead to either an unacceptable loss of resolution or an unacceptably large image.  Fortunately, RRDTool can produce this type of comparison chart fairly easily, without a loss of fidelity or an unwieldy image size.

To begin, it helps to set up a chart that graphs the desired data in a typical manner before adding the time shifted overlays.  The following graph command creates a fairly typical visualization of the last hour’s network traffic, as illustrated by the Example 1 graph:

rrdtool graph example.png \
  --title "Network Traffic" \
  --width=500 \
  --slope-mode \
  --end=now \
  --start=end-1h \
  --base=1000 \
  DEF:bytesIn=network.rrd:bytesIn:AVERAGE \
  DEF:bytesOut=network.rrd:bytesOut:AVERAGE \
  CDEF:bpsIn=bytesIn,8,* \
  CDEF:bpsOut=bytesOut,8,* \
  CDEF:bpsOutNeg=bpsOut,-1,* \
  VDEF:bpsInTot=bpsIn,TOTAL \
  VDEF:bpsOutTot=bpsOut,TOTAL \
  COMMENT:"            current     min        max      total\n" \
  AREA:bpsIn#FF880044:"Bits/s In " \
  GPRINT:bpsIn:LAST:"%6.1lf%s\t" \
  GPRINT:bpsIn:MIN:"%6.1lf%s\t" \
  GPRINT:bpsIn:MAX:"%6.1lf%s\t" \
  GPRINT:bpsInTot:"%6.1lf%s\n" \
  LINE1:bpsIn#FF8800CC \
  AREA:bpsOutNeg#44C80044:"Bits/s Out" \
  GPRINT:bpsOut:LAST:"%6.1lf%s\t" \
  GPRINT:bpsOut:MIN:"%6.1lf%s\t" \
  GPRINT:bpsOut:MAX:"%6.1lf%s\t" \
  GPRINT:bpsOutTot:"%6.1lf%s\n" \
  LINE1:bpsOutNeg#44C800CC \
  HRULE:0#000000

With the basic chart set up as desired, the command can now be modified to support a time shifted overlay.

First, additional DEF statements need to be defined with a time range for the source data that encompasses the time span desired for comparison.  Note that this is not the same as the graph’s display range; those options (--start and --end) should remain unchanged.  The overridden start time should extend the time frame back to the start of the time shifted data.  For example, the following would be suitable for a comparison with the previous hour:

  DEF:bytesInPrevHour=network.rrd:bytesIn:AVERAGE:start=end-2h \
  DEF:bytesOutPrevHour=network.rrd:bytesOut:AVERAGE:start=end-2h \

As another example, the following would be suitable for a comparison of the same hour, 1 day (24 hours) ago:

  DEF:bytesInPrevDay=network.rrd:bytesIn:AVERAGE:start=end-25h \
  DEF:bytesOutPrevDay=network.rrd:bytesOut:AVERAGE:start=end-25h \

Next, a new SHIFT statement needs to be introduced that shifts the data being displayed by the specified number of seconds.  The SHIFT statements must be specified after the DEF statements they are shifting.  The following statements would shift the newly inserted DEF statements by 1 hour (3600 seconds):

  SHIFT:bytesInPrevHour:3600 \
  SHIFT:bytesOutPrevHour:3600 \
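
The relationship between the display span, the extended DEF start, and the SHIFT offset can be sketched as a small shell helper.  This is a sketch under the assumptions of this example (whole-hour spans and the network.rrd file used throughout); shifted_def is a hypothetical name, and the helper only prints arguments rather than calling rrdtool.

```shell
#!/bin/sh
# Sketch: print matching DEF and SHIFT arguments for a comparison offset.
# shifted_def is a hypothetical helper; network.rrd is the file used in this article.
shifted_def() {
  vname=$1          # name for the new data source, e.g. bytesInPrevDay
  ds=$2             # data source inside the RRD, e.g. bytesIn
  span_hours=$3     # hours shown on the graph
  offset_hours=$4   # how far back the comparison period lies
  echo "DEF:${vname}=network.rrd:${ds}:AVERAGE:start=end-$(( span_hours + offset_hours ))h"
  echo "SHIFT:${vname}:$(( offset_hours * 3600 ))"
}

# Same hour one day ago: start=end-25h and a shift of 86400 seconds.
shifted_def bytesInPrevDay bytesIn 1 24
```

For the previous-hour comparison, shifted_def bytesInPrevHour bytesIn 1 1 prints the start=end-2h DEF and the 3600-second SHIFT used in this example.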

Next, the CDEF and VDEF operations need to be supplied in order to transform the new data in a consistent manner with the original data.

  CDEF:bpsInPrev=bytesInPrevHour,8,* \
  CDEF:bpsOutPrev=bytesOutPrevHour,8,* \
  CDEF:bpsOutPrevNeg=bpsOutPrev,-1,* \

Finally, the new elements are ready for graphing.  Care should be taken to ensure the shifted data is displayed in a manner distinct from the current data so as to prevent visual confusion or obscuring the current data.

  LINE1:bpsInPrev#00000088 \
  LINE1:bpsOutPrevNeg#00000088 \

Pulling it all together, the following command will pull in the data from the extended time frame, shift it forward by the appropriate amount of time so that it aligns with the displayed time frame, perform the desired transformation options, and finally display the results:

rrdtool graph example.png \
  --title "Network Traffic" \
  --width=500 \
  --slope-mode \
  --end=now \
  --start=end-1h \
  --base=1000 \
  DEF:bytesIn=network.rrd:bytesIn:AVERAGE \
  DEF:bytesOut=network.rrd:bytesOut:AVERAGE \
  DEF:bytesInPrevHour=network.rrd:bytesIn:AVERAGE:start=end-2h \
  DEF:bytesOutPrevHour=network.rrd:bytesOut:AVERAGE:start=end-2h \
  SHIFT:bytesInPrevHour:3600 \
  SHIFT:bytesOutPrevHour:3600 \
  CDEF:bpsIn=bytesIn,8,* \
  CDEF:bpsOut=bytesOut,8,* \
  CDEF:bpsOutNeg=bpsOut,-1,* \
  CDEF:bpsInPrev=bytesInPrevHour,8,* \
  CDEF:bpsOutPrev=bytesOutPrevHour,8,* \
  CDEF:bpsOutPrevNeg=bpsOutPrev,-1,* \
  VDEF:bpsInTot=bpsIn,TOTAL \
  VDEF:bpsOutTot=bpsOut,TOTAL \
  COMMENT:"            current     min        max      total\n" \
  AREA:bpsIn#FF880044:"Bits/s In " \
  GPRINT:bpsIn:LAST:"%6.1lf%s\t" \
  GPRINT:bpsIn:MIN:"%6.1lf%s\t" \
  GPRINT:bpsIn:MAX:"%6.1lf%s\t" \
  GPRINT:bpsInTot:"%6.1lf%s\n" \
  LINE1:bpsIn#FF8800CC \
  AREA:bpsOutNeg#44C80044:"Bits/s Out" \
  GPRINT:bpsOut:LAST:"%6.1lf%s\t" \
  GPRINT:bpsOut:MIN:"%6.1lf%s\t" \
  GPRINT:bpsOut:MAX:"%6.1lf%s\t" \
  GPRINT:bpsOutTot:"%6.1lf%s\n" \
  LINE1:bpsOutNeg#44C800CC \
  LINE1:bpsInPrev#00000088 \
  LINE1:bpsOutPrevNeg#00000088 \
  HRULE:0#000000

 

Enabling DNS statistics for Mac OS X Server

The DNS server is an essential service for the smooth operation of any site, but the Apple Server Admin tool presents little insight into how it is being used or its operational health.  Thankfully, Apple uses the standard bind9 utility as the underlying implementation for the DNS service, and this can provide a great deal of statistics for analysis.  Unfortunately, the statistics are disabled in the default installation and there is no means to enable them using the Server Admin tool.  It is, however, relatively straightforward to edit a couple of configuration files in order to unleash this capability.

First, the /etc/named.conf file needs to be modified to specify both the statistics file and the appropriate control mechanisms.  As this is an essential configuration file, create a backup copy of the existing (working) file before editing in case something goes wrong.
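
One possible way to take the backup is sketched below.  The helper name backup_conf and the .orig suffix are arbitrary choices for illustration; the helper refuses to overwrite an existing backup so that the known-good copy survives repeated edits.

```shell
#!/bin/sh
# Sketch: keep a pristine copy of a configuration file before editing it.
# backup_conf is a hypothetical helper; the .orig suffix is an arbitrary choice.
backup_conf() {
  conf=$1
  if [ -e "$conf.orig" ]; then
    echo "backup already exists: $conf.orig" >&2
    return 0
  fi
  cp -p "$conf" "$conf.orig"   # -p preserves mode and timestamps
}

# Example (writing under /etc typically requires root privileges):
#   backup_conf /etc/named.conf
```

If an edit goes wrong, copying the .orig file back over /etc/named.conf restores the working configuration.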

In the named.conf file, ensure that there is a configuration block for the “controls”.  By default, it should already be present and look similar to the following:

controls  {
        inet 127.0.0.1 port 54 allow    {any;   }
        keys    { "rndc-key";    };
   };

This section defines the access control for the stats tool (among other things).  The essential elements for this purpose are that it defines the communications port as port 54 and that it uses the key identified as “rndc-key” as its security token.  The key is actually defined in a separate file and read into the configuration via the include "/etc/rndc.key" statement earlier in the named.conf file.

There should also be a configuration block for the “options”.  A typical installation will look like the following:

options  {
        include "/etc/dns/options.conf.apple";
        /*
         * If there is a firewall between you and nameservers you want
         * to talk to, you might need to uncomment the query-source
         * directive below.  Previous versions of BIND always asked
         * questions using port 53, but BIND 8.1 uses an unprivileged
         * port by default.
         */
        // query-source address * port 53;
   };

This section requires some minor modification to specify the stats file.  In the block, insert a line specifying statistics-file "/full/path/to/the/named.stats";.  It is best to make the addition after the include statement so that the edited result looks similar to the following:

options  {
        include "/etc/dns/options.conf.apple";

        // Addition to enable statistics
        statistics-file "/var/log/named.stats";

        /*
         * If there is a firewall between you and nameservers you want
         * to talk to, you might need to uncomment the query-source
         * directive below.  Previous versions of BIND always asked
         * questions using port 53, but BIND 8.1 uses an unprivileged
         * port by default.
         */
        // query-source address * port 53;
   };

After the named.conf file has been configured properly, a new file must be created for the configuration of the rndc utility.  Create the /etc/rndc.conf file and edit it with the following settings:

include "/etc/rndc.key";

options {
    default-key "rndc-key";
    default-server localhost;
    default-port 54;
};

The include statement must reference the same file that is included in the named.conf file.  The security key (“rndc-key”) must match, or the rndc utility used to trigger the stats dump will not be authorized to connect.  The options block specifies the key name and the default server and port.  The assumption here is that the rndc tool will be run on the same host (localhost) as the DNS server.  The port should match the port specified in the controls block of the named.conf file.
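
Since a port mismatch between the two files is an easy mistake to make, a quick consistency check can be sketched in shell.  This is a rough sketch that assumes the file layouts shown in this article (an inet ... port N line in the controls block and a default-port N; line in rndc.conf); ports_match is a hypothetical helper and it does not parse the full BIND configuration grammar.

```shell
#!/bin/sh
# Sketch: compare the control port in named.conf with default-port in rndc.conf.
# ports_match is a hypothetical helper; the patterns assume the layouts shown in this article.
ports_match() {
  named_conf=$1
  rndc_conf=$2
  # Port on the "inet ... port N" line of the controls block.
  named_port=$(sed -n 's/.*inet.*port[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$named_conf" | head -n 1)
  # Port on the "default-port N;" line of rndc.conf.
  rndc_port=$(sed -n 's/.*default-port[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$rndc_conf" | head -n 1)
  echo "named.conf: ${named_port:-none}  rndc.conf: ${rndc_port:-none}"
  # Succeed only when a port was found and both values agree.
  [ -n "$named_port" ] && [ "$named_port" = "$rndc_port" ]
}

# Example:
#   ports_match /etc/named.conf /etc/rndc.conf && echo "ports agree"
```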

At this point, the named daemon can be safely signaled to pick up the configuration changes.  This can be done easily via the Server Admin tool by restarting the DNS service or, for the more advanced sysadmin, a SIGHUP can be sent to the process.

To trigger a stats dump, simply run rndc stats and look for the stats file specified in the named.conf file.  If the file isn’t found, there may be an authorization issue:  try running the command as an administrator or via sudo (e.g. sudo rndc stats).

The stats can be triggered on an ongoing basis by specifying an appropriate entry for the launchd system.  A simple plist formatted file located in /Library/LaunchDaemons can serve to repeatedly trigger the stats dump for ongoing monitoring.  (See launchd tips and tricks for more information.)

The following example would be named com.hostname.namedstats.plist and specifies that the stats should be generated every 5 minutes (300 seconds).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.hostname.namedstats</string>
    <key>StartInterval</key>
    <integer>300</integer>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/sbin/rndc</string>
        <string>stats</string>
    </array>
</dict>
</plist>

Note that every time the stats are generated they are appended to the existing file, which can quickly result in a very large, unwieldy file.  If recurring stats are desired, then ensuring the stats file is periodically rotated is essential.
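
One simple approach to rotation is sketched below as a numbered scheme (named.stats becomes named.stats.1, and so on).  The helper name rotate_stats and the default of 5 retained generations are arbitrary choices for illustration.  The sketch also assumes that named creates a fresh stats file on the next rndc stats call once the old one has been moved aside, which is worth verifying on the system in question.

```shell
#!/bin/sh
# Sketch: numbered rotation for the statistics file (named.stats -> named.stats.1 ...).
# rotate_stats is a hypothetical helper; the default of 5 generations is arbitrary.
rotate_stats() {
  file=$1
  keep=${2:-5}
  [ -s "$file" ] || return 0          # nothing to rotate if missing or empty
  i=$keep
  while [ "$i" -gt 1 ]; do
    prev=$(( i - 1 ))
    if [ -f "$file.$prev" ]; then
      mv "$file.$prev" "$file.$i"     # shift older generations up by one
    fi
    i=$prev
  done
  mv "$file" "$file.1"                # current file becomes generation 1
}

# Example:
#   rotate_stats /var/log/named.stats 7
```

The oldest generation is overwritten once the retention count is reached, so the count should be chosen to cover however much history is needed.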