Graphing Solutions

Creating PHP bindings for RRDTool

Posted in Graphing Solutions, Technical Tidbits on October 18th, 2009 by steven

Background

PHP is a very popular language for web development.  Bindings that let PHP call rrdtool directly offer a simple and efficient high-level scripting interface to rrdtool's data management and display capabilities.  The stock installation of rrdtool does not include PHP support, so the bindings must be built manually.

Setup

  1. Download the php_rrdtool archive from the contributions directory of the RRDTool site.
  2. Extract the archive contents into the /usr/include/php/ext directory.

    sudo tar xzvf php_rrdtool.tar.gz -C /usr/include/php/ext

  3. Change the owner and group to “root” and “wheel”.

    sudo chown -R root:wheel /usr/include/php/ext/rrdtool

  4. Navigate to the newly created rrdtool directory.

    cd /usr/include/php/ext/rrdtool

  5. Generate the configuration files for the PHP bindings.

    sudo phpize

  6. Execute the newly-generated configuration script for the PHP bindings.  Note that the --with-rrdtool argument takes as its value the “root” directory where the various rrdtool files are located.  If installed using the default values supplied by the MacPorts utility, the “root” directory would be /opt/local.

    sudo ./configure CFLAGS="-fnested-functions" --with-php-config=/usr/bin/php-config --with-rrdtool=/opt/local

  7. Build the PHP rrdtool extension.

    sudo make

  8. Install the PHP rrdtool extension.  Make a note of the directory into which the extension is installed.

    sudo make install

  9. Make a backup copy of the /etc/php.ini file for safety.  If the file does not yet exist, create it by copying /etc/php.ini.default instead.

    sudo cp /etc/php.ini /etc/php.ini.orig

    or

    sudo cp /etc/php.ini.default /etc/php.ini

  10. Edit the /etc/php.ini file and alter the value of the extension_dir variable to the deployment directory (noted in Step 8).

    extension_dir = "/usr/lib/php/extensions/no-debug-non-zts-20060613/"

  11. Edit the /etc/php.ini file and add a new entry under the Dynamic Extensions section that loads the rrdtool extension.

    extension=rrdtool.so

  12. Restart the Apache server so that it loads the new configuration.

    sudo apachectl restart

  13. Verify that PHP can load the rrdtool extension; rrdtool should appear in the list of modules.

    php -m

Special Notice for Mac OS X Users

With the transition from 32-bit to 64-bit architectures still in progress, the rrdtool module may be accessible through some mechanisms but not through others.  For example, on some hardware and operating system combinations the module loads properly when referenced directly by the PHP interpreter (i.e. php -m), but is not available within a web page served by Apache (i.e. it is not listed in the phpinfo() output).  This is most likely because Apache is running in 64-bit mode while the PHP interpreter and its extensions are running in 32-bit mode.  If this situation arises, the suggested solution is to force Apache to run in 32-bit mode.

RRDTool Example Data

Posted in Graphing Solutions on August 8th, 2009 by steven

Introduction

Many of the postings about using rrdtool reference this example setup.  This example is representative of the type of data frequently gathered as part of routine system monitoring.  It has been designed to facilitate the illustration of the techniques and problems referenced in the other postings and is not necessarily representative of actual system monitoring conditions.  While a more typical usage would probably separate out groups of the statistics into several files, this example collects them all in a single file for the sake of simplicity.

rrdtool must be installed before any attempt to create or load data.  It is available as a prebuilt package for a number of Unix distributions, or it may be downloaded directly from the developer’s website and built locally.

Creating the RRD

The example creates an RRD file named sysinfo.rrd with a start date of Jan 1, 2009 00:00:00 PST (Unix epoch time 1230796800) and a Primary Data Point (PDP) step value of 300 seconds.  It retains a number of different measurements: the system load, temperature, CPU fan speed, amount of used disk space, rate of change in the use of disk space, and the number of bytes written to and read from the disk.  For these measurements, it maintains three Round Robin Archives (RRAs): one with a resolution of 5 minutes spanning 31 days, one with a resolution of 15 minutes spanning 90 days, and one with a resolution of 1 hour spanning 365 days.

rrdtool create sysinfo.rrd --start 1230796800 --step 300 \
DS:load:GAUGE:600:0:U \
DS:temperature:GAUGE:600:0:500 \
DS:cpu_fan:GAUGE:600:0:U \
DS:disk_used:GAUGE:600:0:U \
DS:disk_change:DERIVE:600:U:U \
DS:bytes_written:COUNTER:600:0:U \
DS:bytes_read:COUNTER:600:0:U \
RRA:AVERAGE:0.5:1:8928 \
RRA:AVERAGE:0.5:3:8640 \
RRA:AVERAGE:0.5:12:8760
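
As a quick sanity check on the numbers above, the short Python sketch below (not part of the original tooling) recomputes the start time from the PST date and the span of each RRA as PDP step x steps x rows.  The names STEP, rras and rra_span_days are illustrative only.

```python
from datetime import datetime, timedelta, timezone

STEP = 300  # PDP step in seconds (--step 300)

# PST is UTC-8, so midnight Jan 1, 2009 PST is 08:00 UTC
PST = timezone(timedelta(hours=-8))
start = int(datetime(2009, 1, 1, tzinfo=PST).timestamp())

# (steps, rows) for each RRA:AVERAGE definition in the create command
rras = [(1, 8928), (3, 8640), (12, 8760)]


def rra_span_days(step, steps, rows):
    """Time covered by one RRA: PDP step x PDPs per CDP x number of rows."""
    return step * steps * rows / 86400


if __name__ == "__main__":
    print("start:", start)  # 1230796800
    for steps, rows in rras:
        minutes = STEP * steps / 60
        print(f"{minutes:g}-minute resolution, {rra_span_days(STEP, steps, rows):g} days")
```

Running it prints the 5-, 15- and 60-minute resolutions with spans of 31, 90 and 365 days, matching the description.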

Adding Data

Data is added to the RRD with the update command.  The general format for adding a set of data points that share the same time value is as follows:

rrdtool update filename timestamp:DS1[:DS2][:DS3]…

filename is the name of the RRD to be updated.  An update command can update only a single RRD file at a time.

timestamp is the time of the event.  It is typically specified in the standard Unix epoch format (number of seconds elapsed since Jan 1, 1970 UTC), but it may also be given as “N” (for Now), which inserts the data using the current time.  Times may also be written in the formats accepted by the at command.  (See the manual page for the at command for more details on the time formats rrdtool supports.)  If an at-style time is used, the “@” character must separate the timestamp from the data values instead of the “:” character.

DS is the value of the data point(s) that are to be inserted into the RRD for the time specified.  The data points are processed in the same order as the Data Sources were specified when the RRD was created.  If there is no data for a particular DS, then a “U” (for Unknown) should be used instead.  If there is more than one DS to be updated in the RRD, the values should be separated by a “:” character.

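Several updates, each with its own timestamp, may also be supplied in a single command.  Each argument carries a complete set of values (one per DS, in creation order), and the timestamps must increase from one argument to the next:

rrdtool update filename timestamp1:DS1[:DS2]… [timestamp2:DS1[:DS2]…]…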
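
The following Python sketch shows how such update arguments could be assembled: one timestamp per argument, one value per Data Source in creation order, and “U” wherever a reading is missing.  The helper format_update_args and the sample values are hypothetical; only the resulting strings follow rrdtool's update syntax.  Feeding the raw used-space reading to the DERIVE source disk_change is an assumption made for illustration.

```python
def format_update_args(samples):
    """Turn (timestamp, values) pairs into rrdtool update arguments.

    Values must be listed in the same order as the DS definitions in the RRD;
    None is written as "U" (unknown).
    """
    args = []
    last_ts = None
    for ts, values in samples:
        # rrdtool refuses updates that are not later than the previous one
        if last_ts is not None and ts <= last_ts:
            raise ValueError(f"timestamp {ts} is not later than {last_ts}")
        last_ts = ts
        fields = ["U" if v is None else str(v) for v in values]
        args.append(":".join([str(ts)] + fields))
    return args


# DS order in sysinfo.rrd: load, temperature, cpu_fan, disk_used,
# disk_change (assumed to receive the raw used-space reading), bytes_written, bytes_read
samples = [
    (1230797100, [0.42, None, 2400, 1048576, 1048576, 5120, 10240]),
    (1230797400, [0.38, 41.0, 2380, 1049600, 1049600, 6144, 11264]),
]
cmd = ["rrdtool", "update", "sysinfo.rrd", *format_update_args(samples)]
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run to perform both updates in one invocation.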

Data Generation

The perl script sysinfo-sample.pl is available to create the RRD and populate it with a full year’s worth of data.  The data is randomly generated but should resemble real data closely enough to illustrate the concepts.  Because it is random, each run will produce different data.

Please note that this script will automatically overwrite any previous instance of the sysinfo.rrd file in the working directory.

  1. Create a directory that will hold the example data and associated files, and change into it.
    mkdir rrd_example
    cd rrd_example
  2. Download the sysinfo-sample.pl file into the newly created directory.
  3. Make the perl script executable.
    chmod 0755 sysinfo-sample.pl
  4. Execute the perl script and wait for the command prompt to reappear.  (This may take 5-15 minutes depending on the speed of your system.)
    ./sysinfo-sample.pl

Upon completion, a file named sysinfo.rrd should be created in the working directory.  It will be filled with one year’s worth of data starting on Jan 1, 2009.

Creating a Round Robin Database with RRDTool

Posted in Graphing Solutions, Technical Tidbits on August 4th, 2009 by steven

Introduction

There are two essential components that should be defined when creating a Round-Robin Database (RRD) using rrdtool:  the Data Source (DS) and the Round-Robin Archive (RRA).  The Data Source definition dictates how the data is entered into the database and can provide rules for the interpretation, limits, and expected frequency of updates.  The Round-Robin Archive definition dictates the storage of the data, including how the data is consolidated, the time resolution, and the span of time the archive covers.  A single RRD instance must contain at least one Data Source and at least one Round-Robin Archive, but may contain more.

Data Source Definition

The general format for a DS specification is:

DS:Label:Type:Heartbeat:Min:Max

Label is the name of the Data Source.  It may be 1-19 characters long and may contain only characters in the set [a-zA-Z0-9_].  The label should describe the data being collected.  Example labels include “temperature”, “BytesIn”, “bytes_out”, etc.

Type is one of rrdtool’s defined data types:  GAUGE, COUNTER, DERIVE, ABSOLUTE, or COMPUTE.

GAUGE is for storing simple measurements that are rate independent.  Data types such as temperatures, prices, or system load are suitable for this data type.

COUNTER is for storing continuously incrementing counters.  Values fed into this type of Data Source should never decrease except when the counter overflows (rrdtool handles the wrap-around).  Data of this type is stored as a per-second rate.  Suitable inputs include an odometer reading, which is stored as a speed, or a network interface's byte counter, which is stored as throughput.

DERIVE stores the derivative (slope) of the line between the current and previous data points.  Unlike COUNTER, it performs no overflow handling, so the stored rate may be negative.  This is useful for converting an input of simple measurements into a rate.  For example, with an input of temperature readings, it will record the rate of temperature change instead of the raw temperature readings.

ABSOLUTE is used for counters which are reset upon reading.  This behavior is often found in fast-moving counters that would otherwise overflow frequently.  It is otherwise similar to the COUNTER type.

COMPUTE provides a means of applying a formula to other Data Sources in the RRD.  As such, the DS specification for the COMPUTE type differs from the general form:

DS:Label:COMPUTE:rpn-expression

The COMPUTE type can be thought of as a “virtual” Data Source that calculates its value based on the values of other Data Sources.
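
To make the differences between the rate-based types concrete, here is a simplified Python model of how two successive readings turn into the value that gets stored.  It is an approximation for illustration only: it assumes both readings arrive within the heartbeat and ignores the interpolation onto step boundaries that rrdtool performs.  The function name stored_rate is hypothetical; the 32-bit-then-64-bit wrap correction is modelled on the overflow handling rrdtool documents for COUNTER.

```python
def stored_rate(ds_type, prev, curr, seconds):
    """Approximate value rrdtool derives from one update interval."""
    if ds_type == "GAUGE":
        return curr  # stored as measured, no rate conversion
    if ds_type == "ABSOLUTE":
        return curr / seconds  # counter was reset after the previous read
    if ds_type not in ("COUNTER", "DERIVE"):
        raise ValueError(f"unsupported type: {ds_type}")
    delta = curr - prev
    if ds_type == "COUNTER" and delta < 0:
        # Assume the counter wrapped: try a 32-bit wrap, then a 64-bit one
        delta += 2**32
        if delta < 0:
            delta += 2**64 - 2**32
    # DERIVE keeps negative deltas as negative rates
    return delta / seconds


print(stored_rate("COUNTER", 2**32 - 100, 200, 100))  # wrapped 32-bit counter
print(stored_rate("DERIVE", 500, 200, 300))           # shrinking value, negative rate
```

The same drop in value that COUNTER interprets as a wrap-around is reported by DERIVE as a negative rate, which is why DERIVE suits quantities that can legitimately shrink.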

Heartbeat defines the maximum number of seconds that may elapse between two updates before the data for that interval is assumed to be unknown.  As long as updates arrive within the heartbeat, rrdtool interpolates between the readings to fill in the intervening Primary Data Points.

Min and Max establish the expected range of values for the data.  Values outside this range are stored as unknown, which safeguards against the accidental inclusion of invalid data.  If the expected range is not known, a value of “U” can be used for either limit instead.
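
The Python sketch below pulls the field rules above together: it splits a DS specification, checks the label and type, and applies the heartbeat and Min/Max rules to incoming data.  The function names (parse_ds, interval_known, apply_range) are hypothetical and the checks are a simplified model of what rrdtool enforces; COMPUTE sources are rejected because they use a different layout.

```python
import re

LABEL_RE = re.compile(r"[A-Za-z0-9_]{1,19}")
DS_TYPES = {"GAUGE", "COUNTER", "DERIVE", "ABSOLUTE"}  # COMPUTE has its own format


def _limit(text):
    """Convert a Min/Max field to a float, or None for "U" (no limit)."""
    return None if text == "U" else float(text)


def parse_ds(spec):
    """Split DS:Label:Type:Heartbeat:Min:Max into a dict, validating each field."""
    parts = spec.split(":")
    if len(parts) != 6 or parts[0] != "DS":
        raise ValueError(f"expected DS:Label:Type:Heartbeat:Min:Max, got {spec!r}")
    _, label, ds_type, heartbeat, lo, hi = parts
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid label {label!r}")
    if ds_type not in DS_TYPES:
        raise ValueError(f"unsupported type {ds_type!r}")
    return {"label": label, "type": ds_type, "heartbeat": int(heartbeat),
            "min": _limit(lo), "max": _limit(hi)}


def interval_known(ds, prev_ts, ts):
    """A gap longer than the heartbeat makes the interval unknown."""
    return ts - prev_ts <= ds["heartbeat"]


def apply_range(ds, value):
    """Values outside [min, max] are recorded as unknown (None)."""
    if ds["min"] is not None and value < ds["min"]:
        return None
    if ds["max"] is not None and value > ds["max"]:
        return None
    return value


temp = parse_ds("DS:temperature:GAUGE:600:0:500")
print(apply_range(temp, 650))  # None: above the 500 maximum
```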

Round Robin Archive Definition

The general format for a RRA specification is:

RRA:Consolidation Function:XFF:Steps:Rows

Consolidation Function (CF) is one of rrdtool’s defined functions:  AVERAGE, MIN, MAX, and LAST.  The different functions provide flexibility when aggregating Primary Data Points (PDP) into a RRA.  Note that regardless of the CF chosen, aggregation will result in the loss of resolution for the data.  For many applications, this is perfectly acceptable as maintaining fine-grained resolution on older data is often not necessary.

AVERAGE stores the average of all the Primary Data Points within the designated time range.

MIN stores the smallest value of all the Primary Data Points within the designated time range.

MAX stores the largest value of all the Primary Data Points within the designated time range.

LAST stores only the last (most recent) value of all the Primary Data Points within the designated time range.

XFF is the “X Files Factor”.  It defines what fraction of the Primary Data Points may be unknown before the Consolidated Data Point (CDP) is also considered unknown.  It is expressed as the ratio of the number of unknown PDPs to the total number of PDPs in the aggregation period, so valid values lie between 0 and 1.  For example, an XFF of 0.5 dictates that a CDP will be stored as unknown if more than 50% of the PDPs within the time period are unknown.

Steps defines how many Primary Data Points will be aggregated into a Consolidated Data Point.  The Consolidated Data Point is then recorded in the RRA.  Each RRA Step is a multiple of the Step value associated with the Primary Data Point.  For example, if the Primary Data Point Step has a resolution of 300 seconds and the RRA Step value is 12, then each Consolidated Data Point will be aggregated over 3600 seconds (12 x 300 seconds).

Rows defines how many Consolidated Data Points are stored in the RRA.  Because each Row represents the time span of one RRA Step, the maximum time represented in the archive is a simple product.  For example, if the Primary Data Point has a Step value of 300 seconds, the RRA Step value is 12, and the Row value is 8760, then the RRA will be able to store 1 year’s worth of data.  (300s x 12 x 8760 == 1 year)
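
The interplay of the Consolidation Function, XFF and Steps can be modelled in a few lines of Python.  This is a simplified sketch, not rrdtool's implementation: it assumes PDPs arrive already aligned to the step, represents an unknown PDP as None, computes each CF over the known PDPs only, and drops a trailing partial group.  The names consolidate and CF are hypothetical.

```python
from statistics import mean

# Each CF is applied to the known PDPs of one consolidation period
CF = {
    "AVERAGE": mean,
    "MIN": min,
    "MAX": max,
    "LAST": lambda known: known[-1],  # most recent known value
}


def consolidate(pdps, cf, xff, steps):
    """Group PDPs into CDPs of `steps` PDPs each; None marks an unknown value."""
    cdps = []
    for i in range(0, len(pdps) - steps + 1, steps):
        group = pdps[i:i + steps]
        known = [p for p in group if p is not None]
        unknown_ratio = (len(group) - len(known)) / len(group)
        # The CDP is unknown if the unknown fraction exceeds the XFF
        cdps.append(None if not known or unknown_ratio > xff else CF[cf](known))
    return cdps


pdps = [1, 2, None, 4, 5, 6, None, None, None, 10, 11, 12]
print(consolidate(pdps, "AVERAGE", 0.5, 3))  # [1.5, 5, None, 11]
```

Lowering the XFF to 0 would also turn the first CDP into None, because one of its three PDPs is unknown.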

rrdtool create

Before attempting to create a RRD, it is best to have fully defined the Data Sources and Round Robin Archives the RRD will be storing.  With those defined, only a couple of other command line options are needed to create the RRD successfully.

The general format for creating a RRD is as follows:

rrdtool create filename --start time --step secs \
DS:Label:Type:Heartbeat:Min:Max \
RRA:Consolidation Function:XFF:Steps:Rows

filename is any legitimate filename permitted by the filesystem.  It is suggested that a suffix of “.rrd” be used to identify the file as a Round Robin Database.

start is the earliest time represented in the RRD.  The RRD will not accept any update whose time is earlier than or equal to this value.  The time is frequently given as a standard Unix epoch value (number of seconds since Jan 1, 1970 UTC), although rrdtool will also parse other formats including “now”, “now-1h”, and more.  (See the manual page for the at command for more details on the time formats rrdtool supports.)  The default value is the current time minus 10 seconds.

step is the number of seconds each Primary Data Point represents.  The default value (300 seconds) provides a resolution of 1 PDP every 5 minutes.
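
When RRDs are created from a script rather than typed by hand, it can help to build the argument list from the DS and RRA definitions.  The Python sketch below does that with a hypothetical helper, create_args; it only assembles the command (here reproducing Example 1) and does not run rrdtool.

```python
import shlex


def create_args(filename, start, step, data_sources, archives):
    """Assemble argv for `rrdtool create` from DS and RRA tuples."""
    args = ["rrdtool", "create", filename, "--start", str(start), "--step", str(step)]
    for label, ds_type, heartbeat, lo, hi in data_sources:
        args.append(f"DS:{label}:{ds_type}:{heartbeat}:{lo}:{hi}")
    for cf, xff, steps, rows in archives:
        args.append(f"RRA:{cf}:{xff}:{steps}:{rows}")
    return args


argv = create_args("sysinfo.rrd", "now", 300,
                   [("load", "GAUGE", 600, 0, "U")],
                   [("AVERAGE", 0.5, 1, 8928)])
print(shlex.join(argv))
```

Passing argv to subprocess.run(argv, check=True) would execute the command; a list avoids shell quoting issues with time specifications that contain spaces.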

Examples

Example 1

rrdtool create sysinfo.rrd --start now --step 300 \
DS:load:GAUGE:600:0:U \
RRA:AVERAGE:0.5:1:8928

This example creates a RRD with the filename of sysinfo.rrd.  The starting time is “now” which rrdtool will determine at the time the command is executed.  The step value for the Primary Data Points is 300 seconds.

The only Data Source defined in this RRD is labelled “load” and is of type GAUGE.  It has a heartbeat value of 600 seconds, a minimum value of 0 and an unlimited (“U”) maximum value.

There is only one Round Robin Archive defined, and it uses the AVERAGE method to aggregate Primary Data Points into Consolidated Data Points.  It has an XFF of 0.5, which means that no more than 50% of the PDPs can be of the “unknown” type or the CDP will also be of “unknown” type.  The RRA step value is 1, which in this case means there is a 1:1 correspondence between a PDP and a CDP (i.e. no aggregation is actually performed).  There are 8928 data rows in the RRA, so this archive represents a span of 31 days (8928 x 300 seconds).

Example 2

rrdtool create sysinfo.rrd --start now --step 300 \
DS:load:GAUGE:600:0:U \
RRA:AVERAGE:0.5:1:8928 \
RRA:AVERAGE:0.5:12:8760

This example is identical to Example 1 with the exception that it contains more than one RRA.

In this case, the first RRA defines a “high-resolution” view of the data but only for the last 31 days.  In many applications, the value of the data is highest for the most recent data, and so maintaining it with a high fidelity is warranted.

The second RRA defines a “low-resolution” view of the data for a substantially longer period of time.  In this example, the RRA step value is 12, so the AVERAGE Consolidation Function will be applied to 12 Primary Data Points to generate a single Consolidated Data Point.  As each PDP represents 300 seconds, each step in this RRA will represent 3600 seconds, or 1 hour (300s x 12).  With 8760 rows for this RRA, this archive represents a total time span of 1 year (365 days x 24 hours = 8760).

When generating graphs, rrdtool automatically selects the RRA(s) that provide a suitable resolution for the time span requested.
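
rrdtool's actual selection logic considers more than this (for instance the requested resolution and the consolidation function), so the Python below is only a simplified model of the idea, assuming the two AVERAGE archives from Example 2 and a 300-second step: use the finest archive whose span still covers the requested period, and fall back to the longest archive otherwise.  The names ARCHIVES and pick_archive are hypothetical.

```python
STEP = 300
# (steps, rows) for the two AVERAGE archives in Example 2
ARCHIVES = [(1, 8928), (12, 8760)]


def pick_archive(seconds_requested, archives=ARCHIVES, step=STEP):
    """Return (steps, rows) of the finest archive covering the requested span."""
    for steps, rows in sorted(archives):  # finest resolution first
        if step * steps * rows >= seconds_requested:
            return steps, rows
    # Nothing covers the whole span: use the archive reaching furthest back
    return max(archives, key=lambda a: step * a[0] * a[1])


DAY = 86400
print(pick_archive(7 * DAY))   # (1, 8928): the 5-minute archive covers a week
print(pick_archive(90 * DAY))  # (12, 8760): only the hourly archive covers 90 days
```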

Example 3

rrdtool create sysinfo.rrd --start now --step 300 \
DS:load:GAUGE:600:0:U \
DS:diskfree:DERIVE:600:U:U \
RRA:AVERAGE:0.5:1:8928 \
RRA:AVERAGE:0.5:12:8760

This example further extends the setup illustrated in Example 2.  In this case, an additional Data Source has been defined to track the rate of change in the free disk space.  As the amount of free disk space may increase or shrink, the DERIVE data type is required.  Like the GAUGE Data Source, it has a heartbeat value of 600 seconds, but both its minimum and maximum values are left unbounded (“U”).  Note that the RRAs apply to both of the Data Sources.