Monitoring hosts with CollectD, InfluxDB and Grafana

In this article we will install a monitoring solution on CentOS 7 consisting of CollectD, InfluxDB and Grafana.

The first step is to install CollectD:

yum -y install epel-release
yum -y install collectd

At the time of writing, this resulted in the installation of CollectD version 5.5.0. If you are using an older version of CentOS, make sure that you do not install version 4 or lower.
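To check the installed version before going further, a small sketch can compare the major version number against 5. The function name collectd_version_ok is hypothetical. It assumes the version string looks like 5.5.0, which is the form rpm reports.

```shell
# Hypothetical helper: succeed only if the given collectd version string
# (for example "5.5.0") has a major version of 5 or higher.
collectd_version_ok() {
    local major="${1%%.*}"    # everything before the first dot: "5.5.0" -> "5"
    # Reject empty or non-numeric input, such as rpm's
    # "package collectd is not installed" message.
    case "$major" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$major" -ge 5 ]
}

# Usage on the CentOS host:
# collectd_version_ok "$(rpm -q --qf '%{VERSION}' collectd)" && echo "collectd 5 or newer"
```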

The configuration can be tweaked to your liking, but should contain at least something like this in /etc/collectd.conf:

FQDNLookup  true
BaseDir     "/var/lib/collectd"
PIDFile     "/var/run/collectd.pid"
PluginDir   "/usr/lib64/collectd"
TypesDB     "/usr/share/collectd/types.db"
LoadPlugin  syslog
LoadPlugin  interface
LoadPlugin  load
LoadPlugin  network
<Plugin interface>
    Interface "eth0"
    IgnoreSelected false
</Plugin>
<Plugin load>
    ReportRelative true
</Plugin>
<Plugin network>
    Server "127.0.0.1" "25826"
</Plugin>

The most important part is the “network” plugin, where we define that the measurements should be sent to 127.0.0.1 on port 25826. We will instruct InfluxDB shortly to listen on that port for incoming packets from CollectD.
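Your interface may not be called eth0. CentOS 7 often uses names such as ens33 or enp0s3. InfluxDB may also run on another host. In either case, the sketch below prints the same minimal configuration, with the interface, server and port as parameters. The function name render_collectd_conf and the temporary path in the usage comment are assumptions. collectd's -t option only parses the configuration file given with -C and then exits.

```shell
# Hypothetical helper: print the minimal collectd.conf from this article.
# Arguments (all optional): interface name, InfluxDB host, InfluxDB port.
render_collectd_conf() {
    local iface="${1:-eth0}" server="${2:-127.0.0.1}" port="${3:-25826}"
    cat <<EOF
FQDNLookup  true
BaseDir     "/var/lib/collectd"
PIDFile     "/var/run/collectd.pid"
PluginDir   "/usr/lib64/collectd"
TypesDB     "/usr/share/collectd/types.db"
LoadPlugin  syslog
LoadPlugin  interface
LoadPlugin  load
LoadPlugin  network
<Plugin interface>
    Interface "$iface"
    IgnoreSelected false
</Plugin>
<Plugin load>
    ReportRelative true
</Plugin>
<Plugin network>
    Server "$server" "$port"
</Plugin>
EOF
}

# Usage: write to a temporary file, let collectd validate it, then install it.
# render_collectd_conf ens33 127.0.0.1 25826 > /tmp/collectd.conf
# collectd -t -C /tmp/collectd.conf && cp /tmp/collectd.conf /etc/collectd.conf
```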

Now start the daemon and make sure that it starts at boot as well:

systemctl start collectd.service
systemctl enable collectd.service

Installation of InfluxDB can be achieved like this:

yum -y install http://influxdb.s3.amazonaws.com/influxdb-0.9.4.2-1.x86_64.rpm

Edit the InfluxDB configuration file /etc/opt/influxdb/influxdb.conf and change the lines near the [collectd] heading as follows:

[collectd]
    enabled = true
    bind-address = "127.0.0.1:25826"
    database = "collectd"
    typesdb = "/usr/share/collectd/types.db"
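The port on collectd's Server line must match the port in InfluxDB's bind-address. The sketch below reads both values and reports a mismatch:

- It reads the port from the Server line in collectd.conf.
- It reads bind-address only inside InfluxDB's collectd section. It accepts both [collectd] and the [[collectd]] heading used by later InfluxDB releases, so the bind-address lines of other sections are ignored.

The function names are hypothetical. The sketch also assumes the port is written explicitly on the Server line, as in the configuration above.

```shell
# Hypothetical helper: print the port from a collectd line such as
#     Server "127.0.0.1" "25826"
collectd_server_port() {
    sed -n 's/^[[:space:]]*Server[[:space:]]*"[^"]*"[[:space:]]*"\([0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: print the port of bind-address inside the [collectd]
# (or [[collectd]]) section of influxdb.conf, ignoring all other sections.
influx_collectd_port() {
    awk '
        /^[ \t]*\[/ { in_collectd = ($0 ~ /^[ \t]*\[+collectd\]+/); next }
        in_collectd && /^[ \t]*bind-address[ \t]*=/ {
            port = $0
            sub(/.*:/, "", port)        # drop everything up to the last colon
            gsub(/[^0-9]/, "", port)    # keep only the digits of the port
            print port
            exit
        }
    ' "$1"
}

# Usage: check_ports /etc/collectd.conf /etc/opt/influxdb/influxdb.conf
check_ports() {
    local sender listener
    sender=$(collectd_server_port "$1")
    listener=$(influx_collectd_port "$2")
    if [ -n "$sender" ] && [ "$sender" = "$listener" ]; then
        echo "OK: collectd and InfluxDB both use port $sender"
    else
        echo "Mismatch: collectd sends to '$sender', InfluxDB listens on '$listener'" >&2
        return 1
    fi
}
```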

And then start the daemon:

systemctl start influxdb.service

You can now check if the measurements from CollectD are received by InfluxDB as follows:

# /opt/influxdb/influx
Connected to http://localhost:8086 version 0.9.4.2
InfluxDB shell 0.9.4.2
> use collectd
Using database collectd
> show measurements
name: measurements
------------------
name
interface_rx
interface_tx
load_longterm
load_midterm
load_shortterm
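The same check can also run from a script. The sketch below reads measurement names on standard input and reports which of the five measurements from this setup are missing. check_measurements is a hypothetical name. The usage line assumes your influx client accepts the -database and -execute options; /opt/influxdb/influx -help lists what your version supports.

```shell
# Hypothetical helper: read measurement names (one per line) on stdin and
# fail if any measurement this setup should produce is missing.
check_measurements() {
    local names missing="" m
    names=$(cat)
    for m in interface_rx interface_tx load_longterm load_midterm load_shortterm; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if ! printf '%s\n' "$names" | grep -qxF "$m"; then
            missing="$missing $m"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
    fi
    echo "all expected measurements present"
}

# Usage (assumes -database and -execute are available):
# /opt/influxdb/influx -database collectd -execute 'SHOW MEASUREMENTS' | check_measurements
```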

The final step is installing and starting Grafana:

yum -y install https://grafanarel.s3.amazonaws.com/builds/grafana-2.5.0-1.x86_64.rpm
systemctl start grafana-server.service
systemctl enable grafana-server.service

Start a web browser and navigate to http://the_host_running_grafana:3000. Log in with username admin and password admin. You will be presented with the following page:

[Screenshot grafana1: Grafana home page after logging in]

Now click on Data Sources and then Add new. Fill in the resulting screen as follows:

[Screenshot grafana2: data source settings]

InfluxDB 0.9 has authentication disabled by default, so the user name and password are not actually checked; root for both works. Test the connection and then save.
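The same data source can also be created from the command line through Grafana's HTTP API. The sketch below only prints the JSON request body. Several values in it are assumptions chosen to match the setup above:

- the function name,
- the data source name collectd,
- the URL http://localhost:8086,
- the access mode proxy.

The curl line further assumes that your Grafana version accepts POST /api/datasources with the admin/admin login.

```shell
# Hypothetical helper: print the JSON body for Grafana's POST /api/datasources.
# Arguments (all optional): data source name, InfluxDB URL, database name.
# Values are inserted as-is, so they must not contain double quotes.
grafana_datasource_json() {
    local name="${1:-collectd}" url="${2:-http://localhost:8086}" db="${3:-collectd}"
    # Credentials root/root as used in the steps above.
    printf '{"name":"%s","type":"influxdb","access":"proxy","url":"%s","database":"%s","user":"root","password":"root"}\n' \
        "$name" "$url" "$db"
}

# Usage on the Grafana host:
# grafana_datasource_json | curl -s -u admin:admin -H 'Content-Type: application/json' \
#     -X POST --data-binary @- http://localhost:3000/api/datasources
```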

Click on Dashboards, then on the triangle pointing down next to Home, then on New:

[Screenshot grafana3: the Home drop-down menu with New]

The screen now shows a small green rectangle just below New dashboard. Hover over this rectangle and select Add Panel and then Graph:

[Screenshot grafana4: Add Panel and Graph]

A graph is now shown with test data. Click on the title of this graph and choose Edit. Click on -- Grafana -- and select collectd as the datasource:

[Screenshot grafana5: selecting collectd as the datasource]

An edit window now allows you to change the appearance and data of the graph. On the General tab, change the title of the graph to Load. On the Metrics tab, click on select measurement next to FROM and select load_longterm. Click on the plus sign next to WHERE and select host. Then click on select tag value and choose the host whose load you want to display. Finally, change the value of ALIAS BY to Long:

[Screenshot grafana6: the Metrics tab with the load_longterm query]

Click on + Query to add similar queries for the mid-term (load_midterm) and short-term (load_shortterm) load values. Finally, click on the save button at the top of the screen. The graph should now look like this:

[Screenshot grafana7: the finished Load graph]
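For reference, the three load queries correspond roughly to the raw InfluxQL printed by the sketch below. The function name is hypothetical. Using mean("value") with GROUP BY time($interval) is an assumption about the visual editor's defaults. $timeFilter and $interval are Grafana variables that Grafana fills in when it draws the panel, so the sketch prints them literally.

```shell
# Hypothetical helper: print one raw InfluxQL query per load average for HOST.
load_queries() {
    local host="$1" term
    for term in longterm midterm shortterm; do
        # The format string is single-quoted, so the shell leaves
        # $timeFilter and $interval alone for Grafana to substitute.
        printf 'SELECT mean("value") FROM "load_%s" WHERE "host" = %s AND $timeFilter GROUP BY time($interval)\n' \
            "$term" "'$host'"
    done
}

# Usage: load_queries test
```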

To show the network traffic statistics for our host, we will add a new graph to the dashboard. Click on + ADD ROW, then on the green rectangle, and select Add Panel and Graph again. Select the collectd datasource. This time we cannot use the visual query editor. The interface_rx and interface_tx measurements hold cumulative counters, that is, the running total of bytes or packets, rather than values per time unit. The query therefore has to apply derivative() to turn them into rates. Click on the three horizontal lines on the right side of the graph editor and choose Switch editor mode:

[Screenshot grafana8: Switch editor mode]

Use the following text in the input box:

SELECT derivative("value") AS "value" FROM "interface_rx" WHERE "host" = 'test' AND "type" = 'if_octets' AND "instance" = 'eth0'

Make sure to change the values of host and instance to match your system. Use single quotes around the values, not double quotes: in InfluxQL, single quotes delimit string values, while double quotes delimit identifiers such as measurement and tag names. On the Axes & Grid tab, change the unit of Left Y to bytes per second:

[Screenshot grafana9: setting the Left Y unit to bytes per second]
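The sketch below applies the same calculation as derivative("value") to a few hypothetical counter samples. For each pair of consecutive points, it divides the change in the counter by the elapsed seconds. The result is bytes per second, stamped with the later timestamp. Plain derivative() does not treat a counter reset specially, so a reset shows up as a negative rate. The function name counter_rates is hypothetical.

```shell
# Hypothetical helper: turn cumulative counter samples into per-second rates.
# Input lines:  "<unix_seconds> <counter_value>", oldest first.
# Output lines: "<unix_seconds> <rate_per_second>" for every sample but the first.
# Pairs with no elapsed time are skipped to avoid dividing by zero.
counter_rates() {
    awk '
        NR > 1 {
            dt = $1 - prev_t
            if (dt > 0) printf "%d %.1f\n", $1, ($2 - prev_v) / dt
        }
        { prev_t = $1; prev_v = $2 }
    '
}

# Example with a 10-second collection interval:
# printf '1000 500000\n1010 520000\n1020 541000\n' | counter_rates
# 1010 2000.0
# 1020 2100.0
```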

Click on + Query to add a similar query for interface_tx. Add aliases for both queries: Received for interface_rx and Transmitted for interface_tx. Now the graph looks like this:

[Screenshot grafana10: the finished network traffic graph]
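The received and transmitted queries differ only in the measurement name, so they can be generated for any host and interface. The sketch below prints the query in the same form as the text input used for the network graph. It keeps double quotes around identifiers and single quotes around values. The function name iface_query is hypothetical, and the interface defaults to eth0 as in the configuration above.

```shell
# Hypothetical helper: print the raw InfluxQL query for one direction
# (rx or tx) of one interface on one host.
iface_query() {
    local dir="$1" host="$2" iface="${3:-eth0}"
    case "$dir" in
        rx|tx) ;;
        *) echo "direction must be rx or tx" >&2; return 1 ;;
    esac
    # Identifiers (measurement, tag keys) get double quotes,
    # string values get single quotes.
    printf 'SELECT derivative("value") AS "value" FROM "interface_%s" WHERE "host" = %s AND "type" = %s AND "instance" = %s\n' \
        "$dir" "'$host'" "'if_octets'" "'$iface'"
}

# Usage: iface_query rx test eth0; iface_query tx test eth0
```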

11 thoughts on “Monitoring hosts with CollectD, InfluxDB and Grafana”

  1. Kipland Iles

    December 3, 2015 at 4:43pm

    Nice, clean how-to, but it appears something might be missing from the network graph. You only indicated interface_rx, but would we not also need interface_tx? Did you set up aliases for each query (Received and Transmitted)? Did you do any grouping by time?

    I ask because I get a much different graph even after adding a second query for interface_tx. The system I am testing this with is pretty much idle, and interface_tx shows between 0 and 1 kB/s, but interface_rx shows between 26.5 and 27 kB/s, which seems a bit high (it is a KVM guest, however).

    • admin

      December 3, 2015 at 4:55pm

      Yes, I’ve set up aliases for interface_rx (Received) and interface_tx (Transmitted). The text now says that you need to add a query for interface_tx as well.

      • Kipland Iles

        December 4, 2015 at 7:25pm

        Thanks – that helps. I did note that these queries are very time-consuming without any additional filtering. How to construct the query also depends on how often collectd collects the interface statistics. In my case, stats are collected every 10 seconds. You also have dynamic time ranges within Grafana. What I needed to do was to further restrict the results based on “$timeFilter”. I came up with something like this …

        SELECT DERIVATIVE(MAX(value,10s)) as "value" FROM "interface_rx" WHERE "host" = 'MYHOSTNAME' AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY time(10s)

        Grafana actually provides the value for $timeFilter at runtime and rewrites the query like …

        SELECT derivative(max(value, 10s)) AS "value" FROM "collectd_db"."default".interface_tx WHERE host = 'MYHOSTNAME' AND type = 'if_octets' AND instance = 'eth0' AND time > now() - 15m AND time < now() - 21s GROUP BY time(10s)

        I'm still not convinced that this is actually correct, as I sometimes get a single result of ZERO when using the InfluxDB admin interface to query the DB. These counters do roll over, but I don't think that is the issue. Most likely my understanding of how derivative actually works is the culprit. You did get me on the right track, though. Thank you.

        • Kipland Iles

          December 4, 2015 at 10:02pm

          One last comment. My final query for RX was as follows …

          SELECT DERIVATIVE(MAX(value,10s)) as "value" FROM "interface_rx" WHERE "host" = 'MYHOSTNAME' AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY time($interval) fill(none)

          This seems to always return the series and checking the values against other tools verifies that the scale is actually correct now. InfluxDB v0.9.4.

      • Kipland Iles

        December 6, 2015 at 8:54pm

        I upgraded to InfluxDB 0.9.5.1, and this changed how DERIVATIVE works, yet again. It appears to be fixed and works as originally expected (no grouping on time is required just to filter by time). I updated my queries as follows …

        For interface_rx and all four GlassFish servers …
        SELECT DERIVATIVE(value) as "value" FROM "interface_rx" WHERE "host" =~ /.*nbts-gf.*/ AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY host

        For interface_tx and all four GlassFish servers …
        SELECT DERIVATIVE(value) as "value" FROM "interface_tx" WHERE "host" =~ /.*nbts-gf.*/ AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY host

        The DERIVATIVE function has certainly been an adventure over the various influxdb releases.

  2. Andres Velandia

    July 29, 2016 at 10:55pm

    After almost 2 days in a row, following this how-to, I managed to make this trinity-of-tools work.
    I must admit that it was painstaking, especially because I have always worked with MRTG-like tools.
    I still have a lot to do, but the foundations have been laid.
    Thank you very much, indeed.

  3. glen

    January 13, 2017 at 4:48pm

    I’ve tried to follow the instructions, but it seems the newer version of InfluxDB has a slightly different configuration syntax… mine heads the collectd section with [[collectd]], not [collectd].

    It loads without error, but when I run influx and enter ‘show databases’, I only see the _internal database. I can manually create the db of course, but none of the table information is loaded. Any ideas what I’m missing?

  4. glen

    January 13, 2017 at 5:15pm

    Oops, never mind my comment, it seems to have been an SELinux problem.

  5. Srikanth

    February 21, 2017 at 6:09am

    This set of instructions is fantastic and worked on my second try. (I initially tried to configure the components on different servers and ran into firewall issues with blocked ports – my bad.)
    Thank you for the precise instructions. The three stars are aligned and bootstrapped on CentOS 7.2 🙂
