In this article we will install a monitoring solution on CentOS 7 consisting of CollectD, InfluxDB and Grafana.
The first step is to install CollectD:
yum -y install epel-release
yum -y install collectd
At the time of writing, this resulted in the installation of CollectD version 5.5.0. If you are using an older version of CentOS, make sure that you do not install version 4 or lower.
The configuration can be tweaked to your liking, but should contain at least something like this in /etc/collectd.conf:
FQDNLookup true
BaseDir "/var/lib/collectd"
PIDFile "/var/run/collectd.pid"
PluginDir "/usr/lib64/collectd"
TypesDB "/usr/share/collectd/types.db"
LoadPlugin syslog
LoadPlugin interface
LoadPlugin load
LoadPlugin network
<Plugin interface>
    Interface "eth0"
    IgnoreSelected false
</Plugin>
<Plugin load>
    ReportRelative true
</Plugin>
<Plugin network>
    Server "127.0.0.1" "25826"
</Plugin>
The most important part is the “network” plugin, where we define that the measurements should be sent to 127.0.0.1 on port 25826. We will instruct InfluxDB shortly to listen on that port for incoming packets from CollectD.
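As a quick sanity check, the target of the network plugin can be read back from the configuration file. The following is only a sketch: the function name collectd_network_targets is made up for this example, and it assumes the Server keyword, host and port are written on one line inside the <Plugin network> block, as in the configuration above.

```shell
#!/bin/sh
# Print "host port" for every Server line inside the <Plugin network>
# block of the collectd configuration file given as $1.
# (collectd_network_targets is a hypothetical helper, not part of collectd.)
collectd_network_targets() {
    awk '
        # Enter the network plugin block.
        $1 == "<Plugin" && $2 ~ /^"?network"?>$/ { in_block = 1; next }
        # Any closing tag ends the block we are in.
        $1 == "</Plugin>"                         { in_block = 0 }
        # Inside the block, strip the quotes from host and port and print them.
        in_block && $1 == "Server" {
            gsub(/"/, "", $2)
            gsub(/"/, "", $3)
            print $2, $3
        }
    ' "$1"
}
```

Running collectd_network_targets /etc/collectd.conf against the configuration above should print the host and port that InfluxDB will have to listen on.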
Now start the daemon and make sure that it starts at boot as well:
systemctl start collectd.service
systemctl enable collectd.service
Installation of InfluxDB can be achieved like this:
yum -y install http://influxdb.s3.amazonaws.com/influxdb-0.9.4.2-1.x86_64.rpm
Edit the InfluxDB configuration file /etc/opt/influxdb/influxdb.conf and change the lines near the [collectd] heading as follows:
[collectd]
enabled = true
bind-address = "127.0.0.1:25826"
database = "collectd"
typesdb = "/usr/share/collectd/types.db"
And then start the daemon:
systemctl start influxdb.service
You can now check if the measurements from CollectD are received by InfluxDB as follows:
# /opt/influxdb/influx
Connected to http://localhost:8086 version 0.9.4.2
InfluxDB shell 0.9.4.2
> use collectd
Using database collectd
> show measurements
name: measurements
------------------
name
interface_rx
interface_tx
load_longterm
load_midterm
load_shortterm
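The same check can be scripted against the InfluxDB HTTP API instead of the interactive shell. This is a sketch under some assumptions: the API is reachable on localhost:8086 (the address shown in the shell banner above) and authentication is disabled. The function name show_measurements and the CURL override, which only exists to allow a dry run, are inventions of this example.

```shell
#!/bin/sh
# Ask InfluxDB for the measurements in a database via the /query endpoint.
# $1 is the database name (defaults to "collectd").
# Setting CURL=echo prints the curl arguments instead of sending a request.
show_measurements() {
    db=${1:-collectd}
    ${CURL:-curl} -sG "http://localhost:8086/query" \
        --data-urlencode "db=$db" \
        --data-urlencode "q=SHOW MEASUREMENTS"
}
```

Calling show_measurements collectd should return a JSON document that lists the same measurement names as the shell session above, if CollectD data is arriving.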
The final step is installing and starting Grafana:
yum -y install https://grafanarel.s3.amazonaws.com/builds/grafana-2.5.0-1.x86_64.rpm
systemctl start grafana-server.service
systemctl enable grafana-server.service
Start a web browser and navigate to http://the_host_running_grafana:3000. Log in with username admin and password admin. You will be presented with the following page:
Now click on Data Sources and then Add new. Fill in the resulting screen as follows:
The default user for InfluxDB is root and the password is also root. Test the connection and then save.
Click on Dashboards, then on the triangle pointing down next to Home, then on New:
The screen now shows a small green rectangle just below New dashboard. Hover over this rectangle and select Add Panel and then Graph:
A graph is now shown with test data. Click on the title of this graph and choose Edit. Click on — Grafana — and select collectd as the datasource:
An edit window now allows you to change the appearance and data of the graph. On the General tab, change the title of the graph to Load. On the Metrics tab, click on select measurement next to FROM and select load_longterm. Click on the plus sign next to WHERE and select host. Then click on select tag value and choose the name of the host whose load you want to display. Finally, change the value of ALIAS BY to Long:
Click on + Query to add similar queries for the mid term and short term load values. Finally, click on the save button at the top of the screen. The graph should now look like this:
To show the network traffic statistics for our host, we will add a new graph to the dashboard. Click on + ADD ROW, then on the green rectangle and select Add Panel and Graph again. Select the collectd datasource. This time, we will not be able to use the visual query editor. The measurements interface_rx and interface_tx are cumulative counters: they hold the total number of packets or bytes seen so far, not the number per time unit, so they have to be converted into a rate. Click on the three horizontal lines on the right side of the graph editor and choose Switch editor mode:
Use the following text in the input box:
SELECT derivative("value") AS "value" FROM "interface_rx" WHERE "host" = 'test' AND "type" = 'if_octets' AND "instance" = 'eth0'
Make sure to change the value of host and instance to the correct values. Also make sure to use single quotes for the values and not double quotes. On the Axes & Grid tab, change the unit of Left Y to bytes per second:
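To see why the query wraps the value in derivative(), it helps to work through one pair of samples by hand. The sketch below computes the rate between two readings of a cumulative byte counter. The function name rate_per_second and the sample numbers are made up for illustration, and the calculation only approximates what InfluxDB does between consecutive points.

```shell
#!/bin/sh
# Rate of change between two readings of a cumulative counter.
# $1 = earlier reading, $2 = later reading, $3 = seconds between them.
# (rate_per_second is a hypothetical helper; integer arithmetic only.)
rate_per_second() {
    echo $(( ($2 - $1) / $3 ))
}

# Two if_octets readings taken 10 seconds apart (illustrative values):
rate_per_second 1000000 1270000 10   # prints 27000
```

The raw counter values only ever grow, so plotting them directly gives a rising line. The difference divided by the elapsed time gives the bytes per second that the Left Y axis is set to display.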
Click on + Query to add a similar query for interface_tx. Add aliases for both queries: Received for interface_rx and Transmitted for interface_tx. Now the graph looks like this:
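Since the two queries differ only in the measurement name, they can be generated rather than typed twice. This is a minimal sketch. The function name build_rate_query is hypothetical, and test and eth0 are the placeholder host and interface from the query above. Note that tag values go in single quotes and identifiers in double quotes.

```shell
#!/bin/sh
# Print the raw query for one interface measurement.
# $1 = measurement (interface_rx or interface_tx), $2 = host, $3 = interface.
# (build_rate_query is a hypothetical helper for this article.)
build_rate_query() {
    measurement=$1 host=$2 instance=$3
    printf "SELECT derivative(\"value\") AS \"value\" FROM \"%s\" WHERE \"host\" = '%s' AND \"type\" = 'if_octets' AND \"instance\" = '%s'\n" \
        "$measurement" "$host" "$instance"
}

# One query per direction, ready to paste into the raw query editor.
for m in interface_rx interface_tx; do
    build_rate_query "$m" test eth0
done
```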
Nice clean how-to, but it appears something might be missing on the network graph. You only indicated interface_rx, but would we not also need interface_tx? Did you set up aliases for each query (Received and Transmitted)? Did you do any grouping by time?
I ask because I get a much different graph, even after adding a second query for interface_tx. The system I am testing this with is pretty much idle, and interface_tx shows between 0 and 1 kB/s, but interface_rx shows between 26.5 and 27 kB/s, which seems a bit high (it is a KVM guest, however).
Yes, I’ve set up aliases for interface_rx (Received) and interface_tx (Transmitted). The text now says that you need to add a query for interface_tx as well.
Thanks – that helps. I did note that these queries are very time consuming without any additional filtering. How to construct the query also depends on how often you are collecting the interface statistics using collectd. In my case, stats are collected every 10 seconds. You also have dynamic time ranges within Grafana. What I needed to do was to further restrict the results based on the “$timeFilter”. I came up with something like this …
SELECT DERIVATIVE(MAX(value,10s)) as "value" FROM "interface_rx" WHERE "host" = 'MYHOSTNAME' AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY time(10s)
Grafana actually provides the value for $timeFilter at runtime and reconstructs the query like …
SELECT derivative(max(value, 10s)) AS "value" FROM "collectd_db"."default".interface_tx WHERE host = 'MYHOSTNAME' AND type = 'if_octets' AND instance = 'eth0' AND time > now() - 15m AND time < now() - 21s GROUP BY time(10s)
I'm still not convinced that this is actually correct, as I sometimes get a single result of ZERO when using the InfluxDB admin interface to query the DB. These counters do roll over, but I don't think that is the issue. Most likely my understanding of how derivative actually works is the culprit. You did get me on the right track, though. Thank you.
One last comment. My final query for RX was as follows …
SELECT DERIVATIVE(MAX(value,10s)) as "value" FROM "interface_rx" WHERE "host" = 'MYHOSTNAME' AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY time($interval) fill(none)
This seems to always return the series and checking the values against other tools verifies that the scale is actually correct now. InfluxDB v0.9.4.
I upgraded to InfluxDB 0.9.5.1 and this changed how DERIVATIVE works, yet again. It appears to be fixed and it works as originally expected (no grouping on time is required just to filter by time). I updated my queries as follows …
For interface_rx and all four GlassFish servers …
SELECT DERIVATIVE(value) as "value" FROM "interface_rx" WHERE "host" =~ /.*nbts-gf.*/ AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY host
For interface_tx and all four GlassFish servers …
SELECT DERIVATIVE(value) as "value" FROM "interface_tx" WHERE "host" =~ /.*nbts-gf.*/ AND "type" = 'if_octets' AND "instance" = 'eth0' AND $timeFilter GROUP BY host
The DERIVATIVE function has certainly been an adventure over the various influxdb releases.
After almost 2 days in a row, following this how-to, I managed to make this trinity-of-tools work.
I must admit that it was painstaking, especially because I have always worked with MRTG-like tools.
I still have a lot to do, but the foundations have been laid.
Thank you very much, indeed.
I’ve tried to follow the instructions, but it seems the newer version of InfluxDB has a slightly different configuration syntax… mine heads the collectd section with [[collectd]], not [collectd].
It loads without error, but when I run influx and enter ‘show databases’, I only see the _internal database. I can manually create the db of course, but none of the table information is loaded. Any ideas what I’m missing?
Oops, never mind my comment, it seems to have been an SELinux problem.
This set of instructions is fantastic and worked on my second try. (I tried to configure the components on different servers initially, and ran into firewall issues with blocked ports – my bad.)
Thank you for the precise instructions. The three stars are aligned and bootstrapped on CentOS 7.2 🙂
I like this post – it is also possible to use InfluxDB and Grafana as a service e.g. https://corlysis.com/
Hey Glen,
could you tell me your solution?
I’m running into the same error. I have already checked all the config files and disabled SELinux, but there is still no collectd database.
Any suggestions?