Storm is a distributed, real-time computation system for reliably processing unbounded streams of data. The following picture shows how data is processed in Storm:

[Figure: storm-processing]

This tutorial will show you how to install Storm on a cluster of CentOS hosts. A Storm cluster contains the following components:

[Figure: storm-cluster]

Nimbus is the name for the master node. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. Each node that performs the work runs a supervisor, which controls one or more worker processes on that node. ZooKeeper is used for coordination between Nimbus and the supervisors.

All nodes

We start by disabling SELinux and iptables on every host. This is a bad idea if you are running your cluster on publicly accessible machines, but it makes network problems a lot easier to debug. SELinux is enabled by default on CentOS. To disable it, we need to edit /etc/selinux/config:

SELINUX=disabled

We need to reboot the machine for this to take effect.
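
The same edit can be scripted. The following is a minimal sketch, not part of the original steps: the function name disable_selinux_in is made up for illustration, and it assumes the file already contains a line starting with SELINUX= and that GNU sed is available (as it is on CentOS).

```shell
#!/bin/sh
# Hypothetical helper: rewrite the SELINUX= line of the given config file.
# Assumes the file already contains a line starting with "SELINUX=".
# The pattern does not touch the SELINUXTYPE= line, because it requires
# "=" directly after "SELINUX".
disable_selinux_in() {
    config_file="$1"
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}

# Usage on a real host (as root):
#   disable_selinux_in /etc/selinux/config
#   setenforce 0   # switch to permissive mode until the reboot
```

Running setenforce 0 only switches SELinux to permissive mode for the current boot; the reboot is still what makes the disabled setting take effect.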

The firewall has some default rules we want to get rid of:

iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain
/etc/init.d/iptables save

Storm and ZooKeeper are both fail-fast systems, which means that a Storm or ZooKeeper process will kill itself as soon as an error is detected. It is therefore necessary to put the Storm and ZooKeeper processes under supervision. This will make sure that each process is restarted when needed. For supervision we will use supervisord. Installation is performed like this:

rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum -y install supervisor

ZooKeeper node

We will now create a single ZooKeeper node. To install a multi-node ZooKeeper cluster instead, take a look at the ZooKeeper documentation.

yum -y install java-1.7.0-openjdk-devel wget
cd /opt
wget http://apache.xl-mirror.nl/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar zxvf zookeeper-3.4.5.tar.gz
mkdir /var/zookeeper
cp zookeeper-3.4.5/conf/zoo_sample.cfg zookeeper-3.4.5/conf/zoo.cfg

Now edit the zookeeper-3.4.5/conf/zoo.cfg file and point dataDir at the directory we just created:

dataDir=/var/zookeeper

Edit the /etc/supervisord.conf file and add a section about ZooKeeper to it:

[program:zookeeper]
command=/opt/zookeeper-3.4.5/bin/zkServer.sh start-foreground
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/zookeeper-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/zookeeper-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

Start the supervision and thereby the ZooKeeper service:

chkconfig supervisord on
service supervisord start

Running the supervisorctl command should result in something like this:

zookeeper      RUNNING    pid 1115, uptime 1 day, 0:07:33
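
Besides the supervisorctl status, ZooKeeper itself can be asked whether it is healthy. The sketch below is an assumption-laden illustration rather than part of the original setup: the function name zk_is_ok is hypothetical, it assumes nc (netcat) is installed, and it assumes ZooKeeper listens on client port 2181, the value in zoo_sample.cfg. It sends the four-letter command ruok, to which a healthy server replies imok.

```shell
#!/bin/sh
# Hypothetical helper: ask a ZooKeeper server whether it is healthy.
# Sends the four-letter command "ruok"; a healthy server answers "imok".
# Assumes nc (netcat) is installed; host and port default to localhost:2181.
zk_is_ok() {
    host="${1:-localhost}"
    port="${2:-2181}"
    reply=$(echo ruok | nc "$host" "$port")
    [ "$reply" = "imok" ]
}

# Usage:
#   zk_is_ok localhost 2181 && echo "ZooKeeper is up"
```

The function returns a non-zero exit status when the server does not answer or answers anything other than imok, so it can be used directly in an if statement.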

Nimbus and Supervisor nodes

Every Storm node has a set of dependencies that need to be satisfied. We start with ZeroMQ and JZMQ:

yum -y install gcc gcc-c++ libuuid-devel make wget
cd /opt
wget http://download.zeromq.org/zeromq-2.2.0.tar.gz
tar zxvf zeromq-2.2.0.tar.gz
cd zeromq-2.2.0
./configure
make install
ldconfig

yum -y install java-1.7.0-openjdk-devel unzip libtool
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64
cd /opt
wget https://github.com/nathanmarz/jzmq/archive/master.zip -O jzmq-master.zip
unzip jzmq-master.zip
cd jzmq-master
./autogen.sh
./configure
make install
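
The JAVA_HOME path used above is tied to one specific OpenJDK build, so it may differ on your host. As a hedged alternative, the directory can be derived from wherever javac actually lives. This is only a sketch: the function name jdk_home_from is made up for illustration, and it assumes javac is on the PATH and reachable through symlinks that readlink -f can resolve.

```shell
#!/bin/sh
# Hypothetical helper: derive a JDK home directory from the path of a javac
# binary by resolving symlinks and stripping the trailing bin/javac.
jdk_home_from() {
    javac_path="$1"
    dirname "$(dirname "$(readlink -f "$javac_path")")"
}

# Usage:
#   export JAVA_HOME="$(jdk_home_from "$(command -v javac)")"
```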

Then we move on to Storm itself:

cd /opt
wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
unzip storm-0.8.1.zip
mkdir /var/storm

Now edit the storm-0.8.1/conf/storm.yaml file, replacing the IP addresses as needed:

storm.zookeeper.servers:
 - "10.20.30.40"
nimbus.host: "10.20.30.41"
storm.local.dir: "/var/storm"
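
When several hosts need the same file, it can be generated instead of edited by hand. The following is a minimal sketch under stated assumptions: the function name render_storm_yaml is hypothetical, and it produces only the three settings shown above.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal storm.yaml for the given addresses.
# Arguments: ZooKeeper IP, Nimbus IP, local directory (default /var/storm).
render_storm_yaml() {
    zk_ip="$1"
    nimbus_ip="$2"
    local_dir="${3:-/var/storm}"
    cat <<EOF
storm.zookeeper.servers:
 - "$zk_ip"
nimbus.host: "$nimbus_ip"
storm.local.dir: "$local_dir"
EOF
}

# Usage (this overwrites the existing file):
#   render_storm_yaml 10.20.30.40 10.20.30.41 > /opt/storm-0.8.1/conf/storm.yaml
```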

Finally we edit the supervision configuration file /etc/supervisord.conf. On the Nimbus host, add sections for Nimbus and the UI:

[program:storm_nimbus]
command=/opt/storm-0.8.1/bin/storm nimbus
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-nimbus-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-nimbus-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

[program:storm_ui]
command=/opt/storm-0.8.1/bin/storm ui
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-ui-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-ui-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true
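
The two sections above start Nimbus and the UI. A host that runs a Storm supervisor would presumably need an analogous section whose command is storm supervisor. The sketch below shows one way to generate such a section; it is an assumption, not the author's configuration. The function name render_storm_program and the storm_supervisor program name are made up for illustration, and the settings simply mirror the sections above.

```shell
#!/bin/sh
# Hypothetical helper: print a supervisord [program:...] section for a Storm
# daemon (nimbus, ui or supervisor), mirroring the sections shown above.
# Arguments: daemon name, Storm install directory (default /opt/storm-0.8.1).
render_storm_program() {
    daemon="$1"
    storm_home="${2:-/opt/storm-0.8.1}"
    cat <<EOF
[program:storm_$daemon]
command=$storm_home/bin/storm $daemon
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-$daemon-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-$daemon-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true
EOF
}

# Usage on a supervisor host (as root):
#   render_storm_program supervisor >> /etc/supervisord.conf
```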

And start the supervision:

chkconfig supervisord on
service supervisord start

Running the supervisorctl command should result in something like this:

storm_nimbus   RUNNING    pid 1119, uptime 1 day, 0:20:14
storm_ui       RUNNING    pid 1121, uptime 1 day, 0:20:14

The Storm UI should now be accessible. Point a web browser at port 8080 on the Nimbus host, and you should get something like this:

[Figure: storm-ui]

Note that the screenshot also shows an active topology, which will not be available if you just followed the steps in this tutorial and haven’t deployed a topology to the cluster yet.

Installing a Storm cluster on CentOS hosts

8 thoughts on “Installing a Storm cluster on CentOS hosts”

  • 2013-05-02 at 11:40

    Fantastic guide-through! It helped me a lot 😉

  • 2013-05-27 at 19:42

    Awesome set of concise instructions. These instructions work very well with AMIs as well.

  • 2013-05-31 at 18:45

    Awesome tutorial. It would be awesome to get a followup on setting up the wordcount topology or something.

  • 2013-07-30 at 09:55

    Since this is probably the only proper article on setting up Storm on CentOS, you should probably update it to Storm 0.8.2. The latest download isn’t available on GitHub though. http://storm-project.net/downloads.html lists the latest download links (hosted on Dropbox).

  • 2014-01-14 at 12:10

    This is a very helpful blog post on Storm clusters. I have also written a blog post on Storm clusters, which may be helpful.

  • Pingback: Apache Storm Clusters Deployment Guide

  • 2015-07-09 at 23:33

    This is the only tutorial that got me all the way through. With the others I tried, I struggled to get zookeeper installed and configured. It’s unfortunate there doesn’t seem to be a package for handling zookeeper a bit easier.

    The only problem I found was that having ZooKeeper use the start-foreground command resulted in an error. When I use supervisorctl, nimbus and ui say RUNNING, but zookeeper says BACKOFF. A bit of googling suggested changing start-foreground to just ‘start’. I did that, and now zookeeper says STARTING, but never gets around to saying it's RUNNING. Not sure if that's normal or not.

  • 2017-07-14 at 11:14

    While starting Nimbus I am getting the error below. Log from supervisord.log:

    2017-07-14 10:00:03,851 CRIT Supervisor running as root (no user in config file)
    2017-07-14 10:00:03,851 WARN No file matches via include "/etc/supervisord.d/*.ini"
    2017-07-14 10:00:03,875 INFO RPC interface 'supervisor' initialized
    2017-07-14 10:00:03,875 CRIT Server 'unix_http_server' running without any HTTP authentication checking
    2017-07-14 10:00:03,876 INFO daemonizing the supervisord process
    2017-07-14 10:00:03,876 INFO supervisord started with pid 22852
    2017-07-14 10:00:04,879 INFO spawned: 'storm-nimbus' with pid 22853
    2017-07-14 10:00:12,902 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
    2017-07-14 10:00:13,905 INFO spawned: 'storm-nimbus' with pid 22950
    2017-07-14 10:00:21,816 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
    2017-07-14 10:00:23,820 INFO spawned: 'storm-nimbus' with pid 23049
    2017-07-14 10:00:31,862 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
    2017-07-14 10:00:34,867 INFO spawned: 'storm-nimbus' with pid 23148
    2017-07-14 10:00:43,018 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
    2017-07-14 10:00:47,025 INFO spawned: 'storm-nimbus' with pid 23274
    2017-07-14 10:00:55,091 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
    2017-07-14 10:01:00,638 INFO spawned: 'storm-nimbus' with pid 23395
    2017-07-14 10:01:07,129 INFO stopped: storm-nimbus (terminated by SIGTERM)

    My program section:

    [program:storm-nimbus]
    command=/opt/storm/bin/storm nimbus
    user=storm
    autostart=true
    autorestart=true
    startsecs=10
    startretries=999
    log_stdout=true
    log_stderr=true
    logfile=/var/log/storm/nimbus.out
    logfile_maxbytes=20MB
    logfile_backups=10
