Storm is a distributed, realtime computation system to reliably process unbounded streams of data. The following picture shows how data is processed in Storm:
This tutorial will show you how to install Storm on a cluster of CentOS hosts. A Storm cluster contains the following components:
Nimbus is the name for the master node. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. The nodes that perform the work contain a supervisor and each supervisor is in control of one or more workers on that node. ZooKeeper is used for coordination between nimbus and the supervisors.
All nodes
We start by disabling SELinux and iptables on every host. This is a bad idea if your cluster runs on publicly accessible machines, but it makes network problems a lot easier to debug. SELinux is enabled by default on CentOS. To disable it, we need to edit /etc/selinux/config:
SELINUX=disabled
We need to reboot the machine for this to take effect.
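If you have many hosts to prepare, the edit can be scripted. The following is a minimal sketch, assuming GNU or BSD sed is available; the function name disable_selinux and its optional path argument are our own, not part of any SELinux tooling.

```shell
# Sketch: set SELINUX=disabled in an SELinux config file.
# The function name and the optional path argument are illustrative.
disable_selinux() {
    config="${1:-/etc/selinux/config}"
    # Rewrite the SELINUX= line, whatever its current value is.
    # The pattern does not touch SELINUXTYPE=, and sed keeps a .bak copy.
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}

# Usage (as root):
#   disable_selinux
```

The change to the file still only takes effect after a reboot. Until then, running setenforce 0 as root switches the running system to permissive mode.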
The firewall has some default rules we want to get rid of:
iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain
/etc/init.d/iptables save
Storm and ZooKeeper are both fail-fast systems, which means that a Storm or ZooKeeper process will kill itself as soon as an error is detected. It is therefore necessary to put the Storm and ZooKeeper processes under supervision. This will make sure that each process is restarted when needed. For supervision we will use supervisord. Installation is performed like this:
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum install supervisor
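To make the idea of supervision concrete, here is a deliberately simplified sketch of a restart loop in plain shell. It is not how supervisord is implemented, and the function name supervise and its arguments are hypothetical. Unlike supervisord with autorestart=true, this loop stops as soon as the command exits cleanly.

```shell
# Illustrative restart loop: rerun a command until it exits cleanly
# or the retry budget is used up. The function name and arguments
# are hypothetical; supervisord does this (and much more) for real services.
supervise() {
    max_retries="$1"
    shift
    attempt=0
    while [ "$attempt" -lt "$max_retries" ]; do
        attempt=$((attempt + 1))
        if "$@"; then
            echo "command exited cleanly on attempt $attempt"
            return 0
        fi
        echo "command failed (attempt $attempt), restarting" >&2
    done
    # Retry budget exhausted without a clean exit.
    return 1
}

# Usage:
#   supervise 5 some_flaky_command --with-args
```

The retry budget here plays a role loosely comparable to the startretries setting in the supervisord configuration.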
ZooKeeper node
We will now create a single ZooKeeper node. To install a multi-node ZooKeeper cluster instead, take a look at the ZooKeeper documentation.
yum -y install java-1.7.0-openjdk-devel wget
cd /opt
wget http://apache.xl-mirror.nl/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar zxvf zookeeper-3.4.5.tar.gz
mkdir /var/zookeeper
cp zookeeper-3.4.5/conf/zoo_sample.cfg zookeeper-3.4.5/conf/zoo.cfg
Now edit the zookeeper-3.4.5/conf/zoo.cfg file:
dataDir=/var/zookeeper
Edit the /etc/supervisord.conf file and add a section about ZooKeeper to it:
[program:zookeeper]
command=/opt/zookeeper-3.4.5/bin/zkServer.sh start-foreground
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/zookeeper-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/zookeeper-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true
Start the supervision and thereby the ZooKeeper service:
chkconfig supervisord on
service supervisord start
Running the supervisorctl command should result in something like this:
zookeeper RUNNING pid 1115, uptime 1 day, 0:07:33
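Rather than reading the status output by eye, you can filter it for programs that are in any state other than RUNNING (for example BACKOFF or STARTING). This is a small sketch that assumes the two-column "name STATE" layout shown above; the function name not_running is our own.

```shell
# Print the name and state of every program that is not RUNNING.
# Reads `supervisorctl status` output on stdin and returns non-zero
# if any such program is found. The function name is illustrative.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2; bad = 1 } END { exit bad }'
}

# Usage on a live host (needs supervisord to be running):
#   supervisorctl status | not_running
```

An empty result with exit status 0 means every supervised program reported RUNNING.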
Nimbus and Supervisor nodes
Every Storm node has a set of dependencies that need to be satisfied. We start with ZeroMQ and JZMQ:
yum -y install gcc gcc-c++ libuuid-devel make wget
cd /opt
wget http://download.zeromq.org/zeromq-2.2.0.tar.gz
tar zxvf zeromq-2.2.0.tar.gz
cd zeromq-2.2.0
./configure
make install
ldconfig

yum -y install java-1.7.0-openjdk-devel unzip libtool
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64
cd /opt
wget https://github.com/nathanmarz/jzmq/archive/master.zip -O jzmq-master.zip
unzip jzmq-master.zip
cd jzmq-master
./autogen.sh
./configure
make install
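The JAVA_HOME path above contains an exact OpenJDK build number, so it may not exist if yum installs a different build. One possible workaround, sketched here under the assumption that GNU readlink is available (as it is on CentOS), is to derive the directory from the javac binary. The function name java_home_for is our own.

```shell
# Derive JAVA_HOME from the location of the javac binary instead of
# hard-coding a versioned directory. The function name is illustrative.
java_home_for() {
    # Resolve symlinks (e.g. /usr/bin/javac pointing into /usr/lib/jvm),
    # then strip the trailing /bin/javac.
    javac_path=$(readlink -f "$1")
    dirname "$(dirname "$javac_path")"
}

# Usage, once the JDK package is installed:
#   export JAVA_HOME=$(java_home_for "$(command -v javac)")
```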
Then we move on to Storm itself:
cd /opt
wget https://github.com/downloads/nathanmarz/storm/storm-0.8.1.zip
unzip storm-0.8.1.zip
mkdir /var/storm
Now edit the storm-0.8.1/conf/storm.yaml file, replacing the IP addresses as needed:
storm.zookeeper.servers:
  - "10.20.30.40"
nimbus.host: "10.20.30.41"
storm.local.dir: "/var/storm"
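Since every Storm node needs the same three settings, the file can also be generated instead of edited by hand. This is a minimal sketch; the function name write_storm_yaml and its argument order are our own, and it only covers the settings used in this tutorial.

```shell
# Write a minimal storm.yaml with the three settings used in this tutorial.
# Arguments: ZooKeeper IP, Nimbus IP, output file. The function name and
# argument order are illustrative.
write_storm_yaml() {
    zk_host="$1"
    nimbus_host="$2"
    out_file="$3"
    cat > "$out_file" <<EOF
storm.zookeeper.servers:
  - "$zk_host"
nimbus.host: "$nimbus_host"
storm.local.dir: "/var/storm"
EOF
}

# Usage:
#   write_storm_yaml 10.20.30.40 10.20.30.41 /opt/storm-0.8.1/conf/storm.yaml
```

Note that this overwrites the target file, so any other settings already in it would be lost.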
Finally we edit the supervision configuration file /etc/supervisord.conf:
[program:storm_nimbus]
command=/opt/storm-0.8.1/bin/storm nimbus
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-nimbus-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-nimbus-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true

[program:storm_ui]
command=/opt/storm-0.8.1/bin/storm ui
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=/var/log/storm-ui-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=/var/log/storm-ui-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true
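The supervisord sections in this tutorial differ only in the program name, the command and the log file names, so they can be produced from one template. The sketch below is an assumption-laden helper, not part of supervisord: the function name program_section and its arguments are our own.

```shell
# Print a supervisord [program:...] section using the settings from this
# tutorial. Arguments: program name, command, log file prefix.
# The function name and argument order are illustrative.
program_section() {
    name="$1"
    cmd="$2"
    log_prefix="$3"
    cat <<EOF
[program:$name]
command=$cmd
autostart=true
autorestart=true
startsecs=1
startretries=999
redirect_stderr=false
stdout_logfile=${log_prefix}-out
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stdout_events_enabled=true
stderr_logfile=${log_prefix}-err
stderr_logfile_maxbytes=100MB
stderr_logfile_backups=10
stderr_events_enabled=true
EOF
}

# Usage (as root), appending a section to the supervisord configuration:
#   program_section storm_supervisor "/opt/storm-0.8.1/bin/storm supervisor" \
#       /var/log/storm-supervisor >> /etc/supervisord.conf
```

The usage line illustrates a section for the worker nodes, which presumably run the storm supervisor command instead of nimbus and ui. The program name storm_supervisor and its log prefix are our choice and do not appear in the configuration above.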
And start the supervision:
chkconfig supervisord on
service supervisord start
Running the supervisorctl command should result in something like this:
storm_nimbus RUNNING pid 1119, uptime 1 day, 0:20:14
storm_ui RUNNING pid 1121, uptime 1 day, 0:20:14
The Storm UI should now be accessible. Point a web browser at port 8080 on the Nimbus host, and you should get something like this:
Note that the screenshot also shows an active topology, which will not be available if you just followed the steps in this tutorial and haven’t deployed a topology to the cluster yet.
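If no browser is at hand, the UI can also be probed from the command line. This sketch assumes curl is installed; the function names storm_ui_url and check_storm_ui are our own.

```shell
# Build the Storm UI URL for a host; 8080 is the port used in this tutorial.
# An optional second argument overrides the port.
storm_ui_url() {
    echo "http://$1:${2:-8080}/"
}

# Return success if the UI answers with a successful HTTP status.
# Needs curl: -f fails on HTTP errors, -s silences progress output.
check_storm_ui() {
    curl -fs -o /dev/null --max-time 5 "$(storm_ui_url "$@")"
}

# Usage:
#   check_storm_ui 10.20.30.41 && echo "Storm UI is up"
```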
Fantastic walkthrough! It helped me a lot 😉
Awesome set of concise instructions. This set of instructions works very well with AMIs as well.
Awesome tutorial. It would be awesome to get a followup on setting up the wordcount topology or something.
Since this is probably the only proper article on setting up storm in CentOS, you should probably update it to storm 0.8.2. The latest download isn’t available on Github though. [http://storm-project.net/downloads.html](http://storm-project.net/downloads.html) lists the latest download links (hosted on Dropbox)
This is a very helpful blog on Storm clusters. I have also written a blog on Storm clusters. This may be helpful.
This is the only tutorial that got me all the way through. With the others I tried, I struggled to get zookeeper installed and configured. It’s unfortunate there doesn’t seem to be a package for handling zookeeper a bit easier.
The only problem I found was that having zookeeper use the start-foreground command resulted in an error. When I use supervisorctl, nimbus and ui say running, but zookeeper says BACKOFF. A bit of googling suggested changing start-foreground to just ‘start’. I did that, and now zookeeper says STARTING, but never gets around to saying it’s RUNNING. Not sure if that’s normal or not.
While starting nimbus I am getting the error below. Log from supervisord.log:
2017-07-14 10:00:03,851 CRIT Supervisor running as root (no user in config file)
2017-07-14 10:00:03,851 WARN No file matches via include “/etc/supervisord.d/*.ini”
2017-07-14 10:00:03,875 INFO RPC interface ‘supervisor’ initialized
2017-07-14 10:00:03,875 CRIT Server ‘unix_http_server’ running without any HTTP authentication checking
2017-07-14 10:00:03,876 INFO daemonizing the supervisord process
2017-07-14 10:00:03,876 INFO supervisord started with pid 22852
2017-07-14 10:00:04,879 INFO spawned: ‘storm-nimbus’ with pid 22853
2017-07-14 10:00:12,902 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
2017-07-14 10:00:13,905 INFO spawned: ‘storm-nimbus’ with pid 22950
2017-07-14 10:00:21,816 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
2017-07-14 10:00:23,820 INFO spawned: ‘storm-nimbus’ with pid 23049
2017-07-14 10:00:31,862 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
2017-07-14 10:00:34,867 INFO spawned: ‘storm-nimbus’ with pid 23148
2017-07-14 10:00:43,018 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
2017-07-14 10:00:47,025 INFO spawned: ‘storm-nimbus’ with pid 23274
2017-07-14 10:00:55,091 INFO exited: storm-nimbus (terminated by SIGABRT; not expected)
2017-07-14 10:01:00,638 INFO spawned: ‘storm-nimbus’ with pid 23395
2017-07-14 10:01:07,129 INFO stopped: storm-nimbus (terminated by SIGTERM)
My Program :
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/nimbus.out
logfile_maxbytes=20MB
logfile_backups=10