I presented a paper called "Dynamically Scaling Apache Storm for the Analysis of Streaming Data" at the 1st International Conference on Big Data Service (IEEE BigDataService 2015).
Stream processing platforms allow applications to analyse incoming data continuously. Several use cases make use of these capabilities, ranging from monitoring physical infrastructures to pre-selecting video surveillance feeds for human inspection. It is difficult to predict how many computing resources such stream processing platforms need, because the volume and velocity of the input data may vary over time. The open source Apache Storm software provides a framework for developers to build processing applications that use the computing resources of all machines within an established cluster. Because of the varying processing needs of such applications, the platform should be able to grow and shrink automatically as needed. Unfortunately, the current Storm platform does not provide this capability. In this paper we describe the design and implementation of a tool that monitors several aspects of the Storm platform, the applications running on top of it, and external systems such as queues and databases. Based on this information, the tool decides whether extra servers are needed or whether machines may be decommissioned from the cluster, and then acts on that decision by actually creating new virtual machines or shutting them down.
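The core decision logic described in the abstract can be sketched as a simple threshold-based policy. This is a minimal illustration, not the paper's actual algorithm: the function name, metric choices (CPU utilisation and external queue depth), and threshold values are all hypothetical assumptions for the sake of the example.

```python
def scaling_decision(cpu_util, queue_depth, node_count,
                     cpu_high=0.80, cpu_low=0.30, queue_high=10_000,
                     min_nodes=1, max_nodes=16):
    """Return +1 to provision a VM, -1 to decommission one, 0 to hold.

    cpu_util    -- average CPU utilisation across the cluster (0.0-1.0)
    queue_depth -- number of pending messages in the external input queue
    node_count  -- current number of worker machines in the Storm cluster
    """
    # Scale out when the cluster is saturated or input is backing up,
    # but never beyond the configured maximum cluster size.
    if (cpu_util > cpu_high or queue_depth > queue_high) and node_count < max_nodes:
        return +1
    # Scale in only when the cluster is clearly idle and the input queue
    # is fully drained, keeping at least the minimum number of nodes.
    if cpu_util < cpu_low and queue_depth == 0 and node_count > min_nodes:
        return -1
    # Otherwise hold steady to avoid oscillating between sizes.
    return 0
```

A real monitor would run such a check periodically, feed it live metrics from Storm and the surrounding queues and databases, and translate the result into cloud API calls that start or stop virtual machines.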