Storm is an open source Apache project (storm.apache.org) that allows real-time distributed computations to be performed over streams of data.
It’s part of the Hadoop ecosystem of Big Data processing tools and is directly supported in HDInsight.
For example, the detection of terms related to a storm or earthquake might provide a very rapid indication of areas affected by a natural disaster and its severity.
To demonstrate the basics of how this is done, I’ll walk through how to set up a streaming topology that collects data from Twitter, selects some of the tweets, calculates metrics, saves everything into storage and publishes some of the results. For this article, I selected tweets using simple keyword matching.
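To make the keyword-matching step concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code used in the topology: the keyword list and the function names (`tokenize`, `matches_keywords`, `select_tweets`) are hypothetical, and matching whole word tokens rather than substrings is an assumption made so that a word like “brainstorming” does not count as a match for “storm”.

```python
import re

# Hypothetical keyword list; the article does not specify the terms used.
KEYWORDS = {"storm", "earthquake"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any whole-word token in the text is a keyword."""
    return any(token in keywords for token in tokenize(text))


def select_tweets(tweets, keywords=KEYWORDS):
    """Keep only the tweets that mention at least one keyword."""
    return [tweet for tweet in tweets if matches_keywords(tweet, keywords)]
```

Matching on tokens keeps the rule simple to reason about; a real selection step might also handle hashtags, plurals or multiple languages.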
As you’ll see, Microsoft makes this kind of development significantly easier than other current market offerings via powerful authoring and debugging tools in Visual Studio.
The HDInsight Tools for Visual Studio (available as part of the Azure SDK) provide a familiar coding and debugging environment. These tools offer a significantly easier way to work with Big Data technologies than the simple editors and command-line tools currently available in the open source world.
Topologies don’t finish like other queries—they continue to execute until they’re suspended or killed.
In the Azure management portal, you can create a new HDInsight cluster and choose Storm as the type.
Such a graph is referred to as a “topology” in Storm. Spouts produce streams of tuples, which are basically sets of type and value pairs. In other words, a spout is a piece of code that knows how to collect or generate data and then emit it in chunks.
The Microsoft Azure platform provides powerful Big Data solutions, including Azure Data Lake and HDInsight. There’s an open source technology that allows highly distributed real-time analytics called Apache Storm.
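The idea of a spout emitting tuples into a topology can be sketched without any Storm libraries. The Python below is a conceptual model only: it uses generators in a single process, whereas Storm distributes this work across a cluster. The names `tweet_spout`, `filter_stage`, `count_stage` and `run_topology` are hypothetical, and a dictionary of named fields stands in for a Storm tuple. The endless source mirrors the fact that a topology keeps running until it is stopped, so the sketch takes only a fixed number of results.

```python
from itertools import cycle, islice


def tweet_spout(source):
    """Spout stand-in: collect raw items and emit them one tuple at a time."""
    for tweet_id, text in source:
        # A dict of named fields stands in for a Storm tuple (assumption).
        yield {"id": tweet_id, "text": text}


def filter_stage(stream, keyword):
    """Processing step: pass along only tuples whose text contains the keyword."""
    for tup in stream:
        if keyword in tup["text"].lower():
            yield tup


def count_stage(stream):
    """Processing step: attach a running count of matched tuples."""
    matched = 0
    for tup in stream:
        matched += 1
        yield {"id": tup["id"], "matched_so_far": matched}


def run_topology(source, keyword, limit):
    """Wire the steps into a linear graph and take the first `limit` results."""
    stream = count_stage(filter_stage(tweet_spout(source), keyword))
    return list(islice(stream, limit))


# Hypothetical sample data, repeated forever to imitate an unbounded stream.
SAMPLE = [(1, "Storm warning issued"), (2, "Nice weather"), (3, "storm passed")]
results = run_topology(cycle(SAMPLE), "storm", 3)
```

Because each step is lazy, nothing is read from the source until a downstream step asks for it, which is why the endless input does not hang the sketch.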