In the midst of Hadoop FUD, the company governing the technology continues to plod on. Last week, in a manoeuvre aimed at the commercial battleground that is the Internet of Things, Hortonworks announced that it was introducing a new-generation data analysis tool: Hortonworks DataFlow (HDF). The “data-in-motion” technology will be offered as a parallel subscription to the data-at-rest Hortonworks Data Platform.

However, for Apurva Dave, VP of Marketing at operations data hub Jut, there’s more than just buzzy future thinking at work here. From the perspective of Jut, who have been building their own dataflow analytics system, the technology represents the next generation of data analytics. As Dave puts it, “It’s interesting to see the evolution of these big data companies to where the world is going anyway.”

He adds, “When you think about what DataFlow is, the simplest way to describe it is that the old generation of data – specifically MapReduce – was all about batch analytics. And when you look at DataFlow, it does real time stream analytics and batch analytics equally as well. And what’s more important than that is it does it with the same framework and language context.”

From a programmer’s perspective, this is a huge plus. In the past, developers have been forced to shift from one foot to the other when switching between batch and real time analytics. “Two separate implementations, two separate products, two separate ways to think. And what DataFlow is doing is bringing those together.” With it in place, Dave comments, these two disparate strands are being woven together.

The top line benefit of this tool, he says, is getting answers now rather than a week from now. “When do you want the answer to your question? And how many times in your organisation is it feasible to have the answer a week from now, versus right now? Streaming analytics is about having answers right now. And that’s especially critical when you deal with areas of the business that are real time. For example, any service that’s happening online.”

And then there’s that all-important IoT market, the one that Hortonworks is very publicly pushing as the case for this addition to its services. Indeed, it makes a lot of sense when you’ve got thousands or millions of sensors streaming data back to your organisation. The primary benefit for business here, Dave notes, is that the world is moving to real time, and “the old paradigm of MapReduce simply can’t keep up with that.”

However, whilst Dave thinks it’s true that Hortonworks is trying to capture emerging markets and apply dataflow to them, in reality, he believes the technology is “just as applicable to a lot of the day in, day out work that enterprises need to do in order to make their software reliable and work the way their customers expect it to.”

DataFlow, and Respecting Programmer Sanity

