DataTorrent aims to make streaming into an application

by  for Big on Data, ZDNet

With the deluge of IoT data, streaming analytics is getting impossible to ignore. For streaming to break beyond early adopters, it has to become less of a do-it-yourself homegrown software development project.

There’s no shortage of use cases requiring real-time do-or-die decisions. Location-based marketing, border security, smart grid optimization, cybersecurity, and adtech auctions are a few of the many processes that rely on decisions made in the here and now. And with the explosion of IoT data come even more compelling use cases for taking action on data in the here and now.

Since our days covering middleware, we’ve been hearing that streaming or event processing is on the cusp. Sure, complex event processing was a technology looking for a solution. But as bandwidth, commodity hardware, and declining memory prices made it thinkable for Big Data to go real time, we started drinking the Kool-Aid. In 2012 we stated, “Fast Data, used in large enterprises for highly specialized needs, has become more affordable and available to the mainstream. Just when corporations absolutely need it.” And then in 2015: “Real-time streaming, machine learning, and search will become the most popular emerging workloads.”

But in the meantime, streaming did not exactly take the world by storm. Instead, at Ovum we’ve been getting a mouthful from our enterprise clients over machine learning and cloud deployment. Few clients are asking us about streaming.

It’s not that machine learning or cloud have been immune from hype. There’s certainly been no lack of that to go around. Yes, there have been plenty of headlines about whether AI and self-aware robots are going to take away our jobs. Beyond the hype, however, the results from machine learning and cloud are already tangible. Machine learning already powers a growing array of consumer services and analytic tools, while cloud deployment with major providers continues to spike upward.

The challenge with streaming is that it’s still hard for mere mortal organizations to implement, and for line of business people to understand. It’s not because of a shortage of streaming engines, related tooling, or potential use cases. It’s that streaming is still largely a custom development task. There is no such thing as an off-the-shelf streaming analytics product for retail or network optimization. Virtually every adopter must reinvent the perpetual motion wheel.

Yet that hasn’t slowed the proliferation of streaming engines that are now crowding a landscape that extends from the classic complex event processing engines to streaming, data flow management, and message queuing. So, not only is the technology raw, but there’s a bewildering array of options to choose from.

DataTorrent, one of many aspiring players, open sourced its technology as the Apache Apex project just over a year ago. As a technology, Apex has differentiated itself as something of a middle ground: unlike Spark Streaming, which handles events in micro-batches, Apex is a true streaming engine, capable of handling events one at a time. But the creators of Apex claim it works with less overhead and more flexibility than Storm, which tracks the state of every single event. And unique among streaming engines, Apex was engineered specifically for Hadoop with YARN support built in, rather than added on.

DataTorrent’s commercial product, the RTS platform, provides a visual development environment for configuring streams and piecing together streaming applications. And while Apex as an open source streaming engine might not have the visibility of Spark Streaming, DataTorrent boasts a number of prominent logo customers like GE (which used Apex for its Predix IoT analytics platform) and Capital One. Operating largely under the radar, DataTorrent realized a doubling of business last year; of course, that’s coming from a very early stage company where the multiples should be steep.

Indirectly, as one of the outgrowths of Dell’s EMC acquisition, DataTorrent has a new management team, including the CEO and SVP of marketing. Incoming CEO Guy Churchward, who previously headed EMC’s storage division, is realistic that 2017 won’t be “the year” that streaming analytics breaks out. Churchward sees 2017 as a building year. There are a couple of related challenges. First is broadening the Apex open source communities, where, for now, about three quarters of the committers are from DataTorrent. The company reports that the project has grown to 50,000 “members.” But that is not the same as committers and obviously does not represent actual production… Read the ZDNet article.

DCL: I hate it when they say “complex event processing was a technology looking for a solution.” No! CEP was developed to solve real-time event processing problems that existed at the time – 1990. Intel had just wasted a few billion dollars on a costly chip manufacturing error because they did not have adequate technology to analyze the results of their event-driven simulations of chip designs. An error in a design was missed during simulation. As a result, a faulty chip was manufactured. Then of course the error became obvious, resulting in a recall. Later on it was discovered that the error was in the simulation archives but had been totally missed during analysis of the simulation results. There were many other commercial instances at that time where CEP was needed – Tibco real-time business processing, for example. And of course fraud detection in banking transactions. That is why CEP was invented at Stanford. There were similar motivations in the work by Bates at Cambridge at that time too.
