Wide Range of Products for Event Stream Processing

9 February 2019
W. Roy Schulte

This article represents my personal opinion, not necessarily that of my employer, Gartner, Inc., or anyone else. For official Gartner information, go to www.Gartner.com.

Every company is blessed with – and challenged by – the exploding volume of streaming data that flows through its corporate networks, or that is potentially available from external sources. The fastest growing kind of streaming data consists of events emitted by sensors, meters and control systems on physical devices (often called Internet of Things data). But there are lots of other kinds of event streams, including web clickstreams from customer interactions; social media activity streams such as tweets, Facebook posts, Snapchats and LinkedIn updates; market data; weather data; and, as always, copies of business transactions from transaction processing applications.

This is a blessing because event streams hold information that can be used to improve business (and personal) decisions. However, it is also a challenge because many IT departments and analytics and business intelligence (BI) teams have limited experience in building or acquiring systems that can exploit streaming data. Among other issues, they may not be clear on what kind of software to use to enable such systems. Other articles on www.complexevents.com describe how companies capture the benefits of stream processing. This article summarizes five categories of software that are used to process the streams.

Before we get into the categories, it will help to mention that companies process event streams in two fundamental ways:

• They perform stream analytics (or other application logic) on event data in real-time or near-real-time as it arrives, or very shortly thereafter. Stream (or “streaming”) analytics generally involves a sequence of operations such as filtering (selecting), transforming, correlating (joining), computing aggregates (e.g., count, sum, average, maximum, minimum) and finding new instances of matches to pattern templates. It is used to provide situation awareness to people through dashboards, alerts and mobile applications; or to trigger automated responses (sense and respond) when the system detects conditions that require some sort of pre-designed response.

• Alternatively, they ingest, process and then store the data in a file or database to be processed by an application or analytics tool at a later time. This is called stream data integration (SDI) or, sometimes, “real-time ETL.” Systems that perform SDI will typically filter, transform, and enrich the data before storing it. The systems may incorporate adapters for various DBMSs, file systems and messaging systems such as FTL, Kafka, Kinesis, MQSeries, Pulsar, RabbitMQ or others. In some cases, SDI systems are used to move data from one file or database to another in bulk rather than capturing a live event stream.
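
To make the stream-analytics side of this distinction concrete, here is a minimal Python sketch of the typical sequence of operations described above: filtering, transforming, computing an aggregate over a sliding event window, and detecting a condition that triggers an alert. The event fields (`sensor`, `temp_c`), window size and threshold are invented for illustration; they do not come from any particular product.

```python
from statistics import mean

def analyze(events, window_size=3, alert_above=30.0):
    # Toy stream analytics over sensor events; field names are illustrative.
    window, alerts = [], []
    for e in events:
        if e.get("temp_c") is None:      # filter: drop incomplete readings
            continue
        reading = float(e["temp_c"])     # transform: normalize the value
        window.append(reading)
        window = window[-window_size:]   # sliding event window of N readings
        if mean(window) > alert_above:   # aggregate + condition detection
            alerts.append((e["sensor"], round(mean(window), 1)))
    return alerts

stream = [
    {"sensor": "s1", "temp_c": 25},
    {"sensor": "s1", "temp_c": None},    # dropped by the filter
    {"sensor": "s1", "temp_c": 31},
    {"sensor": "s1", "temp_c": 36},      # rolling avg now (25+31+36)/3 ≈ 30.7
]
print(analyze(stream))                   # -> [('s1', 30.7)]
```

A real ESP platform expresses the same logic declaratively (often in a SQL-like language) and runs it continuously over unbounded streams, but the filter-transform-aggregate-detect sequence is the same.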

With that background we can briefly describe five kinds of software that process event streams:

1. Purpose-specific applications
2. Commercial event stream processing (ESP) platforms
3. Open source streaming frameworks
4. Stream data integration (SDI) platforms
5. Stream analytics platforms with integrated data stores

Descriptions

1. Purpose-specific applications

Most processing of event streams has always been done in vertically- or horizontally-specialized custom-built or commercial off-the-shelf (COTS) application packages or SaaS offerings, rather than in tailorable ESP platforms. That is, companies use purpose-built applications for supply chain visibility, security information and event management (SIEM), fraud detection, real-time customer relationship management (CRM) offers, fleet management, call center monitoring and many other business problems. These systems do event stream processing, but they are hardwired to handle only certain kinds of event data and they can detect only the event patterns that apply to their particular business problem. The vendors that sell such applications or provide SaaS solutions, and companies that use them, categorize them by the name of their business function. Few people think of these as “event stream processing” products.

Purpose-specific COTS or SaaS offerings are often the best solutions for common business problems. However, they don’t work for situations that have unique business requirements, or that require custom connections among multiple independently-designed COTS, SaaS or legacy systems. There simply are no available products for many purposes. So, for many situations, companies do have to build or buy customized ESP applications. If they are building a customized ESP application, it is usually faster and simpler to start with one of the four categories of tailorable stream processing software products described below, rather than hand coding the basic framework for manipulating streaming data in addition to the application logic.

2. Commercial Event Stream Processing (ESP) Platforms

When someone says ESP, they are most often referring to the many commercial ESP platforms that process data in motion as it arrives. These are general-purpose products, used for stream analytics, stream data integration (SDI), or both. The internal architectures of these products vary greatly, so some are better than others at holding state or handling complicated event processing strategies that involve joining multiple streams, long event windows, out-of-order events, database lookups or other characteristics. Some of these products are hybrids of Apache open source code (see next category) and proprietary “enterprise” extensions (hybrid products require license fees or paid subscriptions but are sometimes mostly open source). In more than half of their applications, these products are run on-premises or in a private cloud, but they run on public clouds in a growing minority of situations.

Examples

• Amazon Firehose, Lambda, Kinesis Analytics
• Concord Systems Concord
• Confluent Platform (with Apache Kafka)
• (Alibaba) dataArtisans Flink (with Apache Beam and Apache Flink)
• Databricks Platform (with Spark Streaming)
• EsperTech Esper, EsperTech NEsper
• EVAM (Event and Action Manager)
• Fujitsu Software Interstage Big Data Complex Event Processing Server
• Google Cloud Dataflow (with Apache Beam)
• Hitachi uCosminexus Stream Data Platform
• Hortonworks Data Flow (with Apache NiFi, Apache Kafka, Apache Storm)
• IBM Streams
• Impetus StreamAnalytix (with Apache Storm, Apache Spark or Apache Flink)
• Keen.IO Streams, Compute, Access, Data Explorer
• LG CNS EventPro
• MapR Converged Data Platform with Streams
• Microsoft Azure Stream Analytics, StreamInsight
• Oracle Stream Analytics and Stream Explorer (with Apache Spark)
• Pivotal Spring Cloud Data Flow
• Radicalbit Natural Analytics (with Apache Flink, Apache Kafka Streams, Apache Spark)
• SAP Event Stream Processor
• SAS Event Stream Processing Engine
• (Guavus) SQLstream Blaze
• Software AG Apama Streaming Analytics
• Streamlio (with Apache BookKeeper, Apache Heron, and Apache Pulsar)
• Striim Platform
• TIBCO StreamBase CEP
• Vitria VIA Analytics Platform
• WSO2 Stream Processor

3. Open source streaming frameworks

At least six vendors have put the core of their ESP frameworks into Apache projects to encourage wider adoption and community contributions. The code is available for free from GitHub/Apache or other channels. The frameworks can be used for either stream analytics or SDI. For example, Spark Streaming is popular for SDI but it is also used for stream analytics and other streaming applications. Spark is the basis for many hybrid products, with Storm apparently second in popularity (it is especially popular among older hybrid products).
The basic Apache versions have fewer features and require more effort to use than the commercial ESP platforms (including the hybrid versions of these products) listed above, although they are fine for experiments, learning, simple applications, or some advanced applications by expert developers.

Examples

• Apache Flink (from dataArtisans)
• Apache Gearpump (from Intel)
• Apache Heron (from Twitter)
• Apache Kafka (including KStreams, KSQL) (from LinkedIn and Confluent)
• Apache Samza (from LinkedIn)
• Apache Spark Streaming (from Databricks)
• Apache Storm (from Twitter)

There are also two ESP platforms (rather than streaming frameworks) that are available in a free, open-source form: Esper (also available in a supported version from EsperTech) and Drools Fusion (available in a supported version from (IBM) Red Hat). Finally, there is an interesting free open-source stream processing library, IoTPy, that turns standard Python applications and off-the-shelf non-streaming Python analytics into nonterminating stream-processing agents, for learning purposes.

4. Stream data integration (SDI) platforms

We are aware of eight ESP platforms that are entirely or primarily designed to support SDI. They tend to have more adapters for databases and file systems than the general-purpose commercial ESP platforms listed in category 2 above. For example, their change data capture (CDC) features are generally better than those in the general purpose platforms (indeed, many of the general purpose platforms have no CDC). The authoring facilities in the SDI platforms are optimized for developing real-time ETL solutions, and some have features for detecting data drift (changes in the profile of the incoming event data). The architectures of these products vary, but in many ways their core frameworks resemble those of the general-purpose ESP platforms because they are dealing with the same kind of (streaming) data. Like the general-purpose platforms, they generally support Kafka or other messaging infrastructure, at least on the input side. However, they typically don’t have strong support for real-time business dashboards, alerting, or managing processes that respond to situations that are detected (sense-and-respond behavior).
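
As a rough illustration of the filter-transform-enrich-store pattern that SDI products implement, here is a hedged Python sketch that lands events in SQLite. SQLite stands in for whatever target database or file system an SDI platform would write to, and the table, field names and lookup data are invented for the example.

```python
import sqlite3

def run_sdi_pipeline(events, region_lookup, db_path=":memory:"):
    # Minimal stream-data-integration sketch: filter, transform,
    # enrich via a reference-data lookup, then persist to a target store.
    # Schema and field names are invented for illustration.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (device TEXT, value REAL, region TEXT)")
    for e in events:
        if "value" not in e:                                # filter incomplete events
            continue
        value = float(e["value"])                           # transform: normalize type
        region = region_lookup.get(e["device"], "unknown")  # enrich: add context
        conn.execute("INSERT INTO readings VALUES (?, ?, ?)",
                     (e["device"], value, region))
    conn.commit()
    return conn

conn = run_sdi_pipeline(
    [{"device": "d1", "value": "3.5"}, {"device": "d2"}],   # second event is dropped
    {"d1": "emea"},
)
print(conn.execute("SELECT device, value, region FROM readings").fetchall())
# -> [('d1', 3.5, 'emea')]
```

A commercial SDI platform adds what this sketch lacks: prebuilt adapters, change data capture, delivery guarantees, and monitoring for data drift.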

Examples

• Alooma Platform
• Astronomer Cloud, Enterprise, Open/Apache Airflow
• Datastreams.io Data Stream Manager
• Equalum LTD Data Beaming platform
• InfoWorks Autonomous Data Engine
• Intel/Apache GearPump
• Nexla Data Operations platform
• Streamsets Data Collector and Data Collector Edge

5. Stream analytics platforms with integrated data stores

This category of product has a fundamentally different architecture than the previous four categories. The incoming event stream data is loaded into a native, internal event store, so the data is at rest in memory or on disk before analytics are performed (the other categories of product above operate on data in motion).

In the past, most discussions of stream analytics ignored this type of product because of the latency that is introduced while storing the data (which generally involves creating some kind of index or hashing the data). For ultra-low-latency applications that can be accomplished in one pass of the data, these are still not the right solution. However, some of these products are surprisingly fast at storing the data and making it available for analytics (a matter of seconds or minutes, and sometimes even subsecond). This is plenty fast for many near-real-time (“business real-time”) business problems, including almost all applications that involve a human in the response. Moreover, these products support applications that require multiple passes of the data, or ad hoc queries and drill down that are generated after the data arrives (such applications are not natively supported by data-in-motion systems).
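
The data-at-rest approach can be sketched in a few lines: events are ingested into a store first, and queries, including ad hoc ones written after the data arrived, run against it in as many passes as needed. In this sketch SQLite stands in for a product's native event store, and the schema and sample events are invented.

```python
import sqlite3

# SQLite stands in for a product's native event store; schema is invented.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE events (ts INTEGER, user TEXT, action TEXT)")
store.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "alice", "login"), (2, "bob", "login"),
    (3, "alice", "purchase"), (4, "alice", "logout"),
])

# First pass: the kind of aggregate a dashboard might show.
per_user = store.execute(
    "SELECT user, COUNT(*) FROM events GROUP BY user ORDER BY user").fetchall()
print(per_user)     # -> [('alice', 3), ('bob', 1)]

# Second, ad hoc pass formulated after the data arrived: drill into one user.
drill = store.execute(
    "SELECT action FROM events WHERE user = 'alice' ORDER BY ts").fetchall()
print(drill)        # -> [('login',), ('purchase',), ('logout',)]
```

The second query is exactly the kind of after-the-fact drill down that a data-in-motion system cannot answer, because the events have already flowed past.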

We consider these products to be stream analytics products because they operate on the same kind of event streams that are addressed by the other categories of products, although we might say that they are not performing “streaming” analytics, merely stream analytics.

A few of many examples

• Devo Data Operations Platform (was Logtrust)
• Elastic (Apache) Elasticsearch/Logstash/Kibana
• (First Derivatives) KX KDB+
• Imply.io
• INETCO Analytics
• InfluxData InfluxEnterprise, InfluxCloud
• Interana Platform
• IQLECT platform
• Keen.IO Streams, Compute, Access, Data Explorer
• One Market Data OneTick
• Software AG MashZone NextGen on Big Data
• Splunk Enterprise and Splunk Cloud
• Sumo Logic Free, Professional, Enterprise
• TIBCO Software LiveView on Live Datamart
• Unscrambl Brain
• VMware Wavefront

Bottom line

The usual. One size does not fit all. Every big company will need products from at least three of these categories to use in different departments for various applications. Every big company will have multiple products even from within a category, again because the business requirements will vary so much.

Streaming data is of growing importance to many business scenarios. Understanding the differences among these products will greatly increase the chance of success for your project.
