Tom Groenfeldt, Forbes
Finance is used to handling lots of data, from transactions to years of stock market tick data. But Big Data is something more.
“Big data isn’t just about large volumes of data, it is about diverse data from web logs, sensor networks, social networks, call detail records and more,” said Scott Gnau, head of research and development at Teradata. “We define the big data space as an emerging data type and diverse analytics; it is still an emerging market.”
Tony Baer, principal analyst of enterprise software at Ovum, a research firm, cites a couple of examples. Yahoo uses up to 170 petabytes of data in Hadoop, an open-source software framework for very large data sets, to customize the home pages it presents to users. That improved click-throughs by 160 percent, he said, quoting a study by Hortonworks CEO Eric Baldeschwieler. Baer also said that eBay and Teradata have designed a 37-petabyte store of structured and unstructured data.
In finance, the leading edge in dealing with large amounts of data has often been complex event processing (CEP): analyzing and acting on data in real time. (See my stories below on StreamBase and Redkite.) That works well with high-speed, highly standardized market or transaction data, but not so well when it comes to bringing together disparate data types over time or recognizing a new financial activity that could be fraudulent.
“Complex event processing (CEP) engines are fine if you’re content with rules,” said Gnau. “If you know certain discrete elements that are fraud, CEP can detect them as they come through and prevent it. But because it is based on rules built from experience, it won’t protect against a new type of fraud.”
He offered some examples from outside of finance. A jet engine’s sensors throw off terabytes of data every hour, data which can be used to build predictive models for repair cycles. Understanding when repairs should be done, instead of doing traditional preventive maintenance at certain set intervals, could be worth billions of dollars.
DCL: An interesting report, but with misunderstandings about CEP. New rules can be added to a CEP system at any time. Discovering the “right” rules is a problem in any field of endeavor.
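To make the point about rules concrete, here is a minimal sketch in Python of rule-based event detection in which a rule is added while the system is running. It is not the API of StreamBase or any other CEP product; the names `Transaction`, `RuleEngine`, `add_rule`, and `process`, as well as the thresholds and sample events, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transaction:
    """A hypothetical event type; real feeds carry many more fields."""
    account: str
    amount: float
    country: str


# A rule is any function that returns True when an event looks suspicious.
Rule = Callable[[Transaction], bool]


@dataclass
class RuleEngine:
    """Toy engine: checks each incoming event against every registered rule."""
    rules: Dict[str, Rule] = field(default_factory=dict)

    def add_rule(self, name: str, rule: Rule) -> None:
        # Rules can be registered at any time, including after events have flowed.
        self.rules[name] = rule

    def process(self, event: Transaction) -> List[str]:
        # Return the names of all rules that the event triggers.
        return [name for name, rule in self.rules.items() if rule(event)]


engine = RuleEngine()
engine.add_rule("large_amount", lambda t: t.amount > 10_000)

stream = [
    Transaction("A-1", 12_500.0, "US"),
    Transaction("A-2", 40.0, "XX"),
]

# With only the first rule, the second event passes unnoticed.
before = [engine.process(t) for t in stream]

# Once someone has discovered the new pattern, a rule for it can be added.
engine.add_rule("unusual_country", lambda t: t.country == "XX")
after = [engine.process(t) for t in stream]

print(before)
print(after)
```

The sketch reflects both views in the text: the engine only catches what its rules describe, so the second event goes undetected at first, and adding a rule is trivial once the pattern is known. The hard part, as noted above, is discovering the right rule.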