Solving Big Data’s ‘Fusion’ Problem

by Matthew Chin, UCLA Newsroom

As “big data” has emerged as a tool for answering all sorts of scientific and societal questions, one of the main remaining challenges is whether, and how, multiple datasets from different sources can be combined to determine cause-and-effect relationships in new and untested situations.

Now, computer scientists from UCLA and Purdue University have devised a theoretical solution to that problem.

Their research, which was published this month in the Proceedings of the National Academy of Sciences, could help improve scientists’ ability to understand health care, economics, the environment and other areas of study, and to glean much more pertinent insight from data.

UCLA professor Judea Pearl, who received the 2011 ACM A.M. Turing Award, and Purdue professor Elias Bareinboim say the conventional strategy of using statistical methods to average out differences among the various datasets blurs those distinctions instead of leveraging them for more insightful analyses.

“It’s like testing apples and oranges to guess the properties of bananas,” Pearl says. “How can someone apply insights from multiple sets of data, to figure out cause-and-effect relationships in a completely new situation?”

Pearl and Bareinboim’s approach uses a structural causal model to decide how information from one source should be fused with data from other sources, enabling researchers to infer the properties of yet another setting. The structural causal model encodes the similarities and differences between the sources, which are then processed using causal calculus. The analysis also determines whether the findings from a given study can be generalized to other conditions.

For example, Pearl and Bareinboim’s technique will enable medical researchers conducting a clinical trial to predict the effects of a treatment on the real-world population for which it is intended.
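To make the clinical-trial example concrete, here is a minimal sketch of one of the simplest cases this line of work handles, not the full data-fusion method from the paper. It assumes the trial population and the target population differ only in the distribution of a measured covariate Z (here a hypothetical age group), and that the Z-specific effects of the treatment are the same in both populations. Under those assumptions, the effect in the target population is obtained by reweighting the trial's stratum-specific results with the target's covariate distribution: P*(y | do(x)) = Σ_z P(y | do(x), z) · P*(z). The function name `transport_effect` and all the numbers below are illustrative assumptions, not data from the study.

```python
from typing import Dict


def transport_effect(trial_by_z: Dict[str, float], target_z_dist: Dict[str, float]) -> float:
    """Reweight stratum-specific trial results by the target population's covariate mix.

    Computes sum_z P(y | do(x), z) * P*(z). This is valid only under the stated
    assumption that the populations differ solely in the distribution of Z.
    """
    if abs(sum(target_z_dist.values()) - 1.0) > 1e-9:
        raise ValueError("target covariate distribution must sum to 1")
    missing = set(target_z_dist) - set(trial_by_z)
    if missing:
        raise ValueError(f"no trial estimate for strata: {sorted(missing)}")
    return sum(trial_by_z[z] * p for z, p in target_z_dist.items())


# Hypothetical trial results: recovery rate under treatment and under control, per age group.
treated = {"young": 0.8, "old": 0.5}
control = {"young": 0.6, "old": 0.4}

# Hypothetical covariate mixes: the trial enrolled mostly young patients,
# while the intended real-world population is mostly older.
trial_age = {"young": 0.7, "old": 0.3}
target_age = {"young": 0.3, "old": 0.7}

effect_trial = transport_effect(treated, trial_age) - transport_effect(control, trial_age)
effect_target = transport_effect(treated, target_age) - transport_effect(control, target_age)

print(f"effect in trial population:  {effect_trial:.2f}")   # 0.71 - 0.54
print(f"effect in target population: {effect_target:.2f}")  # 0.59 - 0.46
```

With these made-up numbers, the treatment raises recovery by 17 percentage points in the trial population but by only about 13 points in the older target population, because the treatment helps younger patients more. When the populations differ in ways a single reweighting cannot capture, this simple formula no longer applies, and deciding what can still be inferred is the harder problem the causal-calculus analysis addresses.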
