When Google got flu wrong

by Declan Butler, Nature

When influenza hit early and hard in the United States this year, it quietly claimed an unacknowledged victim: one of the cutting-edge techniques being used to monitor the outbreak. A comparison with traditional surveillance data showed that Google Flu Trends, which estimates prevalence from flu-related Internet searches, had drastically overestimated peak flu levels. The glitch is no more than a temporary setback for a promising strategy, experts say, and Google is sure to refine its algorithms. …

But as flu-tracking techniques based on mining of web data and on social media proliferate, the episode is a reminder that they will complement, but not substitute for, traditional epidemiological surveillance networks. …

The latest U.S. flu season appears to have flummoxed the Google Flu Trends data-mining algorithms, as evidenced by wide disparities between its estimates and those reported by the U.S. Centers for Disease Control and Prevention (CDC). Several researchers think widespread media coverage of the flu outbreak may lie at the heart of the algorithms’ difficulties by triggering many flu-related Web searches by healthy people. Despite these problems, many feel Google Flu will recover its accuracy following the refinement of its models.

“You need to be constantly adapting these models, they don’t work in a vacuum,” says Harvard Medical School’s John Brownstein. “You need to recalibrate them every year.”


Meanwhile, several projects are underway to calculate flu outbreaks by crowdsourcing via citizen volunteers. Lyn Finelli with CDC’s Influenza Surveillance and Outbreak Response Team sees great potential in such efforts, particularly because the questionnaires are based on clinical definitions of influenza-like illness (ILI) and so generate very clean data. …

Some research groups also have published work suggesting that a close match can be made between official ILI data and models derived from analysis of flu-related Twitter messages.

DCL: Disease monitoring systems are slowly, very slowly, introducing complex event processing (CEP) techniques without knowing exactly what they are doing. These flu systems are but one example.
