Researchers Suggest Fixes to Google Flu Trends Analytics

A new study concludes that revising the inner plumbing of the Google Flu Trends disease surveillance system can improve the accuracy of forecasts for the severity of a flu season.

Jul 07 14 | 3 min read

Joseph Goedert

Senior Editor, Health Data Management

During the 2012/2013 flu season, GFT predicted that 10.6 percent of the population had influenza-like illness, when patient records showed that only 6.1 percent did. The study, published in the American Journal of Preventive Medicine, notes that GFT predictions during other flu seasons, particularly the H1N1 pandemic in 2009, were also off base.

However, existing shortfalls in the predictions result from the methodologies that GFT employs, not from the data themselves, according to the study. Simply put, the methodologies were too simple. By using GFT's data but tweaking the methodologies, researchers at San Diego State University, Harvard University and the Santa Fe Institute reduced the 2012/2013 prediction from 10.6 percent to an estimated 7.7 percent.
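To put those figures in perspective, the arithmetic can be worked through directly. The short Python sketch below uses only the three percentages reported above; the variable names are illustrative. It shows GFT's overshoot of about 4.5 percentage points shrinking to about 1.6 points under the revised method, a reduction of roughly 64 percent.

```python
# Figures reported for the 2012/2013 flu season (percent of population with ILI).
observed = 6.1   # from patient records
gft = 10.6       # original GFT prediction
revised = 7.7    # prediction from the researchers' revised method

# Overshoot of each prediction, in percentage points.
gft_error = gft - observed
revised_error = revised - observed

# Fraction of the original overshoot removed by the revised method.
reduction = 1 - revised_error / gft_error

print(f"GFT overshoot:     {gft_error:.1f} points")
print(f"Revised overshoot: {revised_error:.1f} points")
print(f"Error reduction:   {reduction:.0%}")
```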

The new model tackles three problems of GFT methodology. First, combining multiple queries into a single variable ignores the variability in individual search query tendencies over time and the possibility that certain unique queries may be better predictors. Second, exclusion of search queries relies on investigator opinion rather than empirical evidence. Third, the model is static. The language of searches undoubtedly changes over time (e.g., swine flu, H1N1, H1N9), and must be accounted for in any prediction model, researchers contend.

"Our alternative approach, inspired by data-assimilation techniques, supervised machine learning and artificial intelligence, expands upon (1) their single explanatory-variable approach by allowing multiple individual queries to contribute independently to the prediction; (2) their quasi-nonempirical search query selection, by empirically choosing search queries that maximize predictive accuracy in real time; and (3) their use of manual revisions, by dynamically updating how individual queries predict influenza each week to ensure accurate prediction across a changing search and influenza landscape. All improve the transparency of GFT," the researchers write.
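The three changes described above can be illustrated with a deliberately simplified Python sketch. It is not the study's actual model, which the article does not specify in detail. As a stand-in, it ranks queries by their correlation with observed illness over a trailing window, fits a separate least-squares line for each selected query, averages the resulting predictions, and repeats the whole process every week. The function names, the window length, the number of queries kept and all of the data are hypothetical and synthetic.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant query: fall back to the mean of y
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def predict_week(queries, ili, week, window=8, top_k=2):
    """Predict ILI for `week` from the preceding `window` weeks only.

    queries: dict of query name -> list of weekly search volumes
    ili:     list of weekly observed ILI percentages
    Returns (prediction, list of queries used).
    """
    start = week - window
    if start < 0:
        raise ValueError("not enough history for the requested window")
    history = ili[start:week]

    # (2) Empirical selection: keep the queries most correlated with ILI
    # in the trailing window, rather than a hand-picked list.
    ranked = sorted(
        queries,
        key=lambda q: abs(pearson(queries[q][start:week], history)),
        reverse=True,
    )
    chosen = ranked[:top_k]

    # (1) Each chosen query contributes through its own fitted line.
    # (3) Calling this function every week refits those lines on fresh data.
    predictions = []
    for q in chosen:
        slope, intercept = fit_line(queries[q][start:week], history)
        predictions.append(slope * queries[q][week] + intercept)
    return mean(predictions), chosen

# Synthetic example data (made up for illustration, not from the study).
ili = [1.0, 1.5, 2.0, 3.0, 4.0, 3.5, 2.5, 2.0, 1.5, 3.0]
queries = {
    "flu symptoms": [10 * v for v in ili],      # tracks ILI exactly
    "fever": [5 * v + 2 for v in ili],          # tracks ILI exactly
    "basketball": [7, 3, 9, 4, 6, 8, 2, 5, 7, 3],  # unrelated noise
}
prediction, used = predict_week(queries, ili, week=9)
print(f"Predicted ILI: {prediction:.2f} percent, using {used}")
```

Averaging separate single-query fits keeps the sketch short; a production system would more likely fit all selected queries jointly with a regularized regression, which is a different technique from the one shown here.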

Study authors emphasize that they remain optimistic about the future of GFT and digital disease detection because a methodologic problem has a methodologic solution. They worry, however, that while Google recently acknowledged that a multivariable approach would improve accuracy, other weaknesses in GFT methods remain.

"Therefore, our alternative method may serve as the foundation for another revision to GFT," they write. "Specifically, because much of the alternative is automated, it can be scaled up (e.g., Google could apply it to thousands of strongly correlated search terms instead of just 100 as herein). Our study is just an initial step toward improving GFT, as the structure around our model can be further refined to yield even greater accuracy. Moreover, by making the inner workings of GFT and the data behind GFT more public, such improvements may be more quickly realized by other external teams."

The study is available here.
