A new study concludes that revising the inner plumbing of the Google Flu Trends (GFT) disease surveillance system can improve the accuracy of forecasts for the severity of a flu season.
During the 2012/2013 flu season, GFT predicted that 10.6 percent of the population had influenza-like illness, when patient records showed only 6.1 percent did. The study, published in the American Journal of Preventive Medicine, notes that GFT predictions during other flu seasons, particularly the 2009 H1N1 pandemic, were also off base.
However, the existing shortfalls in the predictions stem from the methodologies that GFT employs, not from the data themselves, according to the study. Simply put, the methods were too simplistic. By using GFT's data but tweaking the methodologies, researchers at San Diego State University, Harvard University and the Santa Fe Institute reduced the 2012/2013 prediction from 10.6 percent to an estimated 7.7 percent.
The new model tackles three problems with GFT's methodology. First, combining multiple queries into a single variable ignores how individual search-query tendencies vary over time and how certain unique queries may be better predictors. Second, the exclusion of search queries relies on investigator opinion rather than empirical evidence. Third, the model is static. "The language of searches undoubtedly changes over time (e.g., swine flu, H1N1, H1N9), and must be accounted for in any prediction model," the researchers contend.
"Our alternative approach, inspired by data-assimilation techniques, supervised machine learning and artificial intelligence, expands upon (1) their single explanatory-variable approach by allowing multiple individual queries to contribute independently to the prediction; (2) their quasi-nonempirical search query selection, by empirically choosing search queries that maximize predictive accuracy in real time; and (3) their use of manual revisions, by dynamically updating how individual queries predict influenza each week to ensure accurate prediction across a changing search and influenza landscape," the researchers write. "All improve the transparency of GFT."
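The three ideas the researchers describe — letting each query contribute independently, selecting queries empirically by predictive value, and re-estimating weights every week — can be illustrated with a toy sketch. Everything below (the synthetic ILI series, the simulated query volumes, the window and query-count parameters) is an assumption for illustration, not the study's actual model or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: weekly ILI rates (%) with a seasonal shape,
# plus volumes for several hypothetical search queries that track ILI
# with differing strength and noise.
n_weeks, n_queries = 104, 6
ili = 2 + 4 * np.abs(np.sin(np.linspace(0, 4 * np.pi, n_weeks)))
queries = np.column_stack([
    ili * rng.uniform(0.5, 1.5) + rng.normal(0, 0.5, n_weeks)
    for _ in range(n_queries)
])

def select_queries(X, y, k=3):
    """Empirically pick the k queries most correlated with observed ILI,
    instead of choosing them by investigator opinion."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(corr)[-k:]

def predict_current_week(X, y, window=26):
    """Refit per-query weights on a trailing window each week, so every
    selected query contributes independently and the weights track a
    changing search landscape."""
    Xw, yw = X[-window:], y[-window:]
    cols = select_queries(Xw, yw)
    A = np.column_stack([Xw[:, cols], np.ones(window)])  # intercept term
    w, *_ = np.linalg.lstsq(A, yw, rcond=None)
    latest = np.append(X[-1, cols], 1.0)  # most recent query volumes
    return float(latest @ w)

# Walk forward through the series: each week, re-select queries and
# re-estimate weights using only data observed so far.
preds = [predict_current_week(queries[:t], ili[:t]) for t in range(30, n_weeks)]
errors = np.abs(np.array(preds) - ili[30:])
print(f"mean absolute error: {errors.mean():.2f} percentage points")
```

The contrast with the static approach the study criticizes is in `predict_current_week`: a single fixed query aggregate with hand-tuned weights would be fit once, whereas here both the query subset and the weights are recomputed from recent evidence at every step.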
Study authors emphasize that they remain optimistic about the future of GFT and digital disease detection because a methodologic problem has a methodologic solution. They worry, however, that while Google recently acknowledged that a multivariable approach would improve accuracy, other weaknesses in GFT methods remain.
"Therefore, our alternative method may serve as the foundation for another revision to GFT," they write. "Specifically, because much of the alternative is automated, it can be scaled up (e.g., Google could apply it to thousands of strongly correlated search terms instead of just 100 as herein). Our study is just an initial step toward improving GFT, as the structure around our model can be further refined to yield even greater accuracy. Moreover, by making the inner workings of GFT and the data behind GFT more public, such improvements may be more quickly realized by other external teams."
The study is available from the American Journal of Preventive Medicine.