Why big data is key for better infectious disease surveillance, modeling
Although laboratory tests and other data collected by public health institutions have historically been the gold standard for infectious disease surveillance, these traditional methods of collecting information are falling short.
Instead, big data drawn from electronic health records, social media, the Internet and other digital sources are emerging as potentially more timely and detailed information for helping to combat outbreaks of infectious diseases, according to a team of scientists led by the National Institutes of Health.
Writing in a special issue of The Journal of Infectious Diseases, the researchers contend that the time has come for public health to finally embrace big data.
“While big data have proven immensely useful in fields such as marketing and earth sciences, public health is still relying on more traditional surveillance systems and awaiting the fruits of a big data revolution,” states an opinion piece. “A new generation of big data surveillance systems is needed to achieve rapid, flexible, and local tracking of infectious diseases, especially for emerging pathogens.”
However, rather than calling for big data sources to replace traditional surveillance systems, the authors advocate greater use of hybrid systems that combine the two methods, an approach they believe is the most promising option for surveillance and modeling.
“The ultimate goal is to be able to forecast the size, peak or trajectory of an outbreak weeks or months in advance in order to better respond to infectious disease threats. Integrating big data in surveillance is a first step toward this long-term goal,” says Cecile Viboud, co-editor of the 10-article supplement and a senior scientist at the NIH’s Fogarty International Center.
“To be able to produce accurate forecasts, we need better observational data that we just don’t have in infectious diseases,” adds Shweta Bansal, a professor in the Department of Biology at Georgetown University and a co-editor of the supplement. “There’s a magnitude of difference between what we need and what we have, so our hope is that big data will help us fill this gap.”
Despite the potential of digital data sources (such as medical records and claims, crowdsourced data, mobile phone logs, social media and Internet searches) to complement traditional infectious disease surveillance methods, the authors note that these types of big data have their own challenges. In particular, one article addresses the limitations of two major data sources: electronic health records (EHRs) and patient-generated data.
“As the two data sources have complementary strengths—high veracity in the data from traditional sources and high velocity and variety in patient-generated data—they can be combined to build more-robust public health systems,” states an article. “However, they also have unique challenges. Patient-generated data in particular are often completely unstructured and highly context dependent, posing essentially a machine-learning challenge.”
Nonetheless, the article points out that some recent examples from infectious disease surveillance and adverse drug event monitoring demonstrate that the technical challenges can be solved. At the same time, its author maintains that the “problem of verification remains and, unless traditional and digital epidemiologic approaches are combined, these data sources will be constrained by their intrinsic limits.”
While there are technical and ethical challenges associated with Internet search logs and social media posts, researchers suggest that they can provide information more quickly than traditional physician-based reporting systems.
Likewise, although insurance claims and social media posts have the potential to fill geographical information gaps, researchers argue that there are still technical, practical and privacy challenges that must be addressed.