Boston Children’s big data effort provides near real-time influenza tracking

Researchers leverage electronic health records, machine-learning algorithm to spot flu outbreaks.



Using cloud-based data from electronic health records, a Boston Children’s Hospital-led research team has demonstrated that it can detect cases of influenza in near real-time and, in the process, beat similar reporting by the Centers for Disease Control and Prevention by at least one week.

The results of their study, published online May 11 in the journal Scientific Reports, show that by leveraging EHR data, historical patterns of flu activity and a machine-learning algorithm to interpret the data, researchers were able to make accurate predictions of national and local influenza activity that matched subsequent CDC reporting.

As the authors of the article point out, CDC monitors influenza-like illnesses (ILI) in the United States by gathering information from physicians’ reports about patients with ILI seeking medical attention—one of the most widely used methods for tracking flu.

While CDC’s ILI data provides useful estimates of influenza activity, the authors note that its availability has a time lag of one to two weeks.
“This time lag is far from optimal since public health decisions need to be made based on information that is two weeks old,” states the article. “Systems capable of providing real-time estimates of influenza activity are, thus, critical.”

Consequently, the researchers argue that combining cloud-based EHRs with machine learning techniques and historical epidemiological information has the potential to accurately and reliably provide near real-time regional estimates of U.S. flu outbreaks.

“Our team has been working on using data sources that are not traditional to track flu,” says Mauricio Santillana, PhD, faculty member at Boston Children’s Computational Health Informatics Program. “We thought that combining these data sources would lead to improved tracking of flu surveillance, and we have shown that’s the case. With this new data source--electronic health records specifically--we have shown that we can get a better sense of flu activity in finer spatial resolutions.”

Researchers tapped cloud-based EHR data from vendor athenahealth’s database compiled from 72,000 healthcare providers and records for more than 23 million patients, while using a flu-prediction algorithm called AutoRegressive Electronic Health Record Support (ARES) to estimate flu activity over a three-year period.
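The article does not describe the internals of ARES, so the following is only an illustrative sketch of the general idea the name suggests: an autoregressive model that combines lagged official ILI figures with a same-week EHR signal. The lag choice, the linear least-squares fit, the function names and the synthetic data are all assumptions made for illustration, not the researchers' method or data.

```python
import numpy as np

LAGS = (2, 3)  # assumption: official ILI figures arrive about two weeks late


def design_row(ili, ehr, t, lags=LAGS):
    """Features for week t: intercept, lagged official ILI, same-week EHR signal."""
    return [1.0] + [ili[t - k] for k in lags] + [ehr[t]]


def fit(ili, ehr, weeks, lags=LAGS):
    """Ordinary least-squares fit of official ILI on the features above."""
    X = np.array([design_row(ili, ehr, t, lags) for t in weeks])
    y = np.array([ili[t] for t in weeks])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def nowcast(weights, ili, ehr, t, lags=LAGS):
    """Estimate ILI for week t using only lagged ILI and the current EHR signal."""
    return float(np.dot(weights, design_row(ili, ehr, t, lags)))


# Synthetic data only: a seasonal ILI curve and a noisy EHR proxy for it.
rng = np.random.default_rng(0)
t_all = np.arange(120)
ili = 2.0 + 1.5 * np.sin(2 * np.pi * t_all / 52) + 0.5 * np.sin(4 * np.pi * t_all / 52)
ehr = 0.8 * ili + 0.1 + rng.normal(0.0, 0.05, size=t_all.size)

weights = fit(ili, ehr, weeks=range(max(LAGS), 100))  # train on weeks 3..99
estimate = nowcast(weights, ili, ehr, t=110)           # held-out week
error = abs(estimate - ili[110])
```

In this toy setup the estimate for week 110 never touches the official figure for that week, which mirrors the article's point that a weekly EHR feed can stand in for CDC data that has not yet been released.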

“The beauty of the access we have to the data from athenahealth is that they send it to us weekly. Every Monday we get the data from them for the past week, so that’s very close to real time,” adds Santillana. “The fact that they are a cloud-based EHR is helpful because it is very easy for them to query their system and then send us the data.”

Based on their results, researchers were able to demonstrate that the algorithm’s estimates of national and regional flu activity had error rates that were two-fold to three-fold lower than earlier predictive models. In addition, according to the study, ARES correctly estimated the timing and magnitude of the national flu “peak week” for three seasons—though it was slightly less accurate in predicting regional peak weeks.

The researchers explain in their article that EHR data provides an “early count” of ILI activity in much the same way as exit polls enable pollsters to forecast election results. They also note that one of the limitations of ARES is that it relies on EHR data that is not generally available to the public, although athenahealth does provide it to several groups of influenza researchers around the country.

Going forward, the team plans to integrate ARES into a network model and test the accuracy of the algorithm at state and city levels, in countries other than the United States, as well as on communicable diseases other than the flu.
