Twitter helps track spread of seasonal flu in real time

Tweets, combined with a parameter-driven model, tell when the virus will spike, says Alessandro Vespignani.

May 23, 2017

Managing Editor, Health Data Management

An international team led by researchers at Northeastern University is leveraging a computational model to accurately predict the spread of the seasonal flu in real time using posts on Twitter.

Researchers say that tweets, in combination with key parameters of each season’s flu epidemic, can reliably identify when the virus will reach its peak, as well as the number of cases, as much as six weeks in advance of the actual event.

In particular, they contend that real-time forecasts of flu peak time and intensity can provide vital information for public health interventions, such as resource allocation for prevention, control and communication.

According to Alessandro Vespignani, professor of physics and director of the Network Science Institute at Northeastern University, his team’s forecasting methodology is able to make projections significantly earlier than other models. What makes their parameter-driven model stand out is Twitter.

The Global Epidemic and Mobility (GLEaM) model simulates the spread of flu on a worldwide scale by factoring in key parameters of each season’s epidemic, including the incubation period of the disease, the immunization rate, and the viral strains present. It is now being informed by Twitter data: in this case, location data gathered from more than 50 million tweets containing keywords related to flu symptoms.
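
To make the role of those parameters concrete, here is a minimal sketch of a textbook SEIR compartmental model with a one-day time step. It is not GLEaM, which is a far richer worldwide simulation; the function names, the transmission rate, and every number in the usage lines are hypothetical values chosen for illustration only. The sketch shows how an incubation period and an immunization rate enter a simulation and shift the size and timing of the peak.

```python
def simulate_seir(population, initial_infected, incubation_days, infectious_days,
                  transmission_rate, immunized_fraction, days):
    """Toy SEIR model (assumption: NOT the GLEaM model described in the article).

    Returns the number of infectious people at the start of each day.
    """
    # Immunized people start in the recovered/immune compartment.
    s = population * (1.0 - immunized_fraction) - initial_infected
    e = 0.0
    i = float(initial_infected)
    r = population * immunized_fraction
    history = []
    for _ in range(days):
        history.append(i)
        new_exposed = transmission_rate * s * i / population  # S -> E
        new_infectious = e / incubation_days                  # E -> I
        new_recovered = i / infectious_days                   # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return history


def peak_day(history):
    """Index of the day with the most infectious people."""
    return max(range(len(history)), key=history.__getitem__)


# Hypothetical parameter values, for illustration only.
season = simulate_seir(population=1_000_000, initial_infected=100,
                       incubation_days=2.0, infectious_days=3.0,
                       transmission_rate=0.6, immunized_fraction=0.2, days=300)
print("peak day:", peak_day(season), "peak infectious:", round(max(season)))
```

Raising `immunized_fraction` in this toy model lowers the peak, which hints at why the model needs season-specific parameters before it can forecast.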

“For the flu season, the most puzzling thing is the initial conditions” such as where and when an epidemic began as well as the extent of infection, says Vespignani. “Right before many other outbreaks, you can say the origin of the epidemic was here or there. But, with the seasonal flu, this is not possible.”

To help compensate for that knowledge gap, researchers monitor Twitter activity related to flu in different parts of the United States, which allows them to initialize the parameter-driven model.

“This kind of integration has never been done before,” adds Vespignani. “We were not looking for the number of people who were sick because Twitter will not tell you that. What we wanted to know was: Do we have more flu at this point in time in Texas or in New Jersey, in Seattle or in San Francisco? Twitter, which includes GPS locations, is a proxy for that. By looking at how many people were tweeting about their symptoms or how miserable they were because of the flu, we were able to get a relative weight in each of those areas of the U.S.”
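
The "relative weight" Vespignani describes can be sketched roughly as follows. This is an assumption about the general idea, not the team's actual pipeline: the function name and the sample region labels are hypothetical, and the sketch ignores differences in population and in baseline Twitter usage between regions, which a real analysis would have to handle.

```python
from collections import Counter


def relative_flu_weights(tweet_regions):
    """Convert region labels of flu-related tweets into weights that sum to 1.

    Assumption: each item is the region of one geotagged tweet that matched
    a flu-symptom keyword. Returns an empty dict if there are no tweets.
    """
    counts = Counter(tweet_regions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counts.items()}


# Hypothetical sample: one region label per flu-related tweet.
sample = ["Texas", "Texas", "New Jersey", "Seattle",
          "Texas", "San Francisco", "New Jersey", "Texas"]
weights = relative_flu_weights(sample)
for region, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{region}: {weight:.3f}")
```

Weights like these say nothing about how many people are sick in total. They only indicate where flu activity is relatively higher, which is the kind of information needed to set a model's initial conditions.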

Other online sources of data have been used in the past to estimate flu activity in near real time. A case in point is Google Flu Trends (GFT), which was launched in 2008 and leveraged aggregated web search data. However, in August 2015, GFT was discontinued after the service significantly overestimated the severity of influenza outbreaks compared to the Centers for Disease Control and Prevention’s reported U.S. flu levels.
