Machine learning, EHR data helping to combat hospital infections

Mass General, MIT and Michigan Medicine develop risk-prediction algorithms that can be tailored to specific institutions.


Hospitals continue to grapple with Clostridium difficile infections, which are caused by bacteria resistant to many common antibiotics and kill about 30,000 Americans each year. Machine learning, however, can help predict a patient's risk of developing C. difficile infection much earlier than current methods of diagnosis allow.

Using electronic health records for nearly 257,000 patient admissions, researchers from Massachusetts General Hospital, MIT and Michigan Medicine have built hospital-tailored machine learning models that they contend are an improvement over a "one-size-fits-all" approach that ignores important factors specific to individual medical facilities.

“When data are simply pooled into a one-size-fits-all model, institutional differences in patient populations, hospital layouts, testing and treatment protocols, or even in the way staff interact with the EHR can lead to differences in the underlying data distributions and ultimately to poor performance of such a model,” says Jenna Wiens, assistant professor of computer science and engineering at U-M. “To mitigate these issues, we take a hospital-specific approach, training a model tailored to each institution.”

The researchers analyzed de-identified EHR data from 191,014 adult admissions to Michigan Medicine and 65,718 adult admissions to MGH, using a separate machine learning model tailored to each institution and built on the types of variables available there.

“These hospital-specific models allow for earlier and more accurate identification of high-risk patients and better targeting of infection prevention strategies,” conclude the authors in an article appearing in the April issue of the journal Infection Control and Hospital Epidemiology.


“Previous risk prediction really relied on an a priori set of ideas about what is driving risk—the machine-learning approach flips that around regarding the risk factors and takes all of the EHR data into consideration to let the model tell us what those risk factors are,” says Erica Shenoy, MD, co-senior author of the study, associate chief of MGH’s Infection Control Unit and assistant professor of medicine at Harvard Medical School.

Shenoy contends that the approach is data-driven because it is not restricted to the historically known risk factors for C. difficile. As a result, she believes researchers can "look at what the model tells us and then generate hypotheses" about predictive factors rather than starting from preconceived hypotheses.

“It’s not telling us causation, but it can certainly direct us toward variables that previously might not have been considered a risk factor,” adds Shenoy. “If you look at the paper, we had different elements with different types of variables. One of the starkest examples is that the University of Michigan had structured vital signs, which was not available to MGH at the time and so it was not a part of our model.”

Whereas other models may predict C. difficile risk at a single point in time, Shenoy points out that the models her team created estimate each patient's daily risk of infection, which can change over the course of a hospitalization as exposures and susceptibility change.

“What we did was predict risk each day of hospitalization and we found that risk varies over time,” she notes. “That’s a novelty of this model and really corresponds with what we think of clinically in terms of how risk evolves—even during an average five-day hospitalization. You can’t really do this without an EHR.”

The EHR data leveraged for the study included individual patient demographics, medical histories, details of their admission and daily hospitalization, as well as the likelihood of exposure to C. difficile.
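To illustrate what "predicting risk each day" means in practice, the following is a minimal Python sketch, not the researchers' published code. It assumes a logistic-regression-style score and uses hypothetical feature names (`on_antibiotics`, `colonization_pressure`, `age_over_65`) and made-up weights; a real hospital-specific model would learn its own variables and weights from that institution's EHR data.

```python
import math
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """One patient-day of hypothetical EHR-derived features."""
    day: int
    on_antibiotics: bool
    colonization_pressure: float  # hypothetical proxy for likelihood of exposure
    age_over_65: bool


# Hypothetical, illustrative weights; not taken from the study.
WEIGHTS = {"on_antibiotics": 1.2, "colonization_pressure": 0.8, "age_over_65": 0.5}
BIAS = -4.0


def daily_risk(record: DailyRecord) -> float:
    """Return a probability-like risk score for a single hospital day."""
    z = (
        BIAS
        + WEIGHTS["on_antibiotics"] * record.on_antibiotics
        + WEIGHTS["colonization_pressure"] * record.colonization_pressure
        + WEIGHTS["age_over_65"] * record.age_over_65
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)


def risk_trajectory(records: list[DailyRecord]) -> list[tuple[int, float]]:
    """Re-score the patient every day, so estimated risk can rise or fall during the stay."""
    return [(r.day, daily_risk(r)) for r in records]


# A hypothetical five-day stay in which antibiotics start on day 3.
stay = [
    DailyRecord(day=d, on_antibiotics=(d >= 3), colonization_pressure=0.5, age_over_65=True)
    for d in range(1, 6)
]
trajectory = risk_trajectory(stay)
```

Scoring each patient-day separately, rather than once at admission, is what lets the estimate track changes such as a new antibiotic course partway through a stay.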

“This is a really exciting time because so many institutions have migrated to EHR data, and that opens up the possibility of looking not just at C. diff but all sorts of infectious diseases we focus on amongst hospitalized patients, to predict who’s at risk and how that risk might evolve over the course of a hospitalization, as well as help identify transmission that might occur in the hospital,” adds Shenoy.

“This represents a potentially significant advance in our ability to identify and ultimately act to prevent infection with C. difficile,” says study co-author Vincent Young, MD, the William Henry Fitzbutler Professor in the Department of Internal Medicine at U-M. “The ability to identify patients at greatest risk could allow us to focus expensive and potentially limited prevention methods on those who would gain the greatest potential benefit.”

“It’s a retrospective study and our next step is to take this prospectively,” concludes Shenoy. “Our goal here at the hospital would be to identify patients with C diff earlier so we can treat them for their infection as well as institute appropriate isolation and infection control measures to decrease transmission. It requires getting daily extract of all the EHR for every single patient in the hospital—which, we’re in the process of doing now.”

The researchers have made the algorithm code freely available to other hospitals so they can review it and modify it for their own institutions. However, Shenoy emphasizes that medical facilities that want to use similar algorithms need to put together the "right team" of subject matter experts.

“This is not something that you just plug into your EHR and think it’s going to deliver a result that is valid,” she adds. “Each institution has to validate (the performance of the models in their hospitals) and to do that they need the right people in the room—epidemiologists, computer scientists, clinicians, information system specialists—to modify the base code to fit their own institution.”
