University of Iowa effort sifts through EHR data to aid precision medicine

In genetics, the phenotype of an organism is the composite of the organism’s observable characteristics or traits. But there are millions of different characteristics or traits, so it’s virtually impossible for a doctor to know what to focus on.


At the University of Iowa, Benjamin Darbro, MD, an associate professor of pediatrics at the Stead Family Department of Pediatrics, starts with the electronic health record.


“The EHR has a lot of information, and over a course of many years, it can be tough to look at the entirety of data and extract the relevant data,” Darbro explains. “We are geneticists. Our job is to look at the patient and the genetic features they have such as up-slanting of the eyes or down-slanting of the eyes. But kidney and heart defects are tougher to note.”

The hospital performs clinical genomic testing on thousands of patients annually. To further understand the best treatment pathway for a particular patient or group of patients, clinicians often attempt to identify patients’ clinical phenotypes.

Typically, the identification process requires healthcare professionals to manually review massive amounts of EHR data, which is time-consuming, inefficient and prone to errors.

To reduce search time and improve the quality of results, University of Iowa Health deployed natural language processing text-mining software from Linguamatics. The product curates data so that a clinician can more easily select and organize it, along with any other available information of interest. Early results are impressive.

With Linguamatics, the average time to curate a single phenotype fell from 70 minutes with manual searches to one minute with natural language processing, and the average number of phenotypes identified per cohort rose from between one and three to between 30 and 40.

The challenge now is to improve information gathering so that all the necessary data within the EHR can be used, and to train the computer to find the information in the medical record that the doctor wants.

“Linguamatics provides tools and a framework, but we still have to teach it,” Darbro says. “We can’t just sit at a computer, but we need an easier process without having strong computer science expertise.

“So, we tell Linguamatics to look for certain terms, such as a person of tall stature or one of short stature, or that a patient’s father has a different type of diabetes than the patient, and the different descriptors of the patient’s number of phenotypes compared to the father. If I manually sit at a computer, I could find 25 phenotypes but after training Linguamatics, I can find 130.”

Alyssa Hahn, a doctoral student working with Darbro, emphasizes the grunt work of looking at patient phenotype results to validate accuracy. She looked at the results for 30 patients, which took several days because there’s so much data that it can be difficult to know whether two findings are similar or the same.

“This work is still early on as precision medicine is new, but phenotypes will play a significant role in precision medicine,” Darbro predicts.
