Researchers Working to Convert EHR Data to Phenotypes

As part of a four-year, $2.1 million multi-institution project funded by the National Science Foundation, data analytics teams from Georgia Tech and the University of Texas at Austin will develop algorithms and methods to convert electronic health record (EHR) data into meaningful phenotypes focused on diseases and specific health traits.

For its part, Vanderbilt University will provide initial EHR data and phenotype validation. Resulting phenotypes will be refined and adapted in conjunction with data from Northwestern University so that the information and data can be used across multiple health institutions.

Past efforts to create phenotypes from data tended to be costly and time-intensive. Physicians and researchers face several challenges in developing scalable phenotyping methods, including building accurate patient representations, working with data across multiple dimensions, obtaining sufficient expert refinement and ensuring adaptability across multiple health institutions.

“Traditionally it takes six to 18 months to develop an algorithm for a single phenotype, which is too long,” said co-investigator Joshua Denny, M.D., associate professor of biomedical informatics at Vanderbilt. “There is also a tremendous need for developing high-throughput phenotyping methods that can directly model the interactions among heterogeneous information sources.”

The project will focus on three specific applications. The first is a system to accurately and effectively identify patients, even those with multiple symptoms and health traits, for clinical research and for developing predictive models in health studies.

The project can also provide effective phenotypes for genome-wide association studies. At present, health researchers can work with only one phenotype at a time, but this project will enable researchers to study multiple phenotypes jointly and quickly. Finally, the identified phenotypes can help analyze specific patient risks, such as the key health factors exhibited by Type 2 diabetes patients.

In addition to developing the algorithms and methods, the professors plan to develop new health analytics curricula, offered as a massive open online course and as tutorial sessions at conferences.
