AI, EHRs uncover phenotypes shedding new light on heart disease

Vanderbilt University Medical Center researchers have identified 14 distinct cardiovascular disease patient subtypes using unsupervised machine learning on unlabeled electronic health record data.

In a study published in the Journal of Biomedical Informatics, VUMC researchers applied automated patient phenotyping to more than 12,000 de-identified patient records that go back at least 10 years before a heart disease diagnosis.


Overall, an automated scan found 1,068 distinct patient phenotypes in the dataset.

“In this study, we applied a constrained non-negative tensor-factorization approach to characterize the complexity of cardiovascular disease (CVD) patient cohort based on longitudinal EHR data,” state the authors. “Through tensor-factorization, we identified a set of phenotypic topics (i.e., subphenotypes) that these patients established over the 10 years prior to the diagnosis of CVD, and showed the progress pattern.”
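To make the idea of non-negative tensor factorization concrete, the following is a minimal sketch, not the authors' method. It assumes a three-way tensor with hypothetical axes (patients x phenotype codes x time windows), uses synthetic toy data, and applies plain multiplicative updates for a non-negative CP decomposition; the additional constraints the study describes are not modeled here. The function name `nonneg_cp` and all sizes are illustrative assumptions.

```python
import numpy as np


def nonneg_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-way tensor via multiplicative updates.

    Returns factor matrices A (patients x rank), B (phenotypes x rank),
    C (time windows x rank) and the reconstruction error after each iteration.
    The axis meanings are assumptions made for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_patients, n_phenotypes, n_windows = X.shape
    A = rng.random((n_patients, rank))
    B = rng.random((n_phenotypes, rank))
    C = rng.random((n_windows, rank))
    errors = []
    for _ in range(n_iter):
        # Each update multiplies a factor by a non-negative ratio,
        # so the factors stay non-negative throughout.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(float(np.linalg.norm(X - X_hat)))
    return A, B, C, errors


# Synthetic toy tensor: 30 patients, 12 phenotype codes, 10 yearly windows,
# generated from 3 hidden "topics". None of this is study data.
toy_rng = np.random.default_rng(42)
true_A = toy_rng.random((30, 3))
true_B = toy_rng.random((12, 3))
true_C = toy_rng.random((10, 3))
X = np.einsum("ir,jr,kr->ijk", true_A, true_B, true_C)

A, B, C, errors = nonneg_cp(X, rank=3)
```

In this sketch, each column of `B` can be read as a phenotypic topic (a weighted group of phenotype codes), the matching column of `C` as that topic's trajectory over time, and each row of `A` as one patient's loading on the topics.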

For each identified subphenotype, investigators examined its association with a conventional CVD-risk assessment tool often used in clinical practice—the American College of Cardiology-American Heart Association Pooled Cohort Risk Equations.

Researchers discovered some patient phenotypes, such as depression, urinary infections and vitamin D deficiency, that they contend “cannot be explained by the conventional risk factors” and “would appear to challenge current understanding of the routes by which CVD emerges.”

In addition, using survival analysis, researchers found markedly different risks of subsequent myocardial infarction (MI) among the six most prevalent subphenotypes, which they believe indicates “clinically meaningful” CVD distinctions among the study’s patient population. Specifically, a subphenotype for “hypertension” with few “hyperlipidemias” was associated with increased subsequent MI risk.
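The article does not say which survival-analysis technique was used. As one common illustration, the sketch below computes a Kaplan-Meier survival curve for a single group; comparing such curves across subphenotype groups is one way differences in subsequent MI risk could be examined. The function name `kaplan_meier` and the toy follow-up data are assumptions for illustration, not study data.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. MI) was observed, 0 if censored
    Returns the distinct event times and the survival probability after each.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations[events])  # sorted distinct event times
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)            # still under observation at t
        observed = np.sum((durations == t) & events)  # events occurring at t
        s *= 1.0 - observed / at_risk
        survival.append(s)
    return times, np.array(survival)


# Toy follow-up data in years (hypothetical): 5 patients, 2 censored.
times, survival = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For the toy data, the estimated survival probability drops at years 2, 3 and 5; censored patients contribute to the at-risk count until they leave observation but never trigger a drop.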

“This study demonstrates the potential benefits of using tensor-decomposition to model diseases as dynamic processes from longitudinal EHR data,” conclude the authors. “Our results suggest that this data-driven approach may potentially help researchers identify complex and chronic disease subphenotypes in precision medicine research.”

The study was supported in part by the National Institutes of Health and the American Heart Association.
