Leading genomic databases lack racial diversity

Two leading genomic databases widely used by clinical geneticists show a measurable bias toward genetic data from people of European ancestry over data from people of African ancestry, a racial bias with negative implications for research.

That’s the finding of a new study published October 11 in the journal Nature Communications. Researchers examined ClinVar and the Human Gene Mutation Database and found a clear preference in those databases for European genetic variants over non-European variants.

“Any ancestry-related biases that exist when using typical filters and databases to implement variant prioritization and other similar precision genomic medicine techniques can have profound confounding effects, as most methodological biases do,” state researchers in the article.

Broader genetic representation matters for researchers and for anyone using analytics to forecast outcomes for population health initiatives. It also widens the information available to efforts such as personalized medicine.

Timothy O’Connor, assistant professor at the University of Maryland School of Medicine and a faculty member of the school’s Institute for Genome Sciences, notes that individuals of African ancestry have historically been underrepresented in these kinds of databases.

“One of the hopes that I have for this paper is that it will highlight for the scientific community the need to improve these databases and better represent genetic variation,” he says. “We need to make an exerted effort to include these other populations.”


Developed by NIH’s National Center for Biotechnology Information, ClinVar is a freely accessible public archive of reports on the relationships between human variations and phenotypes, with supporting evidence. The Human Gene Mutation Database (HGMD), maintained by the Institute of Medical Genetics at Cardiff University in Wales, is “an attempt to collate known (published) gene lesions responsible for human inherited disease.”

O’Connor, who led the study, adds that the “powers that be have figured this out” and are starting to address the databases’ racial disparities.

“On the basis of annotations from HGMD, individuals with predominantly African ancestry have by far the most variants considered disease causing, whereas variants prioritized as disease causing based on annotations from ClinVar are most abundant in individuals with predominantly European ancestry and are of intermediate to below-average abundance in predominantly African-ancestry individuals,” states the article. “These population-based discrepancies reflect differences between databases, and suggest that the interplay between database and sample ancestry is important.”

O’Connor is quick to note that ClinVar and HGMD “do incredible work trying to aggregate the data.” Still, these genomic databases are not alone in showing racial bias. According to another study, 96 percent of individuals included in recent genome-wide association studies are of European descent, underscoring the importance of racial and ethnic balance in biomedical research.

“This groundbreaking research by Dr. O’Connor and his team clearly underscores the need for greater diversity in today’s genomic databases,” says University of Maryland School of Medicine Dean E. Albert Reece, MD. “By applying the genetic ancestry data of all major racial backgrounds, we can perform more precise and cost-effective clinical diagnoses that benefit patients and physicians alike.”

NIH is currently ramping up its Precision Medicine Initiative national research cohort of 1 million or more Americans who will volunteer to share with researchers their biological, environmental, lifestyle and behavioral information, as well as tissue samples.

The PMI cohort aims to be one of the world’s largest and most diverse datasets for precision medicine research. NIH says the landmark longitudinal research study will be racially and ethnically diverse in terms of participants, providing “unprecedented opportunities to examine the complex relationship of ancestral influences, environmental exposures, and social factors.”

According to NIH, the cohort will be designed to ensure that people historically underrepresented in biomedical research are included in sufficient numbers.

“NIH is leading this effort to try and include other populations because historically the criticism about NIH is that they tend to focus on scientific reductionism,” concludes O’Connor. “Personalized medicine should work for everyone, not just people of European descent.”
