NIH to expand genomics catalog for research community

Database and associated tools represent the latest step in the agency’s commitment to make resources available to researchers.

Feb 15, 2017, 3 min read

Greg Slabodkin

Managing Editor, Health Data Management

The National Institutes of Health is expanding its Encyclopedia of DNA Elements (ENCODE) Project, creating high-quality and easily accessible sets of data, tools and analyses used to interpret genome sequences and to understand the consequence of disease-linked genomic variation.

Launched in 2003, ENCODE is an effort to generate a comprehensive catalog of functional DNA elements within genomes, enabling researchers to study human health and disease. To date, more than 1,700 scientific publications generated by the research community have used ENCODE data or tools.

“We’re transitioning into the fourth phase of ENCODE, where we’re trying to greatly expand the number of cell and tissue types,” says Daniel Gilchrist, Ph.D., program director of computational genomics and data science in the Division of Genome Sciences at NIH’s National Human Genome Research Institute (NHGRI).

As part of the project’s fourth phase, NHGRI is committing $126 million over four years to expand the scope of ENCODE to include characterization centers, which will study the biological role that candidate functional elements may play and develop methods to determine how they contribute to gene regulation in a variety of cell types and model systems. These elements control when a gene is turned on and can have a significant biological impact or, in specific circumstances, contribute to disease.

In addition, the phase four funding will help enhance the ENCODE catalog by developing a way to incorporate data provided by the research community and by using human biological samples from research participants who have given their consent for unrestricted sharing of their genomic data.

“This means that data can be deposited in freely accessible databases…and shared without registration or prior approval,” the ENCODE website states.

Currently, ENCODE’s data and tools are organized into two groups: a data coordinating center at Stanford University, which houses the data and provides the resource through an open-access portal, and a data analysis center at the University of Massachusetts Medical School, which synthesizes the data into an encyclopedia for use by researchers.

“We like to promote open access data sharing. We think that this makes the data of greatest use to the scientific community,” says Gilchrist. “Anybody who has a web browser can get to the data. It’s released as soon as it is processed and passes quality metrics.”

For those researchers who use ENCODE datasets, “we only request that they reference it as such,” he adds.
