NCI launches open-access resource to spur cancer research

The National Cancer Institute on Monday launched the Genomic Data Commons, designed to promote the sharing of genomic and clinical information between researchers as part of Vice President Joe Biden’s National Cancer Moonshot Initiative.

The Genomic Data Commons (GDC) will centralize and standardize data from some of the world’s largest and most comprehensive cancer genomics datasets and make it available to scientists. Those datasets include more than two petabytes of data contained in The Cancer Genome Atlas and its pediatric equivalent, Therapeutically Applicable Research to Generate Effective Treatments.

The online repository will enable researchers to integrate genetic and clinical data, such as cancer imaging and histological data, with information on the molecular profiles of tumors as well as treatment response.

“This is good news in the fight against cancer. With the launch of this new national resource, anyone can freely access raw genomic and clinical data for 12,000 patients—with more records to follow,” said Biden. “Increasing the pool of researchers who can access data and decreasing the time it takes for them to review and find new patterns in that data is critical to speeding up development of lifesaving treatments for patients.”


The National Cancer Moonshot, led by Biden, whose son Beau died of cancer last year, is aimed at making more therapies available to more patients, while also improving the nation’s ability to prevent cancer and detect it at an early stage. The goal of the initiative is to make a decade’s worth of progress in five years in the prevention, diagnosis, and treatment of cancer.

“We’re excited about the GDC because we think it’s a real engine for precision medicine, getting the right drug to the right patient at the right time,” said Louis Staudt, MD, director of NCI’s Center for Cancer Genomics. “In cancer, that means the analysis of the cancer genome. What the GDC is, then, is a way to share data about the abnormalities in the cancer genome of patients as well as the clinical data from those patients, such as how they responded to particular drugs and for how long.”

According to Staudt, GDC’s data represents thousands of cancer cases that will be harmonized using standardized software algorithms so that they are accessible to any cancer researcher around the globe. He said researchers worldwide will be able to use the state-of-the-art analytic methods of the GDC, enabling them to compare their findings with other data in the GDC. In addition, the GDC will accept submissions of cancer genomic and clinical data from researchers globally who wish to share their data, serving as a comprehensive knowledge system for cancer.

“This is not just open for our data, from the National Cancer Institute, but it’s open for anyone’s data. Any cancer researcher in the world will be able to upload their cancer genomic data and clinical data, taking advantage of the software that we’ve built to analyze these genomes and compare their data to all the other data within the GDC,” said Staudt.

Vice President Biden was on hand for Monday’s launch of the GDC at the University of Chicago’s Center for Data Intensive Science. The center is managing the platform in collaboration with the Ontario Institute for Cancer Research in Toronto, Canada, under an NCI contract with Leidos Biomedical Research in Frederick, Md.

“The GDC itself allows us to take very detailed genomic and clinical data coming from individual patients that have signed consents and put them in a computable environment that researchers throughout the world can have access to,” said Warren Kibbe, director of NCI’s Center for Bioinformatics and Information Technology. “Everyone can see and learn from those data.”

In the past, the sheer volume of data involved in this type of research has been a technological barrier, Staudt said. “To download all of the data from The Cancer Genome Atlas would take three weeks of continuous download, would require about $1 million of computer hardware, and would require a team of software engineers to maintain this in a properly protected way that preserves the privacy of patients,” he said. “We do all that work for researchers…who don’t have to download it onto their hard drives.”

“The model here is this data is public,” concluded Robert Grossman, director of the Center for Data Intensive Science at the University of Chicago. “It’s both open and controlled access, so you may need to fill out the right forms to see it. But, the data’s public and available to anyone independently of whether they’ve contributed data to the GDC or not.”
