NIH releases large chest X-ray dataset to researchers

The National Institutes of Health has released one of the largest publicly available chest X-ray datasets to the scientific community. It includes more than 100,000 anonymized images of scans from more than 30,000 patients, many of whom have advanced lung disease.

Before it was released, the dataset was rigorously screened to remove all personally identifiable information, according to NIH. The images come from patients at the NIH Clinical Center, a research hospital, who voluntarily enrolled to participate in clinical trials.

The agency hopes that, with access to the de-identified dataset, researchers will be able to teach computers how to read and process extremely large numbers of scans. Such systems could confirm results radiologists have found and potentially detect and diagnose other findings that may have been overlooked.

“Reading and diagnosing chest X-ray images may be a relatively simple task for radiologists but, in fact, it is a complex reasoning problem which often requires careful observation and knowledge of anatomical principles, physiology and pathology,” states NIH. “Such factors increase the difficulty of developing a consistent and automated technique for reading chest X-ray images while simultaneously considering all common thoracic diseases.”

The agency sees artificial intelligence as a potential solution that can help clinicians make better diagnostic decisions. In particular, NIH envisions that this technology could help identify slow changes across multiple chest X-rays that might otherwise be overlooked. It could also serve as the basis for a “virtual radiology resident” that can later be taught to read more complex images such as CT and MRI scans.

In the coming months, the NIH Clinical Center, the nation’s largest hospital devoted entirely to clinical research, also expects to make a large dataset of CT scans publicly available.