Data augmentation enables AI to find breast cancer early

Technique increases the available data set, enabling deep learning to be applied, says Marcelo Andrade da Costa Vieira.


Increasing the amount of breast imaging data available to train deep learning algorithms may lead to earlier identification of “architectural distortions,” which are tissue anomalies that are often the earliest manifestation of breast cancer.

Breast cancer is the deadliest cancer affecting women worldwide. Early treatment is an important way to combat the disease, and the exam most widely used to make it possible is digital mammography, in which the radiologist looks for masses, microcalcifications and architectural distortions. The last of these are identifiable as an asymmetric area of breast tissue caused by subtle contraction.

Almost 40 million mammograms are performed annually in the United States alone. However, reading all of these images can be expensive and time consuming, and the readings are subject to human error. For instance, architectural distortions can be benign, such as the site of a prior biopsy, but they can also be the earliest indication of breast cancer, appearing as much as two years before any other anomaly.

These distortions are hard to detect visually and often go unnoticed by radiologists. Architectural distortion is the most common finding in retrospective reviews of false-negative breast cancer diagnoses.



Computer-aided diagnosis (CAD) can greatly help with the reading of mammograms. Deep learning in general has been shown to be more effective than traditional CAD systems at reading images, according to new research posted on arXiv.org, part of the Cornell University Library.

The researchers, from the University of São Paulo in Brazil, proposed using deep learning based on a convolutional neural network to aid in the diagnosis of architectural distortions.

“There are several computer-aided detection systems to aid [in] the detection of architectural distortion, but as far as I know, none of them uses deep learning,” says Marcelo Andrade da Costa Vieira, PhD and associate professor in the Department of Electrical and Computer Engineering at the University of São Paulo and the lead investigator of the study.

However, the main problem in using deep learning software is the need for a large data set to properly train the neural network. The problem is compounded for architectural distortion, because only a limited number of images containing these anomalies are available.

“That’s why we proposed the use of data augmentation [to improve the network’s training],” says Vieira.

The researchers used an initial data set of 300 mammograms that contained identified architectural distortions. They then performed data augmentation by manually cropping the images into “regions of interest” and transforming the crops, for example by flipping them and rotating them through different angles. They ended up with a new data set of 21,600 images, of which 70 percent were used to train the software.
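
The augmentation step described above can be illustrated with a minimal Python sketch. It is not the researchers' code: it assumes NumPy, uses only right-angle rotations and horizontal flips (the article does not specify the exact angles or transforms used), and substitutes random arrays for cropped regions of interest. The function names `augment_roi` and `split_train_test` are hypothetical. For scale, 21,600 images from 300 mammograms works out to 72 images per mammogram, well beyond the 8 variants per region that this sketch produces.

```python
import numpy as np


def augment_roi(roi):
    """Return the 8 flip/rotation variants of a 2D region of interest.

    Assumption: only quarter-turn rotations and horizontal flips are used.
    """
    variants = []
    for quarter_turns in range(4):
        rotated = np.rot90(roi, k=quarter_turns)
        variants.append(rotated)             # rotated by 0, 90, 180, 270 degrees
        variants.append(np.fliplr(rotated))  # mirrored copy of each rotation
    return variants


def split_train_test(images, train_fraction=0.7, seed=0):
    """Shuffle the images and split them into training and held-out sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_train = int(round(train_fraction * len(images)))
    train = [images[i] for i in order[:n_train]]
    test = [images[i] for i in order[n_train:]]
    return train, test


# Stand-in data: 5 random 64x64 arrays (hypothetical, not mammography data).
rng = np.random.default_rng(42)
rois = [rng.random((64, 64)) for _ in range(5)]

# Expand every region into its variants, then hold out 30 percent.
augmented = [variant for roi in rois for variant in augment_roi(roi)]
train_set, test_set = split_train_test(augmented)
print(len(augmented), len(train_set), len(test_set))
```

Rotations through arbitrary angles, which the phrase “different angles” may imply, would need an interpolating image library rather than `np.rot90`.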

The trained network was then tested on detecting architectural distortions and achieved an “almost perfect” accuracy of 99.4 percent. To better approximate a clinical setting, the network was then validated on nine new exams, where it was 86.1 percent accurate in identifying the anomalies.

The researchers are now working on applying the same data augmentation technique to the more advanced digital breast tomosynthesis, which produces three-dimensional images. They also plan to enhance the technique by automating the cropping of the images used for data augmentation.

Detecting these distortions “is very important for early diagnosis of breast cancer,” the study authors say.
