Most studies evaluating AI in radiology didn’t validate the results

A new report has found that most research applying machine learning to review medical images never validated the outcomes obtained.

The revelation—contained in a report entitled Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers—casts doubt on whether AI research can be applied to actual patient care.

There has been tremendous interest in using artificial intelligence in radiology, primarily through convolutional neural networks. As a result, there have been thousands of studies on the subject.

However, to be useful clinically, the results need to be validated, or confirmed, using external data that was not part of the algorithm's training, contends the report, published in the Korean Journal of Radiology, the journal of the Korean Society of Radiology.


In addition, the external validation needs to include particular design features, such as using adequately sized datasets collected either from newly recruited patients or from institutions other than those that provided the training data, in a way that adequately represents all relevant variations in patient demographics and disease states.

Without external validation, there is a genuine risk that artificial intelligence algorithms will not perform well in real-world practice and will produce inaccurate output when applied to data from an institution other than the one where the algorithm was trained.
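The risk described above can be illustrated with a minimal sketch. The data, the "lesion intensity" feature, and the threshold model below are all hypothetical, invented for illustration: a classifier tuned on one institution's data can score well there yet degrade on an external cohort whose feature distribution is shifted (for example, by different scanners or patient populations).

```python
# Hypothetical sketch: internal vs. external evaluation of a simple classifier.
# All data is synthetic; "lesion_shift" stands in for institution-specific
# differences (scanners, protocols, patient mix).
import random

random.seed(0)

def make_cohort(n, lesion_shift):
    """Synthetic 'lesion intensity' scores; positive cases score higher."""
    data = []
    for _ in range(n):
        label = random.random() < 0.5
        score = random.gauss(1.0 if label else 0.0, 0.5) + lesion_shift
        data.append((score, label))
    return data

def accuracy(data, t):
    return sum((score > t) == label for score, label in data) / len(data)

def fit_threshold(train):
    # "Training": pick the threshold with the best accuracy on this cohort.
    candidates = [i / 100 for i in range(-200, 400)]
    return max(candidates, key=lambda t: accuracy(train, t))

internal = make_cohort(2000, lesion_shift=0.0)   # training institution
external = make_cohort(2000, lesion_shift=0.8)   # different institution

t = fit_threshold(internal)
print(f"internal accuracy: {accuracy(internal, t):.2f}")
print(f"external accuracy: {accuracy(external, t):.2f}")
```

On the internal cohort the tuned threshold looks strong, while the same threshold loses substantial accuracy on the shifted external cohort, which is the failure mode external validation is designed to catch.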

The researchers examined whether recent studies applying machine learning to medical imaging had the design characteristics needed to validate their results.

Of 516 studies published between Jan. 1, 2018, and Aug. 17, 2018, only 6 percent of the studies performed any external validation. However, none of those studies adopted all of the recommended design features for robust validation of the clinical performance of the artificial intelligence algorithms.

The study’s authors did note that this doesn’t necessarily mean the studies were inadequately designed, if they were intended simply as “proof of concept” studies to evaluate the technical feasibility of machine learning in radiology.

They also noted that some methodological study-design guides were published only after many of these studies were conducted, so design features may improve in future research.

“As with any other medical devices or technologies, the importance of thorough clinical validation of AI algorithms before their adoption in clinical practice through adequately designed studies to ensure patient benefit and safety while avoiding any inadvertent harms cannot be overstated,” the study authors concluded.
