Opinion: AI efficacy in healthcare still needs to be proven in clinical studies


There’s significant hype surrounding the use of artificial intelligence and its potential use in medicine.

Hardly a day passes without an announcement about AI software that can read medical imagery better than radiologists, arrive at diagnoses from a description of symptoms better than doctors or scrutinize hospital schedules to slash waiting-room times.

This, in turn, has people speculating about the future of human doctors—venture capitalist Vinod Khosla went so far as to claim radiologists would be obsolete by 2022.

Well, this past week, it was DeepMind’s turn. The AI company, owned by Google parent Alphabet, is best known for AlphaGo, the software that beat the world’s top humans at Go. DeepMind, which has a whole division devoted to healthcare, published a paper with NHS researchers contending that the software they had developed can diagnose 50 sight-threatening eye conditions as well as, or better than, experts.

DeepMind and its NHS partners seem to have gone out of their way to ensure the science was done right—the work was published in a peer-reviewed journal, the software was trained on carefully curated and cleaned data, and the system was built to let doctors gain insights into how it made its assessments. They’re planning full clinical trials before moving a product to market.

Unfortunately, the same cannot be said of many of the claims being made about AI in medicine, much less some of the AI-enabled healthcare products already on the market. In fact, to date, despite all the hype, there have been only 14 peer-reviewed publications involving computer-vision software interpreting medical imagery, according to a running tally maintained by the cardiologist and medical writer Eric Topol. And so far, there’s been only one peer-reviewed study of a prospective trial: a study using AI to spot small polyps in colonoscopy images in real time.

But, as Topol notes, this hasn’t stopped the Food and Drug Administration from approving AI-enabled products. It has approved 13 so far, and most of their developers haven’t published peer-reviewed research on their software’s performance.

While the FDA does have a rigorous approval process, the fact that outside medical researchers and the wider public haven’t been able to scrutinize the science behind these products could be worrisome. Troubling questions have been raised about some highly touted AI medical products.

Last month, medical website Stat reported it had obtained internal documents showing that IBM’s Watson for Oncology product sometimes provided “unsafe and incorrect” treatment recommendations.

London-based digital health company Babylon, to take another example, has made bold claims about its AI-driven chatbot, saying the symptom checker can outperform the average general practitioner on questions designed to test diagnostic ability. (The company says its bot provides “health information” not “diagnoses,” because it is not yet regulator-approved as a medical device.) But a doctor complained to the U.K. medical product regulator that the bot failed to pick up on symptoms associated with a heart attack. Babylon said it has answered the agency’s questions and isn't currently under investigation.

There are big potential pitfalls with using AI to analyze medical imagery as well. John Zech, a doctor at the California Pacific Medical Center, recently wrote that he and other researchers had discovered that computer-vision software trained to spot an enlarged heart or pneumonia on chest X-rays might actually be homing in on incidental data in the image. For instance, the AI paid a lot of attention to lettering on the X-rays indicating whether an image had been taken with a portable X-ray machine or a full-size one. That’s because portable X-rays are more likely to be used with sicker patients.

Marks such as this are present on many of the images in the sets of anonymized chest X-rays released by the U.S. National Institutes of Health to help train AI systems. In fact, Zech isn't the only doctor to raise issues with the quality of the NIH data. Other researchers have found problems with how many of the images are labeled.

It’s possible that artificial intelligence may one day transform medicine. That day may even be dawning now. But, for the moment, the best prescription when reading about AI beating human doctors is probably this: take once daily with a giant pinch of salt.

Bloomberg News