Data challenges hinder machine learning in drug development

While machine learning has the potential to identify new treatments, reduce failure rates in clinical trials and improve drug development, a lack of high-quality data is hindering the use of the technology.

That’s among the findings of a new report jointly published by the Government Accountability Office (GAO) and the National Academy of Medicine (NAM).

“Machine learning is used throughout the drug development process and could increase its efficiency and effectiveness, decreasing the time and cost required to bring new drugs to market,” states the report, which notes that it can take 10 to 15 years to develop a new drug and bring it to market.

Although the best opportunities for machine learning lie in particularly data-rich parts of the drug development process, a major challenge is the “shortage of high-quality data, which are required for machine learning to be effective,” according to the report. Another obstacle is “accessing and sharing these data,” which is difficult “due to costs, legal issues, and a lack of incentives for sharing.”

When it comes to the lack of high-quality data, the report’s authors point out that much of the data were not collected for machine learning purposes and that biases in the data—such as the under-representation of certain populations—may limit the technology’s effectiveness.

In addition, they contend that acquiring, curating and storing data is expensive, while uncertainty around privacy laws hinders data sharing, which may also be limited by a lack of economic incentives.

To address these challenges, the report makes the following recommendations for policymakers:
  • Create mechanisms or incentives for increased sharing of high-quality data held by public or private actors, including legal consequences for improper data sharing or use.
  • Collaborate with relevant stakeholders to establish uniform standards for data and algorithms. Such standards would improve interoperability by making it easier for researchers to combine different datasets, help ensure that algorithms remain explainable and transparent, and aid data scientists with benchmarking.

The first part of the report consists of excerpts from NAM’s 2020 Special Publication, Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. The second part presents, in full, GAO’s Technology Assessment, Artificial Intelligence in Health Care: Benefits and Challenges of Machine Learning in Drug Development.

While AI has the potential to transform and disrupt healthcare, “it is prudent to balance the need for thoughtful, inclusive healthcare AI that plans for and actively manages and reduces potential unintended consequences, while not yielding to marketing hype and profit motives,” concludes NAM’s part of the report.
