Georgia Tech makes machine learning algorithm available to fight cancer
Researchers at the Georgia Institute of Technology have developed a machine learning algorithm that accurately predicts the efficacy of cancer drugs and are making it freely available to the open source community in the hope of generating further medical breakthroughs.
Working from raw genetic data, the software tool proved about 85 percent accurate in assessing the effectiveness of nine drug therapies in 273 cancer patients. Results of the study, conducted by Georgia Tech researchers, were published October 26 in the journal PLOS One.
“By making our algorithm ‘open source,’ we hope to facilitate its testing in a variety of cancer types and contexts leading to community-driven improvements and refinements in subsequent applications,” state the authors of the article. “It is our hope that through community sharing of this and other open source cancer drug prediction algorithms and associated data formatting/normalization procedures that the attainment of a major goal of personalized cancer medicine will be facilitated.”
John McDonald, an author of the article and director of Georgia Tech’s Integrated Cancer Research Center, says models for nine drugs were individually built using gene expression and drug response data from the National Cancer Institute panel of 60 human cancer cell lines.
“We selected nine drugs that are commonly used in the treatment of ovarian cancer,” he adds. “The algorithm goes through an optimization step where it looks for what features in the gene expression profiles correlate most highly with response to the drug.”
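The kind of feature search McDonald describes can be illustrated with a short sketch. The Python below is not the Georgia Tech code, and the article does not specify how the researchers' optimization step works. It is a simplified stand-in that ranks genes by how strongly their expression correlates with drug response. The function name, the array shapes, the synthetic data and the two planted genes (7 and 42) are all assumptions made for illustration.

```python
import numpy as np


def rank_genes_by_correlation(expression, response, top_k=5):
    """Rank genes by absolute Pearson correlation with drug response.

    expression: array of shape (n_samples, n_genes), one row per cell line
    response:   array of shape (n_samples,), one drug-response value per cell line
    Returns the indices of the top_k genes and their correlation values.
    (Hypothetical helper for illustration; not part of the published software.)
    """
    expression = np.asarray(expression, dtype=float)
    response = np.asarray(response, dtype=float)

    # Center each gene and the response so the dot product gives the covariance term.
    x = expression - expression.mean(axis=0)
    y = response - response.mean()

    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    # Genes with zero variance get a correlation of 0 instead of NaN.
    corr = np.divide(x.T @ y, denom, out=np.zeros(x.shape[1]), where=denom > 0)

    # Strongest correlations first, regardless of sign.
    order = np.argsort(-np.abs(corr))[:top_k]
    return order, corr[order]


# Synthetic stand-in data: 60 "cell lines" and 200 "genes" (not real measurements).
rng = np.random.default_rng(0)
n_lines, n_genes = 60, 200
expression = rng.normal(size=(n_lines, n_genes))

# Assumption for the demo: response depends on genes 7 and 42 plus a little noise.
response = (2.0 * expression[:, 7]
            - 2.0 * expression[:, 42]
            + rng.normal(scale=0.1, size=n_lines))

top_genes, top_corr = rank_genes_by_correlation(expression, response, top_k=2)
print("Most informative genes:", sorted(top_genes.tolist()))
```

In this toy setting the ranking should surface the two planted genes. A real pipeline would need normalization, cross-validation and a proper predictive model on top of any such ranking.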
According to McDonald, the models were ultimately tested against the gene expression datasets of 273 ovarian cancer patient tumors. However, he points out that the models were built using data from nine different types of cancer in an effort to reduce human bias about which data are important for predicting outcomes.
“Intuitively, it would seem if you’re going to apply a model to ovarian cancer, you should build it using ovarian cancer samples exclusively—we did not do that,” McDonald notes. “We found out later that we were getting better accuracy in predicting ovarian cancer response when the models were built across all of these cancer types.”
When it comes to making predictions through machine learning, he contends that preselecting the data researchers suspect are most relevant is a mistake. “It’s much more effective to put in loads of raw data and let the algorithm sort it out,” he says, than to limit the gene expression data to a specific type of cancer. “The computer narrows it down and looks for the most informative features,” McDonald adds. “The algorithm can improve because you’re giving it more data to learn from.”
On a molecular level, he believes some ovarian cancers are more similar to some breast cancers than to other ovarian cancers. “It would explain why we’re getting better accuracy in building the models across a diversity of cancer types,” concludes McDonald. “For that reason, I think the predictions could apply to more than just ovarian cancer—that’s my guess, but we haven’t tested it yet.”
Georgia Tech’s open-source software is available for download on GitHub.