NLP finds links between genomic anomalies, cancer drug responses
Text mining system speeds the process of manually reviewing and extracting information from medical literature.
Computer scientists from the University of Delaware and Georgetown University have jointly developed a text mining tool based on natural language processing that rapidly determines which cancer therapeutics are most likely to be effective given a patient’s genetic biomarkers.
The system leverages NLP techniques that analyze words and phrases buried in the free text of medical literature to find relationships indicating the impact of genomic anomalies on cancer treatments.
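To give a rough sense of what mining relationships from free text can look like, here is a minimal sketch in Python. It is not eGARD's method, which the article does not detail. It is a simple sentence-level co-occurrence approach, and the lexicons (`GENES`, `DRUGS`, `RESPONSE_CUES`), the function name `extract_relations` and the sample sentences are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicons for illustration only; a real system would rely on
# far larger curated vocabularies and much richer linguistic analysis.
GENES = {"EGFR", "KRAS", "BRAF"}
DRUGS = {"gefitinib", "cetuximab", "vemurafenib"}
RESPONSE_CUES = {
    "sensitivity": "sensitive",
    "sensitive": "sensitive",
    "resistance": "resistant",
    "resistant": "resistant",
}


def extract_relations(text):
    """Return (gene, drug, response) triples for sentences mentioning all three."""
    relations = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
        lowered = [t.lower() for t in tokens]
        genes = [t for t in tokens if t in GENES]
        drugs = [t for t in lowered if t in DRUGS]
        cues = [RESPONSE_CUES[t] for t in lowered if t in RESPONSE_CUES]
        # Keep only the first gene, drug and cue found in the sentence.
        if genes and drugs and cues:
            relations.append((genes[0], drugs[0], cues[0]))
    return relations


if __name__ == "__main__":
    sample = (
        "EGFR mutations were associated with sensitivity to gefitinib. "
        "KRAS mutations predicted resistance to cetuximab. "
        "The study enrolled 120 patients."
    )
    for relation in extract_relations(sample):
        print(relation)
```

A sketch like this ignores negation, co-reference and sentences that mention several genes or drugs, which is part of why dedicated NLP systems are needed for the task.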
“In the field of precision medicine, such relationships will be valuable for determining the right personalized treatment based on patient’s molecular profile,” contend the authors of a recent study published in the journal PLOS One.
Called eGARD (extracting Genomic Anomalies association with Response to Drugs), the system was applied to about 36,000 PubMed abstracts retrieved for 50 genes and 42 cancer drugs. It matched genetic signatures with treatment outcomes at 95 percent precision.
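Precision, in the standard information-extraction sense, is the share of extracted items that turn out to be correct. The short Python sketch below shows that calculation. The function name and the counts in the usage example are hypothetical and are not figures from the study.

```python
def precision(true_positives, false_positives):
    """Fraction of extracted relations that are correct: TP / (TP + FP)."""
    extracted = true_positives + false_positives
    if extracted == 0:
        raise ValueError("precision is undefined when nothing was extracted")
    return true_positives / extracted


if __name__ == "__main__":
    # Hypothetical counts: 95 correct relations out of 100 extracted.
    print(precision(true_positives=95, false_positives=5))
```

Precision says nothing about relations the system failed to find; that is measured separately by recall.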
“We primarily focused on targeted therapies that have some kind of a biomarker,” says Subha Madhavan, director of the Innovation Center for Biomedical Informatics at Georgetown University Medical Center. “The tool is broad enough to apply to different cancer types. The algorithm is trained by exposing it to a lot of abstracts that contain the outcomes data.”
The text mining system speeds up the time-consuming process of manually reviewing and extracting information from medical literature, according to Madhavan. “Can a human being sit and read through a million abstracts and pull out the information that eGARD pulls out?” she asks. “Yes, absolutely, but it’s going to take them years to do it. We can do it over a weekend with eGARD.”
The system was funded by the National Institutes of Health under its Big Data to Knowledge initiative, aimed at developing new software, tools and analytics to mine large amounts of biomedical data.
“The tool is intended for biocurators and scientific staff who work with oncologists to use eGARD to pull out the various evidence for these biomarkers and bring them to tumor boards to summarize the data and provide case-specific information to clinicians,” adds Madhavan. “It’s the combination of human and machine that makes it powerful.”
A tumor board is a multidisciplinary meeting where complex patient cases are discussed, frequently involving difficult tumors or patients who have previously received treatment and require a different treatment plan.
Madhavan notes that the next step is to develop an interactive web interface that lets curators and researchers search, manipulate and analyze results from eGARD. “We have a prototype now that we’re editing,” she adds.
Ultimately, Madhavan says the goal is to integrate the tool with other software such as the Georgetown Database of Cancer (G-DOC), a precision medicine platform for both researchers and clinicians containing molecular and clinical data from thousands of patients and cell lines, along with tools for analysis and data visualization.