Big Data Analytics and Drug Surveillance

Many drugs are approved after limited clinical trials, and it’s only after they’re on the market, sometimes quite a while later, that problems surface.

Early warning of those problems is possible with relatively simple data-mining techniques similar to those Google uses, according to Nigam Shah, assistant professor of biomedical informatics at Stanford University, who will present a study at HIMSS demonstrating the impact that big data analytics can have on drug surveillance.

A retrospective analysis of data on 1.8 million patients treated at one university hospital over 15 years flagged six of the nine drugs recalled over the past ten years, based on the adverse events for which those drugs were recalled. Had the analysis been done concurrently, it might have caught the problems earlier, Shah says.

The big data used in the study is an often overlooked resource: unstructured clinical notes. The mining technique doesn’t rely on natural language processing, or NLP, which allows computers to “understand” free text. The language of clinical notes is anything but natural, consisting of abbreviations, jargon, and turns of phrase that are unique to the medical professions and sometimes vary substantially among providers. “It’s like they’re written in haiku,” Shah says.

Instead, the team created algorithms that rapidly sort through millions of notes looking for strings of words associated with the events being studied: the diagnosis of a condition, the prescribing of one of the target drugs, and the adverse event. They created a matrix showing which of those events were present for each patient and the time relationship among them, making it simple to count how many times the events occurred in the right order to indicate a possible drug reaction.
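
That description boils down to a small amount of bookkeeping. The sketch below, in Python, illustrates the general idea rather than the Stanford team’s actual code: the note text, the drug name, and the keyword lists are all invented for the example, and real clinical notes would need far more careful term matching.

from datetime import date

# Hypothetical timestamped note snippets for two patients (invented data).
notes = {
    "patient_1": [
        (date(2011, 1, 10), "pt dx w/ pad, severe claudication"),
        (date(2011, 2, 3), "rx drug_x 100mg bid"),
        (date(2011, 6, 20), "admitted for mi"),
    ],
    "patient_2": [
        (date(2012, 3, 1), "rx drug_x 100mg bid"),
        (date(2012, 4, 15), "pad noted on problem list"),
    ],
}

# Strings of words associated with each event being studied (assumed terms;
# simple token matching stands in for the team's string-search algorithms).
event_terms = {
    "diagnosis": ["pad", "claudication"],
    "prescription": ["drug_x"],
    "adverse_event": ["mi"],
}

def earliest_mentions(patient_notes):
    """Return the earliest date each event's terms appear in a patient's notes."""
    firsts = {}
    for when, text in patient_notes:
        tokens = text.replace(",", " ").split()
        for event, terms in event_terms.items():
            if any(term in tokens for term in terms):
                if event not in firsts or when < firsts[event]:
                    firsts[event] = when
    return firsts

# Build the per-patient matrix of event dates, then count patients whose events
# occur in the order consistent with a possible drug reaction:
# diagnosis, then prescription, then adverse event.
matrix = {pid: earliest_mentions(pn) for pid, pn in notes.items()}
flagged = [
    pid for pid, events in matrix.items()
    if all(e in events for e in ("diagnosis", "prescription", "adverse_event"))
    and events["diagnosis"] <= events["prescription"] <= events["adverse_event"]
]
print(len(flagged), "patient(s) with events in the suspect order:", flagged)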

The team studied both known problem drugs and drugs that would never cause an adverse event, and measured the number of times the technique produced true and false positives. The overall combined accuracy of the technique was 77 percent, and Shah says it’s likely to get better with refinement.
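
As a rough illustration of how such a figure is computed, combined accuracy is just the share of test cases classified correctly across the known problem drugs and the negative-control drugs. The counts below are invented; the 77 percent comes from the study, not from this arithmetic.

def combined_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Fraction of test drugs the technique classified correctly."""
    correct = true_pos + true_neg
    total = true_pos + false_neg + true_neg + false_pos
    return correct / total

# Invented counts, for illustration only: known problem drugs flagged (true
# positives) or missed (false negatives), and control drugs correctly left
# unflagged (true negatives) or wrongly flagged (false positives).
print(combined_accuracy(true_pos=7, false_neg=3, true_neg=8, false_pos=2))  # 0.75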

It could also be useful for identifying drugs that are safer than they seem.
For example, the team looked at a drug for peripheral artery disease that has a black-box warning against prescribing it for patients with any type of heart failure, due to a high risk of cardiovascular events. Physicians can choose to prescribe it for those patients anyway, and some patients are willing to take the risk in order to improve their quality of life. “You can’t go to an institutional review board and ask to give this drug to someone with heart failure, but our collaborators thought it wasn’t that risky,” Shah says. When the team scanned the data for 47 such patients, the analysis found no significant difference in cardiovascular events between those patients and others without heart failure who took the same drug. The same analysis at another hospital showed the same result for 97 patients.
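
The exact statistical test isn’t specified here, but a comparison like this comes down to asking whether the cardiovascular event rate among heart-failure patients on the drug differs from the rate among patients without heart failure on the same drug. The sketch below uses Fisher’s exact test on invented counts as one plausible way to frame that question.

from scipy.stats import fisher_exact

# Rows: heart-failure patients on the drug vs. patients without heart failure
# on the same drug. Columns: had a cardiovascular event vs. did not.
# All counts are invented for illustration.
table = [
    [5, 42],   # heart-failure cohort (roughly the 47 patients described above)
    [11, 89],  # comparison cohort without heart failure
]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio {odds_ratio:.2f}, p = {p_value:.3f}")
# A large p-value here would be consistent with finding no significant
# difference in event rates between the two cohorts.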

“The point is to keep going through the data periodically and uncover these unexpected patterns,” Shah says.

Education session #73, “Mining of Clinical Notes to Aid in Drug Surveillance,” is scheduled for March 5 at 9:45 a.m.