Researchers at Regenstrief Institute and Indiana University School of Informatics and Computing say they can now detect cancer cases using data from free-text pathology reports at least as well as, and faster than, clinicians reviewing reports manually.

The researchers used existing data algorithms and open source machine learning tools to create a breakthrough electronic approach that could significantly speed patient diagnoses and public health reporting. “We think that it’s no longer necessary for humans to spend time reviewing text reports to determine if cancer is present or not,” says Shaun Grannis, MD, senior study author and interim director of the Regenstrief Center of Biomedical Informatics.

Machine learning, which in healthcare is most frequently associated with initiatives using IBM’s Watson Health technology, uses established algorithms to find meaningful patterns in data automatically, and then uses these known patterns to uncover fundamental relationships, Grannis explains.

At Regenstrief/IU, machine learning identified patterns of language in pathology reports, enabling the algorithms to derive a rule: if certain factors or findings appear in the automated pathology review, the patient is likely to have cancer.
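To make the idea concrete, here is a minimal sketch of how a classifier can learn word patterns from labeled report text. It is not the study's method: it assumes a simple bag-of-words naive Bayes model, and the sample reports, labels, and function names (`tokenize`, `train`, `predict`) are hypothetical, invented purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy reports, invented for illustration only (not study data).
# Label 1 = positive for cancer, 0 = negative.
TRAIN = [
    ("invasive ductal carcinoma identified in the biopsy", 1),
    ("malignant cells present consistent with adenocarcinoma", 1),
    ("metastatic carcinoma involving the lymph node", 1),
    ("benign tissue with no evidence of malignancy", 0),
    ("normal mucosa with no dysplasia seen", 0),
    ("chronic inflammation only and benign findings", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count how often each word appears in positive and negative reports."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label whose word patterns best match the report."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Start from the share of training reports carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing so unseen word/label pairs never score zero.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
```

Calling `predict(model, "biopsy shows invasive carcinoma")` on this toy data returns 1, because those words appeared only in the positive examples. A word-count model like this ignores negation and context, so a production system would need far more data and more careful feature selection.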

Researchers sampled 7,000 free-text pathology reports from more than 30 hospital participants in the Indiana Health Information Exchange to predict whether a report was positive or negative for cancer. In general, using standardized lab data, electronic automated surveillance captures four times as many notifications as human reviewers do, researchers say.

Grannis contends the new technology is ready for use now to rapidly review large amounts of cancer data without human supervision. Further, development of new algorithms can enable the same automated surveillance of other diseases.

However, that doesn’t mean automated surveillance will be rapidly or widely adopted, he acknowledges. It will take time for clinicians to trust the technology, just as it will take time for consumers to trust Google self-driving cars and take their hands off the steering wheel.

But Indiana could be a good test bed for the technology, Grannis contends, because the state has had automated public health surveillance reporting since 2000; it currently covers 40 notifiable diseases that must be reported to public health agencies. He led development of the state’s surveillance system, which already detects communicable disease outbreaks seven to nine days sooner than human reporting while finding four times as many cases. “Flint, Michigan, could never happen here because we have automated systems that monitor lead levels across the state,” he contends.

The study, “Towards better public health reporting using existing off the shelf approaches: A comparison of alternative cancer detection approaches using plaintext medical data and non-dictionary based feature selection,” is published in the April 2016 issue of the Journal of Biomedical Informatics.
