Big Data Remains Untapped Resource for Healthcare

The enormous amount of DNA sequencing, biomedical imaging, electronic health records and other data generated by researchers and clinicians holds great promise for healthcare. However, gaining access to this mountain of data and making sense of it remains an elusive goal.


Earlier this month, National Institutes of Health Director Francis Collins, M.D., wrote a blog post touting the potential benefits of big data, calling this enormous, ever-expanding treasure trove of digital data a "priceless raw material for the next era of biomedical research." Collins believes that "we are at a point in history where big data should not intimidate, but inspire us."

However, according to researchers affiliated with Boston's Harvard Medical School, Beth Israel Deaconess Medical Center and Boston Children's Hospital, technical and social challenges continue to serve as barriers to leveraging the full power of big data in healthcare. They argue that while other industries have been successful at obtaining value from large-scale integration and analysis of heterogeneous data sources, the medical community is being left behind. 

"What these industries have figured out is that big data becomes transformative when disparate data sets can be linked at the individual person level. In contrast, big biomedical data are scattered across institutions and intentionally isolated to protect patient privacy," state the authors in a May 22 article in the Journal of the American Medical Association.  

The prerequisite for unlocking the real value of big data, they assert, is the ability to link data at the patient level. "Although some big data, such as electronic health records, provide depth by including multiple types of data (e.g., images, notes, etc.) about individual patient encounters, others, such as claims data provide longitudinality—a view of a patient’s medical history over an extended period for a narrow range of categories," the authors argue. "Linking data adds value when they help fill in the gaps."
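As a rough illustration of what patient-level linkage means, the sketch below merges a few records from two sources into one timeline per patient. It is not taken from the JAMA article: the records, the field names and the `link_by_patient` helper are all hypothetical, and it assumes both sources already share a reliable `patient_id`, which real-world data sets often do not.

```python
from collections import defaultdict

# Hypothetical EHR encounters: rich detail ("depth") about individual visits.
ehr_encounters = [
    {"patient_id": "p1", "date": "2013-02-10", "note": "chest pain workup"},
    {"patient_id": "p2", "date": "2013-03-05", "note": "annual physical"},
]

# Hypothetical claims: narrow categories, but spanning years ("longitudinality").
claims = [
    {"patient_id": "p1", "date": "2011-06-01", "code": "statin refill"},
    {"patient_id": "p1", "date": "2012-06-03", "code": "statin refill"},
    {"patient_id": "p3", "date": "2012-09-14", "code": "glucose test"},
]


def link_by_patient(ehr, claim_rows):
    """Group events from both sources under each patient_id, ordered by date."""
    timeline = defaultdict(list)
    for row in ehr:
        timeline[row["patient_id"]].append(("ehr", row["date"], row["note"]))
    for row in claim_rows:
        timeline[row["patient_id"]].append(("claim", row["date"], row["code"]))
    # ISO-formatted dates sort chronologically when compared as strings.
    return {pid: sorted(events, key=lambda e: e[1]) for pid, events in timeline.items()}


linked = link_by_patient(ehr_encounters, claims)
for event in linked["p1"]:
    print(event)
```

In this toy data, patient `p1` ends up with two earlier claims followed by one detailed encounter, which is the kind of gap-filling the authors describe. Patients `p2` and `p3` appear in only one source, so linkage adds nothing for them.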

Of particular value, according to the article, are "nontraditional" sources of biomedical data outside the healthcare system, such as social media, credit card purchases and census records, which can "help assemble a holistic view of a patient, and, in particular, shed light on social and environmental factors that may be influencing health."

According to the authors, linking this data will allow physicians and researchers to test new hypotheses and identify areas of possible medical intervention. For example, they suggest that grocery shopping patterns obtained from stores in various areas could be used to predict rates of obesity and type 2 diabetes in public health databases. Linked data could also show whether the level of exercise recorded by home monitoring devices correlates with response rates to cholesterol-lowering drugs, as measured by continued refills at the pharmacy.

Nevertheless, as the article points out, the lack of a national unique patient identifier in the United States is a technical obstacle to linking big biomedical data. The authors add that privacy and security concerns remain a formidable social challenge to data linkage. "As more data are linked, they become increasingly more difficult to deidentify," they say. "The consequences of this in healthcare, particularly for mental health records and genetic markers, have been extensively studied and discussed. However, given that data linkage is already happening in other industries and is increasingly being thought of as an informational asset for healthcare delivery, monitoring, and marketing, it would behoove the medical establishment to guide societal and legislative standards in this regard."

Toward that end, they propose that a public forum of stakeholders be convened, including citizens, the healthcare community and commercial data vendors, to frame a policy for technical protections for big biomedical data linkage.