HIT Think

How healthcare can play safe in the data lake

We're seeing more healthcare organizations get serious about big data. Even in the boardroom, healthcare leaders are asking, “How can we use big data to help us improve outcomes?”

What's driving this trend? Other industries have benefitted enormously from big data; how do we in healthcare do the same? And most importantly, what do we need from big data to make it not just a promising technology, but a means towards affordable, patient-centric care?

Currently, healthcare organizations are struggling to get their heterogeneous data all in one place. They're battling to break down data silos and make all the information locked in patient records and other disparate systems available for analysis. And that’s just within a single hospital or system. While Health Information Exchange (HIE) projects have made a dent in the broader problem, they really haven’t solved the silo issue.

The problem is only becoming more urgent. Healthcare has become increasingly digitized; deploying EMRs and EHRs has created a flood of patient health records; new digital devices in hospitals monitor patients and provide analysis; and patients are using personal health devices to monitor conditions at home.

Every day, healthcare providers accumulate more data. That’s even more information locked away, inaccessible. At the same time, as payment models change and the world focuses on value-based care, there’s increasing pressure to make this crucial information actionable. With the need to look at episodes of care spanning multiple providers, the challenge increases by orders of magnitude. All this pressure, and we’re still stumbling on apparently easy starting points, such as quality and outcome reporting.

Enter big data. It turns out the HIE problem we’ve all been so keen to solve is actually a data-gathering problem. The fix is to get all your data in one system, one with the agility, capacity, and flexibility to work with all data types effectively – whether it’s electronic medical records, lab results, IoT device data, or even genomics information. But as soon as we even talk about gathering data into one place, we raise concerns about privacy, governance and security.

To meet the dual needs of protecting and sharing data, big data platforms must evolve. Now that healthcare organizations are seriously considering big data, privacy, governance and security have become the next roadblocks. In a survey of more than 3,000 global data and analytics decision-makers, Forrester found that “maturity of technology around security” tied for the biggest cited challenge impeding organizations from executing their “vision for big data.”

With CIOs under a microscope to provide solutions that enable data use while effectively protecting both the data and the individual’s privacy, it’s clear that landing data in a data lake is only the first step. Healthcare providers also need sophisticated governance, privacy and security structures.

Big data’s agility and flexibility can unlock new insights for healthcare organizations struggling to bring together their disparate data sources. However, none of that will matter if this highly sensitive data can’t be shared while keeping privacy intact. It’s an exciting time to take a dip into a data lake, but organizations shouldn’t forget what features they’ll need in their big data solution in order to swim.

An extensible metadata framework. Metadata, used intelligently, can structure your whole data lake, so that it is available to people who can use it—and reuse it—to create value, as well as to those whose job it is to audit it, cleanse it, and supervise it responsibly.

Indexing and cataloging capabilities, so that data is findable. Automatic indexing (a result of an effective metadata strategy) is one of the most important ways you can reduce lookup time and the amount of time people have to spend hunting for data. Your index data structures must maintain integrity with the data being stored. You should be able to elaborate on and enrich your metadata as your requirements evolve, and you should be able to structure it and link it to show relationships between data items.

A mechanism to match user attributes against metadata and policies right in the datastore. To use an attribute-based approach, you do need metadata, but you also need a way to describe users—the people and applications that want to access the data. Attributes are easily changed, and they are flexible—if you effectively manage them, and if your identity system is sufficiently flexible, user attributes can be extended to account for just about any combination of organization, department, role, authorization, device, relationship, etc.

In-datastore policy management, so that access rules are automatically enforced. Every data access request should be mediated by the healthcare organization’s (HCO’s) governance policies: the big data system’s policy management framework should be able to provide a decision for each data access request, based just on the characteristics of the data and the attributes of the user. With policy-based access control set by metadata and user attributes, you get data security along with the kind of flexibility you need to manage data sharing and collaboration.
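
To make the attribute-based approach concrete, here is a minimal Python sketch of how a policy decision might be computed from nothing but a record's metadata and a user's attributes. Everything in it is an illustrative assumption rather than any particular product's API: the Record, User and Policy classes, the decide function, and the sample tags such as "role" and "identifiable" are all hypothetical names. A real platform would evaluate these rules inside the datastore, not in application code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A stored item and the metadata tags describing it (hypothetical schema)."""
    record_id: str
    metadata: dict


@dataclass
class User:
    """A person or application, described only by attributes (hypothetical schema)."""
    user_id: str
    attributes: dict


@dataclass
class Policy:
    """Permits access when the record's metadata and the user's attributes
    each contain every required key/value pair."""
    name: str
    record_must_match: dict
    user_must_match: dict

    def permits(self, user: User, record: Record) -> bool:
        record_ok = all(record.metadata.get(k) == v
                        for k, v in self.record_must_match.items())
        user_ok = all(user.attributes.get(k) == v
                      for k, v in self.user_must_match.items())
        return record_ok and user_ok


def decide(user: User, record: Record, policies: list) -> bool:
    """Default-deny: allow only if at least one policy permits the request."""
    return any(p.permits(user, record) for p in policies)


# Illustrative governance rules (assumed for this sketch only).
POLICIES = [
    Policy("clinicians-see-identified-labs",
           record_must_match={"type": "lab_result", "identifiable": True},
           user_must_match={"role": "clinician", "department": "cardiology"}),
    Policy("analysts-see-deidentified-data",
           record_must_match={"identifiable": False},
           user_must_match={"role": "analyst"}),
]

lab = Record("r1", {"type": "lab_result", "identifiable": True})
deid = Record("r2", {"type": "lab_result", "identifiable": False})
clinician = User("u1", {"role": "clinician", "department": "cardiology"})
analyst = User("u2", {"role": "analyst"})

for user, record in [(clinician, lab), (analyst, lab), (analyst, deid)]:
    verdict = "allow" if decide(user, record, POLICIES) else "deny"
    print(user.user_id, record.record_id, verdict)
```

Because the decision depends only on tags and attributes, changing who may see what means editing a policy or an attribute, not rewriting per-user permissions on every dataset. The default-deny choice in decide is a design assumption of this sketch: a request that matches no policy is refused.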

The data lake on its own has challenges related to findability, data security, data sharing, and governance, but there are excellent strategies emerging to address these challenges and allow you to navigate safely through the data lake.
