We’re swimming in a sea of data: claims, clinical EMR records, registries going back decades, and a looming deluge of new genomic data, all adding to the flood. Buried in all that noise is the potential for remarkable insights to lower costs, drive quality, and improve health outcomes.

Innovators in every area of healthcare are mining data for clues to reduce readmissions, identify care gaps, flag patients at risk in a population, and much more. But the first step is getting useful data into the hands of analysts and researchers. And right now, health data is just not easily “consumable.”

In virtually every other field, it’s relatively easy to collect, interact with, and create services with data. But in healthcare, the data pipeline is long, complex, and fraught with do-overs. Whether it’s a dataset for population health analysis, metrics for improving hospital operations, or a cohort for disease research, it takes weeks or months just to get data, before analysts can even begin using it.

Many healthcare leaders have resigned themselves to this status quo. But it doesn’t have to be so painful. There are steps you can take to balance data protection for privacy and security with data sharing, and get your data into the hands of people who can derive value from it faster. As Ann Cavoukian, former Information and Privacy Commissioner for Ontario states, “You can have privacy and data analytics; you can have privacy and research. It is not a ‘versus’ situation. It is possible to have both.”

If healthcare organizations could just get more and better information, they could use data science and analytics to do amazing things. But the journey from building a data warehouse, to getting clean data into it, to making sure it’s protected, to finally getting it into the right people’s hands is a long one.

Modern data visualization and analytics tools make it easy to analyze the data, but for true self-serve, you have to make the data accessible. You need to cleanse, index, and catalog the data so researchers can find it, and then protect private information. This “missing link” can lead to significant delays.

Some of this is intentional. Researchers go through a strict review process to ensure that personal health information (PHI) is protected and that data is seen by only the right people, for the right reasons. But much of this process is manual and inefficient, when it doesn’t have to be.

For example, one big challenge for investigators is simply knowing what to ask for. Suppose an analyst or researcher wants to conduct a particular investigation. She has to put in a dataset request to the analytics team, specifying inclusion/exclusion criteria and fields of interest. Once that dataset is produced, the investigation often reveals that fields of interest are missing. What initially seemed a straightforward request bogs down into a cumbersome process that can take weeks.

Technology can’t eliminate all manual effort. There will always be a need for reviews and approvals to release information, and linking data will continue to be complex due to re-identification risk. But modern data technologies can streamline this process and make it a lot easier to collect, secure, and share health data.

Here are four steps you can take to start getting more from your data:

Step 1: Get all the data in one place. Whether it’s claims data, EHR clinical information, imaging studies, diagnostic reports, or genome sequences, use a flexible repository that can store structured, semi-structured, and unstructured data.
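To make this concrete, the sketch below shows what a schema-flexible repository might look like in Python. The class, field names, and record types are purely illustrative, not any particular product’s API; the point is that heterogeneous payloads are ingested side by side and interpreted at read time.

```python
# A minimal sketch of a schema-flexible repository: structured claims,
# semi-structured clinical notes, and other payloads live side by side,
# each tagged with its type and source. All names here are illustrative.

class HealthDataRepository:
    """Stores heterogeneous records under a single key space."""

    def __init__(self):
        self._records = []

    def ingest(self, record_type, source, payload):
        """Accept any payload; structure is interpreted at read time."""
        self._records.append({
            "type": record_type,   # e.g. "claim", "note", "genome"
            "source": source,      # originating system
            "payload": payload,    # dict, free text, bytes, ...
        })

    def query(self, record_type):
        """Schema-on-read: filter by type, leave interpretation to the consumer."""
        return [r for r in self._records if r["type"] == record_type]

repo = HealthDataRepository()
repo.ingest("claim", "payer_feed", {"claim_id": "C-1001", "amount": 240.50})
repo.ingest("note", "ehr_export", "Patient reports improvement after ...")
print(len(repo.query("claim")))  # 1
```

Because no schema is imposed at write time, new data sources can be added without redesigning the warehouse first.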

Step 2: Index and catalog. Index and catalog data with clear definitions and provenance so it can be searched quickly and reliably. You can then publish a portal describing all the datasets available to users, and give them an intuitive way to define inclusion/exclusion criteria and select fields of interest, without seeing PHI.
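A dataset catalog of this kind can be surprisingly simple. The sketch below (dataset names, fields, and structure are all hypothetical) shows how users could discover which datasets contain a field of interest from published metadata alone, with no PHI in the catalog itself:

```python
# Illustrative dataset catalog: users browse definitions and provenance,
# and search for fields of interest, without touching the underlying PHI.

CATALOG = [
    {
        "dataset": "diabetes_cohort_2023",
        "fields": ["age_band", "hba1c", "admission_date", "zip3"],
        "provenance": "EHR export, refreshed nightly",
        "contains_phi": True,   # flagged, but PHI itself never enters the catalog
    },
    {
        "dataset": "readmissions_summary",
        "fields": ["drg", "readmit_30d", "facility_id"],
        "provenance": "Claims warehouse, monthly",
        "contains_phi": False,
    },
]

def find_datasets(field_of_interest):
    """Return dataset names whose published schema includes the field."""
    return [d["dataset"] for d in CATALOG if field_of_interest in d["fields"]]

print(find_datasets("readmit_30d"))  # ['readmissions_summary']
```

This is the piece that lets an analyst specify exactly the right fields up front, instead of discovering gaps after the dataset is delivered.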

Step 3: Automate the selection and de-identification of datasets against federal Safe Harbor guidelines. Today, many organizations create duplicate datasets and then manually de-identify the data, a process that adds considerable time and cost. A more consumable approach automatically de-identifies PHI by hiding, masking, truncating, or reducing the resolution of data based on well-accepted Safe Harbor guidelines. Rather than promoting database sprawl, more modern systems create virtual views of identified and de-identified datasets on the fly.
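The sketch below illustrates rule-based de-identification loosely following the HIPAA Safe Harbor method: direct identifiers are dropped, ZIP codes are truncated to three digits, dates are reduced to the year, and ages over 89 are top-coded. The field names and rules are simplified for illustration; a real implementation must cover the full list of Safe Harbor identifiers.

```python
# Simplified Safe Harbor-style de-identification rules (illustrative only).

def deidentify(record):
    out = dict(record)
    out.pop("name", None)            # direct identifiers are removed entirely
    out.pop("ssn", None)
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "XX"          # reduce geographic resolution
    if "admission_date" in out:
        out["admission_date"] = out["admission_date"][:4]  # keep year only
    if "age" in out and out["age"] > 89:
        out["age"] = "90+"           # ages over 89 are aggregated
    return out

record = {"name": "Jane Doe", "ssn": "123-45-6789",
          "zip": "90210", "admission_date": "2016-03-14", "age": 92}
print(deidentify(record))
# {'zip': '902XX', 'admission_date': '2016', 'age': '90+'}
```

Because the rules are code rather than a manual checklist, the de-identified view can be generated on demand instead of maintained as a duplicate copy.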

Step 4: Control access at the data level. Today, privacy officers often define, educate on, and police the enforcement of privacy policies by hand. Modern database technologies bring privacy policies squarely into the IT discussion, configuring policies that are automatically enforced by the data repository. For example, an approved physician may be able to see a fully identified dataset, including PHI, whereas a financial analyst requesting access to the same data would see only de-identified information. Moreover, the risk of re-identification can be calculated automatically for each dataset and, depending on user authorizations, the data shared with or withheld from users. This provides stronger overall protection while affording much more flexibility to share data with more stakeholders, without compromising privacy.
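The physician-versus-analyst example above can be sketched as policy enforcement inside the read path itself. The roles, policy table, and PHI field list below are hypothetical, but they show the key idea: the same query returns an identified or de-identified view depending on who is asking, and unknown roles are denied PHI by default.

```python
# Illustrative data-level access control: the repository, not the
# application, decides what each role may see. All names are hypothetical.

POLICIES = {
    "approved_physician": {"phi_allowed": True},
    "financial_analyst":  {"phi_allowed": False},
}

PHI_FIELDS = {"name", "ssn", "date_of_birth"}

def read_record(role, record):
    """Enforce the privacy policy in the read path, not in the application."""
    policy = POLICIES.get(role, {"phi_allowed": False})  # deny PHI by default
    if policy["phi_allowed"]:
        return dict(record)
    return {k: v for k, v in record.items() if k not in PHI_FIELDS}

record = {"name": "Jane Doe", "date_of_birth": "1950-01-01",
          "total_charges": 1820.00}
print(read_record("financial_analyst", record))  # {'total_charges': 1820.0}
```

Centralizing the policy this way means it is enforced consistently for every consumer, rather than re-implemented (and re-audited) in each downstream application.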

Health data will always require a different level of treatment than other types of information. But by taking these steps, you can make your data a lot more consumable while protecting it more effectively. All of a sudden, those oceans of information become much more accessible to the people who can capitalize on them.


Adam Lorant

As vice president, Adam Lorant is responsible for driving vision and strategy at PHEMI Systems – a Vancouver, BC-based big data company focused on the storage, management and governance of structured and unstructured data. He works closely with leading healthcare providers, insurers and other large data-driven enterprises to help them define and implement their big data strategies.