Data Re-Identification Remains Risk Despite HIPAA Safeguards


Oct 26, 2015

Greg Slabodkin

Managing Editor, Health Data Management

De-identified information is excluded from the HIPAA Privacy Rule’s definition of protected health information, permitting its use without patient authorization. However, information de-identified according to HIPAA provisions can be re-identified, putting patient privacy at risk.

So argues Latanya Sweeney, professor of government and technology in residence and director of the Data Privacy Lab at Harvard University. Sweeney famously showed how the medical record of William Weld, then governor of Massachusetts, could be re-identified using only his date of birth, gender and ZIP code. In a separate study, she showed that 87 percent of Americans are likely to be uniquely identified by date of birth, sex and ZIP code alone.
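
A record is unique on date of birth, sex and ZIP code when no other record in the data shares that exact combination. The Python sketch below, using made-up records and hypothetical field names, computes the share of such records in a single table. It only illustrates what the statistic means; Sweeney's 87 percent figure was an estimate for the U.S. population as a whole, not the output of code like this.

```python
from collections import Counter


def fraction_unique(records, keys=("dob", "sex", "zip")):
    """Return the share of records whose values for `keys` appear exactly once."""
    if not records:
        return 0.0
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(records)


# Made-up records for illustration only.
records = [
    {"dob": "1958-02-11", "sex": "M", "zip": "02138"},
    {"dob": "1958-02-11", "sex": "M", "zip": "02138"},
    {"dob": "1950-01-02", "sex": "F", "zip": "02139"},
    {"dob": "1962-11-20", "sex": "F", "zip": "02139"},
]
print(fraction_unique(records))  # 0.5
```

The first two records share a combination and so hide each other; the last two stand alone.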

“My Weld example was first to show how demographics could link to publicly available information to re-identify health data,” asserts Sweeney, noting that health and other person-specific data are often publicly available. “In general, any unique combination of data that appears in both a de-identified and an identified dataset can be the basis for a re-identification. The combination does not have to be demographics, though that is what I first used and what the HIPAA Privacy Rule focuses on.”
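
The linkage Sweeney describes amounts to a join on whatever fields the two datasets share. In this minimal sketch (hypothetical names, fields and records), a de-identified row counts as re-identified when its field combination matches exactly one person in an identified list, such as a voter roll; an ambiguous match is left alone. The `keys` parameter can name any shared fields, not only demographics.

```python
def link(deidentified, identified, keys):
    """Attach names to de-identified rows whose `keys` match exactly one identified row."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


# Hypothetical "de-identified" health records: no names, but demographics kept.
health = [
    {"dob": "1960-03-14", "sex": "F", "zip": "01002", "diagnosis": "asthma"},
    {"dob": "1971-09-02", "sex": "M", "zip": "01002", "diagnosis": "diabetes"},
]
# Hypothetical public list with names and the same fields.
public = [
    {"name": "Alice Example", "dob": "1960-03-14", "sex": "F", "zip": "01002"},
    {"name": "Bob Example", "dob": "1971-09-02", "sex": "M", "zip": "01002"},
    {"name": "Carl Example", "dob": "1971-09-02", "sex": "M", "zip": "01002"},
]
print(link(health, public, ("dob", "sex", "zip")))
# [('Alice Example', 'asthma')]
```

The second health record survives only because two people in the public list share its combination; a single extra field in common would be enough to separate them.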


The Privacy Rule designates two ways in which a covered entity can determine that health information is de-identified. The first is the “Safe Harbor” approach, which permits a covered entity to consider data de-identified if it removes 18 types of identifiers (e.g., names; all elements of dates except the year; and geographic units smaller than a state, although the first three digits of a ZIP code may be kept if the area they cover contains more than 20,000 people) and has no actual knowledge that the remaining information could be used to identify an individual, either alone or in combination with other information.
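
As a rough illustration of what Safe Harbor removal does to a single record, the Python sketch below handles only names, dates of birth and ZIP codes, plus the rule's requirement that ages over 89 be folded into one "90 or older" category. It is not a complete Safe Harbor implementation. The function name and fields are hypothetical, and the restricted three-digit ZIP list is a placeholder: the real list depends on current Census population counts.

```python
from datetime import date


def safe_harbor_sketch(record, restricted_zip3, today):
    """Generalize a few quasi-identifiers in the spirit of Safe Harbor.

    Handles only names, dates of birth, ZIP codes and ages over 89;
    the rule lists 18 identifier types, and no code can check the
    "no actual knowledge" condition.
    """
    dob = record["dob"]
    # Age in whole years as of `today`.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    zip3 = record["zip"][:3]
    return {
        # Keep only the birth year; for ages over 89, suppress it entirely.
        "birth_year": dob.year if age <= 89 else None,
        "age_90_plus": age > 89,
        # Keep three ZIP digits unless that area is too sparsely populated.
        "zip3": "000" if zip3 in restricted_zip3 else zip3,
        # Non-identifying fields pass through; "name" is dropped by omission.
        "sex": record["sex"],
        "diagnosis": record["diagnosis"],
    }


# Hypothetical restricted list, for illustration only.
RESTRICTED = {"036"}
patient = {"name": "Jane Example", "dob": date(1980, 5, 17), "sex": "F",
           "zip": "02139", "diagnosis": "asthma"}
print(safe_harbor_sketch(patient, RESTRICTED, today=date(2015, 10, 26)))
# {'birth_year': 1980, 'age_90_plus': False, 'zip3': '021', 'sex': 'F', 'diagnosis': 'asthma'}
```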

The second is the statistical approach, which permits covered entities to disclose health information in any form provided that a qualified statistical or scientific expert concludes—through the use of accepted analytic techniques—that the risk the information could be used alone, or in combination with other reasonably available information, to identify the subject is very small.
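
The rule leaves the choice of “accepted analytic techniques” to the expert and sets no numeric cut-off. One common family of measures, sketched here as an assumption rather than anything the rule or OCR prescribes, groups released records into classes that share the same quasi-identifier values. If an attacker knows the target is in the data, a record in a class of k records is picked out with probability 1/k. The field names and the 0.2 threshold below are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) under a simple equivalence-class model.

    Records sharing the same quasi-identifier values form a class of size k;
    an attacker who knows the target is in the data picks the right record
    with probability 1/k.
    """
    if not records:
        return 0.0, 0.0
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    max_risk = 1 / min(classes.values())
    # Averaging 1/k over every record simplifies to (#classes / #records).
    avg_risk = len(classes) / len(records)
    return max_risk, avg_risk


released = [
    {"birth_year": 1980, "zip3": "021"},
    {"birth_year": 1980, "zip3": "021"},
    {"birth_year": 1980, "zip3": "021"},
    {"birth_year": 1955, "zip3": "021"},
]
THRESHOLD = 0.2  # hypothetical; the rule itself sets no number
max_risk, avg_risk = reidentification_risk(released, ("birth_year", "zip3"))
print(max_risk, avg_risk, max_risk <= THRESHOLD)  # 1.0 0.5 False
```

The single 1955 record drives the maximum risk to 1.0 even though three of the four records are well hidden, which is why the choice between a maximum and an average measure, and of the threshold, matters as much as the arithmetic.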

Yet, as Sweeney points out, today’s technology makes it possible to re-identify purportedly de-identified data. Further, she charges that the HIPAA regulation itself does not actually define what counts as a “very small” risk.

However, Deven McGraw, deputy director for health information privacy in the HHS Office for Civil Rights (OCR), makes the case that the goal of HIPAA’s de-identification standards was not to create a “zero risk” environment. “Charges that de-identification doesn’t work because it doesn’t bring risk to zero reflects a misunderstanding of the purpose of de-identification,” McGraw says.

“The standard for de-identification is no reasonable likelihood that the recipient of the data could re-identify it, and we created the Safe Harbor which includes a provision that in addition to the removal of the 18 identifiers that you not have actual knowledge that the data could be re-identified by the recipient,” she explains. “Frankly, though, we don’t have a lot of instances of re-identified data that’s actually been de-identified in accordance with HIPAA standards, as opposed to lots and lots of examples of re-identifications of data that were ‘anonymized’ with no particular standards guiding that anonymization. And, yet those examples get used as evidence that the de-identification methodologies don’t work.”

But Sweeney, a former chief technologist for the Federal Trade Commission, says that data that may look anonymous are not necessarily anonymous. In particular, she claims there have been no rigorous tests of the re-identification risks of Safe Harbor data.

“The only test I have seen of the Safe Harbor provision was to test the very demographic linkage it was designed to thwart, but it did not test for any other vulnerabilities,” says Sweeney. “The Safe Harbor provision was crafted to thwart that example, but other vulnerabilities may remain.”

Nonetheless, McGraw promises that, “as the steward” of the de-identification methodologies, OCR “will continue to keep an eye on our Safe Harbor methodology as well as the guidance we’ve put out there for the expert determinations to make sure that it’s still robust.”

Sweeney counters: “If the Safe Harbor is so strong, why doesn’t HHS share data using the standard? I don’t even know of a single case where HHS releases data widely using only the Safe Harbor provision.”
