HIT Think

Why privacy and artificial intelligence still need some harmonization

Headlines proclaiming the next great application of artificial intelligence appear frequently. While that is true across industries, healthcare seems to receive an even greater share of such announcements.

Determining the validity of assertions about artificial intelligence can be difficult. Some claims will prove truthful, and others will not live up to the hype. The arguably more interesting question is what artificial intelligence, in both its development and its use, is doing to privacy.

Artificial intelligence systems require large volumes of data to be developed, trained and constantly refined. Where do the data come from? In healthcare, the data must be about patients, which means sensitive, regulated information must be ingested, utilized and held. Given the use of healthcare information, HIPAA has the most obvious impact because it governs the use and disclosure of patient information.


If patient information is necessary for artificial intelligence, what challenges to privacy does artificial intelligence present? Some of the leading issues include the potential to make de-identification impossible, algorithmic bias and patient safety or treatment errors.

The arguable inability to de-identify data is one of the biggest concerns. The basis of the concern is that artificial intelligence systems will pull data points from so many different sources that even when all identifiers required by the HIPAA de-identification standard are removed, individuals can still be identified. If artificial intelligence makes de-identification impossible, then those systems could eliminate the very ability under HIPAA to obtain and retain the data needed to develop and maintain them.
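To make the linkage concern concrete, here is a minimal sketch in Python. The records and field names are hypothetical and are not drawn from any real dataset or from the HIPAA rule text. It counts how many records share each combination of the indirect fields that remain after direct identifiers are removed. A combination that appears only once is a candidate for re-identification if some outside source carries the same fields alongside a name.

```python
from collections import Counter

# Hypothetical records with direct identifiers (name, address, etc.) already removed.
# The remaining fields are illustrative assumptions, not a HIPAA-defined list.
records = [
    {"zip3": "021", "birth_year": 1980, "sex": "F"},
    {"zip3": "021", "birth_year": 1980, "sex": "F"},
    {"zip3": "021", "birth_year": 1975, "sex": "M"},
    {"zip3": "100", "birth_year": 1990, "sex": "F"},
    {"zip3": "100", "birth_year": 1990, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_rows(rows, fields):
    """Return rows whose combination of fields appears exactly once."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


# Rows that stand alone on these fields are the ones most exposed to linkage.
singletons = unique_rows(records, QUASI_IDENTIFIERS)
# The size of the smallest group is the "k" in a k-anonymity style check.
smallest_group = min(group_sizes(records, QUASI_IDENTIFIERS).values())

print(f"{len(singletons)} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
print(f"smallest group size: {smallest_group}")
```

The more sources a system draws on, the more fields can be added to the combination, and the more records tend to end up alone in their group. That is the mechanism behind the worry that removing the listed identifiers may not be enough.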

A second privacy concern is algorithmic bias. The basis of this concern is that either the artificial intelligence system includes data sources that introduce or push inappropriate decisions, or the individuals training the system are biased. Either way, the introduction of bias could intrude upon an individual’s privacy by requiring more information from more sources or by driving unnecessary interventions.

Once embedded in a system, such risks may not be easily rectified if the basis for the issue cannot be uncovered. From the loss-of-privacy angle, one argument for removing bias rooted in poorly sourced data is to draw in more data from an increasingly diverse array of sources. If that occurs, then where would the stopping point be?

An argument could always be made to pull in more and more data, but feeding all of the data into a system could produce unknown, unintended and unforeseen effects. At the same time, drawing data from too narrow a scope of sources may leave a system unable to adequately address the needs of the diverse population that could use it. As such, the very real potential for bias can encourage design that does not give privacy its due because privacy could hinder the ability to gather enough data to reduce bias.

The last (for now) privacy concern about artificial intelligence is declining trust from patients. If a patient knows that information will be fed into an artificial intelligence system and stored in some unknown number of places, could the patient withhold information? If information is withheld, what will that information be and how critical could it be to determining an appropriate course of treatment or action?

Concerns about maintaining privacy in technology are not unfounded, especially given some debate as to whether privacy (at least traditional concepts of it) still exists. If privacy does not exist in technology systems, then one argument is that the only way to re-establish privacy is to avoid letting information be entered into an electronic system altogether. While withholding data, at least anecdotally, is not a new issue, the pace of withholding could increase or expand to new patient populations, which will hinder the ability to provide care.

The examples of undermining de-identification, bias, and withholding of information all fall on the “foe” side of the equation. The concerns are certainly valid and rooted in reality. However, each of the concerns also forms the basis for artificial intelligence to be part of the solution for addressing the concern. The power of artificial intelligence, therefore, is not just to break down but also to build up.

While artificial intelligence poses a risk to de-identification, from a different perspective it can also create the ability to construct new means of de-identifying data. Just as artificial intelligence can pull from multiple sources to identify individuals from a smaller set of data, the process could also be used in the inverse to implement new protections that render data even more de-identified than before.

As suggested above, HIPAA calls for the removal of certain identifiers for data to be considered de-identified. That is only one of the two means of achieving de-identification though. The second means is to have an individual with appropriate knowledge of statistical and scientific methods for rendering information de-identified to certify that data are de-identified. Could an artificial intelligence system fill that role?

At some point in the not too distant future, that question may need to be answered. As such, artificial intelligence, like most tools, can be either a boon or a bane. The side on which it falls is driven by the proposed use.

A second (and currently in practice) positive to privacy from artificial intelligence is enhanced compliance monitoring. Under HIPAA, and really under good privacy practice generally, information systems and data should be audited and monitored. The purpose of the auditing and monitoring is to determine whether data leakage, inappropriate access or other compromises to the data are occurring. Historically, auditing and monitoring have been manual processes.

When systems did not contain terabytes' worth of data, a manual process could randomly review a decent amount, although still not all, of the data. With the data flood continuing to get deeper, manual processes can only touch a small drop. The manual process is also discouraging work that not many would necessarily want to take on. That is where an artificial intelligence system can step in. Such systems are capable of auditing and monitoring all of the data and are refined enough at this point to not produce an overwhelming number of false red flags. Just as encryption has become practically unavoidable, if artificial intelligence can materially cut down the time to discovery of potential privacy compromises, at what point will it also be de facto required? That question may need to be answered before long.
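The idea of reviewing every access event rather than a random sample can be illustrated with a deliberately simple Python sketch. It is a plain statistical rule, not an artificial intelligence system, and the log entries, user names and threshold are all hypothetical assumptions chosen for illustration. It counts how many distinct patient records each user opened and flags anyone far above the typical count.

```python
from collections import defaultdict
from statistics import median

# Hypothetical access log of (user_id, patient_id) pairs; not real data.
access_log = [
    ("nurse_a", "p1"), ("nurse_a", "p2"), ("nurse_a", "p3"),
    ("nurse_b", "p2"), ("nurse_b", "p4"),
    ("clerk_c", "p1"), ("clerk_c", "p2"), ("clerk_c", "p3"),
    ("clerk_c", "p4"), ("clerk_c", "p5"), ("clerk_c", "p6"),
    ("clerk_c", "p7"), ("clerk_c", "p8"), ("clerk_c", "p9"),
    ("doc_d", "p1"), ("doc_d", "p5"), ("doc_d", "p5"),
]


def distinct_patients_per_user(log):
    """Count how many different patient records each user accessed."""
    seen = defaultdict(set)
    for user, patient in log:
        seen[user].add(patient)
    return {user: len(patients) for user, patients in seen.items()}


def flag_outliers(counts, multiple=3.0):
    """Flag users whose count exceeds `multiple` times the median count.

    The multiplier is an arbitrary assumption for this sketch; a real
    program would tune it and account for role, shift and department.
    """
    typical = median(counts.values())
    return sorted(user for user, n in counts.items() if n > multiple * typical)


counts = distinct_patients_per_user(access_log)
flagged = flag_outliers(counts)
print("distinct patients per user:", counts)
print("flagged for review:", flagged)
```

A rule this crude would produce many false red flags in practice, which is the gap more refined systems aim to close. The point of the sketch is only that every event in the log is examined, something a manual sample cannot do.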

A third area for consideration is how artificial intelligence may also bolster an overall security posture. Security may not be privacy, but privacy cannot be had without security. If data are not protected in a secure manner, then how can privacy be assured? From that perspective, artificial intelligence systems can help with layering protections and providing a nimble background that could have a chance of keeping pace with the changing nature of cyberattacks. Much like the auditing and monitoring, why not take advantage of systems that can run a large array of analyses?

Artificial intelligence can be easily portrayed as a threat, but what is initially a threat can be turned into a friend.
