ChatGPT: A lot of hype, but tantalizing potential
HIMSS23 was abuzz with discussion of generative AI, but much work lies ahead in fine-tuning it for healthcare and finding appropriate uses for it while it continues to mature.
It’s not as if a HIMSS conference hadn’t been hit with a storm of hype before. But like the snow that flurried over Chicago the day before HIMSS23 began, talk surrounding ChatGPT was seemingly everywhere, a blizzard of promises and expectations from which it was difficult to shovel out.
From keynote presentations to humorous asides (more than one presenter quipped, “and ChatGPT did NOT write my speech!”), discussion of the rapidly evolving technology was everywhere. For an industry searching for any credible way to relieve documentation burdens on clinicians and automate other forms of support, this shouldn’t be a surprise.
And perhaps the most significant imprimatur for ChatGPT came in the joint announcement by Microsoft Corp. and Epic that they plan to expand their strategic collaboration by integrating generative AI solutions, such as ChatGPT, into Epic’s electronic health record software.
Still, important words in that announcement are “develop” and “integrate,” which underscore the early nature of this still-emerging technology. The latest version of ChatGPT is only a few months old, and its promise for healthcare highlights both what it might offer and the need to understand what it can – and cannot – do.
Generative artificial intelligence involves the use of algorithms that can be used to create new content – this can include text, computer coding, simulations, audio and more. “Recent breakthroughs in the field have the potential to drastically change the way we approach content creation,” according to a report from McKinsey.
ChatGPT falls into this class of solutions. The GPT portion stands for generative pretrained transformer, and the solution was developed by OpenAI, an artificial intelligence research organization, and released to the public in November 2022.
ChatGPT interacts with users in a conversational way. “The dialogue format makes it possible for ChatGPT to answer follow up questions, admit its mistakes, challenge incorrect premises and reject inappropriate requests,” the developing organization contends. Essentially, it’s trained to follow an instruction in a prompt and provide a detailed response.
Early versions of these tools can produce writing or other forms of content based on input or interaction with a content creator. ChatGPT can serve as a chatbot, producing an answer to almost any question it’s asked. The latest version, GPT-4, is OpenAI’s newest large language model; trained on large quantities of online data to create complex responses, it can produce answers of as many as 25,000 words.
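The dialogue format described above can be illustrated with a simple data structure. The sketch below is a hypothetical example, not any vendor’s actual integration; the “system”/“user”/“assistant” role names follow OpenAI’s published chat message format, but the API call itself is omitted.

```python
# A minimal sketch of the conversational exchange format used by
# chat-style LLM services such as ChatGPT. Each turn is a dict with a
# role and its text; the whole history is resent with every request,
# which is what lets the model answer follow-up questions in context.
conversation = [
    {"role": "system", "content": "You are a helpful assistant for clinicians."},
    {"role": "user", "content": "Summarize this visit note in plain language."},
]

def add_turn(history, role, content):
    """Append one turn so later prompts retain the prior context."""
    history.append({"role": role, "content": content})
    return history

# The model's reply would come back as an "assistant" turn, and a
# follow-up question is simply appended as another "user" turn.
add_turn(conversation, "assistant", "Here is a plain-language summary...")
add_turn(conversation, "user", "Can you shorten it to two sentences?")
```

Because the full history travels with each prompt, a clinician’s follow-up (“shorten it to two sentences”) can be answered without restating the original note.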
But it’s still evolving technology, and even OpenAI admits that it can make “simple reasoning errors” or be “overly gullible in accepting obvious false statements from a user.” And the potential for risk in high-acuity settings was underscored this spring when an AI-trained chatbot was blamed for a Belgian man’s death.
Healthcare hopes and hype
Early tests have shown potential uses for ChatGPT in healthcare. In one recent study, an AI chatbot serving as an artificial intelligence assistant was shown to be as capable as physicians at drafting responses to patient questions posted to a social media forum. The article on the research noted that the chatbot’s answers were reviewed by physicians before they were posted.
“The rapid expansion of virtual healthcare has caused a surge in patient messages concomitant with more work and burnout among healthcare professionals,” the article noted. “Artificial intelligence (AI) assistants could potentially aid in creating answers to patient questions by drafting responses that could be reviewed by clinicians.”
Concern has grown about clinician burnout, especially as physicians are fielding more requests from patients who are using digital means, such as patient portals, to send questions or make requests. Automated chatbots, then, could relieve some of that burden from clinicians, who might only have to review generated responses.
“A good use of technology simplifies things related to workforce and workflow,” says Chero Goswami, chief information officer at UW Health. “Integrating generative AI into some of our daily workflows will increase productivity for many of our providers, allowing them to focus on the clinical duties that truly require their attention.”
The Microsoft-Epic collaboration also will examine bringing natural language queries and interactive data analysis to SlicerDicer, Epic’s self-service reporting tool, with the hope that it can help clinical leaders explore data in a conversational and intuitive way. “Our exploration of OpenAI’s GPT-4 has shown the potential to increase the power and accessibility of self-service reporting,” says Seth Hain, senior vice president of research and development at Epic.
Another assist for clinicians could come from a combination of technologies that could facilitate the documentation process. For example, the use of ambient listening can help clinicians draft encounter notes by listening to conversations between physicians and patients and writing a first draft of the physician’s note for the electronic record. This technology already is in use at University of Michigan Health-West and elsewhere.
Pairing this capability with ChatGPT could further ease documentation time and other content creation, contends John Tippetts, chief architect at Intermountain Healthcare, who is responsible for leading teams in enterprise and business architecture, enterprise solution architecture and emerging technologies.
“There is continued innovation in the delivery of patient care, continued evolution of ambient AI and generative AI,” he says. “These will be tools to help caregivers, not replace them. They will continue to reduce the mundane tasks that caregivers have to do day in and day out. How much time do doctors and nurses spend in documentation?”
In addition to notetaking, generative AI such as ChatGPT can help with routine communication, such as requests to insurers for prior authorization, predicts Peter Lee, corporate vice president of research and incubation at Microsoft.
But many are warning about putting too many expectations on the shoulders of ChatGPT and other generative AI.
The ease with which these technologies generate content tends to mask the complexity involved and their dependence on the input on which they base their output, Lee notes.
“No doubt there is still a question here; GPT-4 is unable to explain its bias and delineate its data sources. There are still deep scientific mysteries that we are trying to come to grips with,” Lee notes. “They seem to be able to reason. It is technically correct to say that these are doing next-word prediction, but misleading to say that, too. This is a multi-trillion dimensional space that is understanding the latent structures in a large amount of data. To do true next-word prediction, you may need to do other analysis. Even simple next-word prediction may require the full complement of intelligence that humans have.”
In healthcare, that will mean proceeding with caution, contends John Halamka, MD, president of the Mayo Clinic Platform. “We’re seeing short, medium and long-term use cases. In 2023, it’s not a time to measure accuracy of the technology, but maybe to use it for low-risk use cases. We need a use case limitation until we have better controls,” he says.
Input is critical to accuracy in what comes out of generative AI, he says. “So we probably need to get more controls on sources of information” on which output is based. For example, in some multi-step interactions, generative AI “tends to hallucinate more. So it needs to be ‘trained’ on a limited data set and curated to be accurate, and even so, asked questions that are pretty straightforward.”
Even OpenAI admits the following limitations for its technology:
- ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers, depending on the “source of truth” used to frame answers.
- ChatGPT is sensitive to slight variations made to input phrasing.
- The model is sometimes excessively verbose and overuses certain phrases.
- The model usually guesses at what users intend when they make ambiguous queries.
- The model has not yet demonstrated that it can identify and refuse to respond to inappropriate requests.
Even with the technology’s current limitations, it may be able to free clinicians from “creating content” such as clinical notes, but it’s not capable of the deductive reasoning that clinicians do.
Still, with time and involvement from healthcare organizations, generative AI can bring benefits to healthcare, contends Reid Blackman, author of “Ethical Machines” and founder and CEO of Virtue, a digital ethical risk consultancy.
“I think the things we’re worried about, the engineers are going to patch up quickly,” he says. “ChatGPT does a good but not perfect job of summarizing information that’s out there. (Health IT professionals) will build models that are not directly giving answers but will do framing of better searches for information and then summarizing it. That’s something you can do without being at Microsoft or Google.”