It’s the kind of story that feel-good dramas are made of: A determined mother doggedly searches for the reason behind her young son’s mysterious chronic pain. The medical issue eludes the 17 doctors she brings her son to, so she takes matters into her own hands by turning to a popular—and controversial—piece of emerging technology: OpenAI’s ChatGPT.
After inputting his symptoms and medical data into the chatbot, she finds out that her son might be suffering from something called “tethered cord syndrome.” It’s a disorder that occurs when the spinal cord becomes abnormally attached to tissue in the spinal column, causing great pain and a host of other neurological issues. Her suspicions are confirmed when she brings this information to her son’s doctor, who diagnoses the disorder, allowing the family to schedule surgery to fix the issue.
In effect: ChatGPT saved her son from his debilitating chronic pain.
This story, which was reported by Today on Sept. 11, sparked a healthy amount of discourse about whether ChatGPT and other chatbots can save lives and provide much-needed medical insights. One user on X (formerly known as Twitter) said that it’s proof that “AI saves lives today,” while another went so far as to say that the bot “is probably already a better diagnostician than many doctors working today.”
However, the truth is a lot more complicated than that. While the mother was indeed able to get a solid diagnosis for her son and get him the care he needs, the fact of the matter is that ChatGPT, like every other chatbot, is simply ill-equipped to perform up to medical diagnostic standards.
Yet, since last year, we’ve seen the entire world freak out about the technology. People have claimed that these chatbots are sentient, and some have even fallen in love with them. Meanwhile, much has been made of how the technology will take over the jobs of writers, musicians, comedians, and artists. So, really, it was only a matter of time before it came for your doctor’s job too… right?
“I think this story does show the promise of language models and AI more generally for clinical care, but isn’t evidence in itself that ChatGPT is ready for prime time in the clinic,” Danielle Bitterman, an assistant professor of radiation oncology at Harvard Medical School, told The Daily Beast.
Bitterman, who also researches AI in medicine and healthcare, points out that ChatGPT is a large language model (LLM). That means it’s a chatbot built on a model whose sole training task is to predict the next word in a string of text. While that makes it capable of communicating “some degree of useful information,” it doesn’t mean that it can be relied on to dispense medical advice.
It’d be like asking a plumber what might be wrong with your car. Sure, you might get a good answer—but you should really ask your mechanic. Likewise, an LLM is trained on language, not medicine and healthcare. It might stumble upon the answer you’re looking for, but you’re still better off seeking medical advice from actual clinicians.
“It is wonderful that ChatGPT was helpful for this family, but we still need more studies demonstrating benefit and more work developing specialized models for clinical settings,” Bitterman said.
The scientific literature is fairly mixed on ChatGPT’s efficacy in diagnostic situations. Although ChatGPT is capable of passing the U.S. Medical Licensing Exam, one study published in JAMA Oncology in August 2023 found that OpenAI’s chatbot “did not perform well at providing accurate cancer treatment recommendations,” and often mixed incorrect recommendations in with correct ones. However, other research has shown that the bot can provide useful breast cancer screening advice.
Even though it has shown some promise in dispensing medical information, the risks might far outweigh the benefits for many users. For one, these LLMs are prone to the exact same issues that have plagued AI for decades: a tendency to hallucinate facts, spread misinformation, and exhibit racist, sexist, and otherwise problematic behavior. We’ve seen this occur with the likes of ChatGPT and Google’s Bard already. Even LLMs ostensibly tailor-made for science and medicine have been shown to get it wrong a lot of the time.
That’s not to say there won’t ever be a place for LLMs in clinical and diagnostic situations. In July 2023, The Wall Street Journal reported that the Mayo Clinic in Minnesota was testing Google’s Med-PaLM 2 to help clinicians answer health and medical questions. While Bitterman believes that there is a place for AI in medicine, she said that these technologies “should not be used without careful evaluation.”
“[Overall], I am optimistic about the potential of AI to transform medicine,” Bitterman said. “We just need to do it carefully for the high-stakes clinical domain. There is too much at stake if we get this wrong; patient safety is paramount. If there are early errors due to hasty uptake without sufficient testing, it could ultimately set the field back and slow the potential gains.”
Of course, therein lies the danger when it comes to AI—especially for users seeking medical advice. When you’re desperately looking for answers to save yourself or someone you love, you might find yourself willing to listen to any source that might help—even if it’s a bot.