Image: Wikimedia Commons

Study Shows ChatGPT's Limits, Overprescription Risk in ER

Introduction

A new study from the University of California, San Francisco (UCSF) has found that if ChatGPT, an artificial intelligence model, were used in the emergency department (ED), it might recommend unnecessary interventions, such as extra X-rays, antibiotics, and even hospital admission for patients who don't need them.

AI Not a Substitute for Human Clinical Judgment

The study highlights that while ChatGPT can assist with tasks like drafting clinical notes or answering exam questions, it struggles with the complex decision-making required in emergency care. “It’s a reminder for healthcare professionals not to rely solely on these models,” said Chris Williams, MB BChir, a postdoctoral scholar and lead author of the study, published on October 8 in Nature Communications. Williams emphasized that ChatGPT is not yet capable of handling the nuanced decisions that doctors face in an emergency setting.

The Experiment: ChatGPT's Emergency Decisions

In earlier research, ChatGPT performed slightly better than humans in deciding which of two emergency patients was more critically ill—a simpler choice. But in this new study, the researchers tasked ChatGPT with making more complex medical recommendations, such as whether to admit a patient, order diagnostic scans, or prescribe antibiotics. These decisions involve multiple factors and often have significant consequences.

Comparing AI with Doctors

To test the AI, the researchers analyzed 1,000 real ED visits, drawn from a database of over 251,000 cases at UCSF Health. The team took the doctors' notes on each patient's symptoms and examination findings and entered them into two versions of ChatGPT, 3.5 and 4, then compared the AI's recommendations with the actual clinical decisions made by human doctors.

The results showed that ChatGPT-4 was 8% less accurate than resident doctors, while ChatGPT-3.5 was 24% less accurate. Both versions of the AI tended to suggest more interventions than necessary, reflecting an overprescription tendency.

Why Does ChatGPT Overprescribe?

Williams explained that the model's tendency to overprescribe may stem from the way it is trained on data from the internet, where legitimate medical websites often encourage users to seek medical attention as a precaution. This cautious approach works well for general public safety, but it can cause problems in an emergency department, where unnecessary tests or treatments may harm patients, tie up resources, and drive up healthcare costs.

The Future of AI in Emergency Care

For AI models like ChatGPT to be useful in emergency care, they will need to be trained with better frameworks that can evaluate clinical information more effectively. The challenge is to ensure that the AI does not overlook serious conditions while also avoiding unnecessary and costly medical interventions.

Williams concluded by saying that researchers, clinicians, and the public will need to carefully consider how to balance caution and efficiency as AI continues to develop in medical settings. “There is no perfect solution,” he said. Understanding the limitations of models like ChatGPT is essential as we assess their role in clinical practice.

Source: Inputs from various media sources

Priya Bairagi
