In a study published in JAMA Internal Medicine, researchers from Beth Israel Deaconess Medical Center (BIDMC) report that ChatGPT-4, an artificial intelligence (AI) program, demonstrated clinical reasoning equal to or better than that of internal medicine residents and attending physicians.
Led by physician-scientists at BIDMC, the study compared the clinical reasoning of a large language model (LLM) directly against human performance, using standards developed to evaluate physicians’ diagnostic and reasoning skills.
Dr. Adam Rodman, an internal medicine physician and investigator at BIDMC, highlighted the significance of assessing LLMs beyond mere diagnostic accuracy. “It’s a surprising finding that these things are capable of showing the equivalent or better reasoning than people throughout the evolution of clinical case,” said Dr. Rodman.
Using the revised-IDEA (r-IDEA) score, a validated tool for assessing physicians’ clinical reasoning, the researchers evaluated the performance of 21 attending physicians and 18 residents alongside ChatGPT-4. Participants worked through 20 clinical cases, each comprising four stages of diagnostic reasoning.
Lead author Dr. Stephanie Cabral, a third-year internal medicine resident at BIDMC, outlined the sequential stages of diagnostic reasoning involved in the study. “The first stage is the triage data… The second stage is the system review… The third stage is the physical exam, and the fourth is diagnostic testing and imaging,” explained Dr. Cabral.
ChatGPT-4 earned the highest r-IDEA scores, with a median of 10 out of 10, compared with 9 for attending physicians and 8 for residents. The AI matched or surpassed the physicians in diagnostic accuracy and clinical reasoning, but it also produced more instances of incorrect reasoning, a finding that supports using AI as a complement to human reasoning rather than a replacement for it.
Dr. Cabral emphasized the potential of AI to enhance patient-physician interactions and streamline healthcare processes. “My ultimate hope is that AI will improve the patient-physician interaction by reducing some of the inefficiencies we currently have and allow us to focus more on the conversation we’re having with our patients,” said Dr. Cabral.
Dr. Rodman echoed this sentiment, pointing to AI’s potential to improve healthcare quality and the patient experience. “We have a unique chance to improve the quality and experience of health care for patients,” he stated.
The study underscores the evolving role of AI in healthcare and the need for further research on how best to integrate it into clinical practice, including how it might augment clinical reasoning and enhance patient care.
Co-authors of the study included physicians from BIDMC, Massachusetts General Hospital, and Brigham and Women’s Hospital.
Journal Reference: Stephanie Cabral et al. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians, JAMA Internal Medicine (2024). DOI: 10.1001/jamainternmed.2024.0295