A groundbreaking international study has demonstrated that combining human medical expertise with artificial intelligence (AI) leads to the most accurate diagnoses, especially for complex, open-ended cases with many plausible diagnoses. Published in the Proceedings of the National Academy of Sciences, the research highlights how hybrid collectives—teams of doctors working alongside AI models—outperform groups made up solely of humans or AI.
The study was led by the Max Planck Institute for Human Development, in collaboration with the Human Diagnosis Project (San Francisco) and the Institute of Cognitive Sciences and Technologies of the Italian National Research Council (CNR-ISTC Rome). Researchers analyzed more than 40,000 diagnoses generated from over 2,100 clinical vignettes—short case studies with known correct answers—comparing the results of individual doctors, groups of doctors, AI models, and mixed human–AI teams.
Complementary Strengths, Fewer Errors
AI models, including advanced large language models like ChatGPT-4, Gemini, and Claude 3, can efficiently support medical diagnoses but are not without risks. They sometimes generate false information (“hallucinate”) and can reproduce existing social or medical biases. However, the study found that humans and AI make different types of errors. When one fails, the other can often compensate, resulting in a powerful error complementarity.
The research showed that AI collectives, on average, outperformed 85% of human diagnosticians. Yet there were cases in which humans succeeded where AI faltered. The most reliable outcomes emerged from collective decisions involving multiple humans and multiple AI models—adding just one AI to a human group, or vice versa, significantly improved diagnostic accuracy.
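To give a concrete sense of how a hybrid collective can pool judgments, the sketch below combines ranked differential-diagnosis lists from several diagnosticians (human or AI) using a simple Borda-style count. This is a minimal illustration of rank aggregation, not the study's actual method; the diagnosis names and function are hypothetical.

```python
from collections import defaultdict

def aggregate_diagnoses(ranked_lists, top_k=3):
    """Combine ranked differential-diagnosis lists with a Borda-style count:
    a diagnosis at rank r in a list of length n earns n - r points."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        n = len(ranking)
        for rank, diagnosis in enumerate(ranking):
            scores[diagnosis] += n - rank
    # Highest total score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]

# Hypothetical rankings for one vignette: two human lists, one AI list
humans = [["pneumonia", "bronchitis", "influenza"],
          ["bronchitis", "pneumonia"]]
ai = [["pneumonia", "influenza", "bronchitis"]]

print(aggregate_diagnoses(humans + ai))
# → ['pneumonia', 'bronchitis', 'influenza']
```

The key intuition matches the study's finding on error complementarity: even if one list ranks the correct diagnosis low, agreement across independent lists tends to pull it toward the top of the pooled ranking.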
Potential and Limitations
While the findings are promising, the study was limited to text-based case vignettes, not real patients in clinical settings. The researchers caution that the results may not directly translate to everyday practice, and further studies are needed to address real-world implementation and acceptance by medical staff and patients. The study also did not address treatment, noting that a correct diagnosis does not guarantee optimal care.
The research is part of the Hybrid Human Artificial Collective Intelligence in Open-Ended Decision Making (HACID) project, which aims to develop clinical decision-support systems that integrate human and machine intelligence. The approach has potential applications in regions with limited access to medical care and could be adapted for other high-stakes decision-making contexts, such as the legal system, disaster response, or climate policy.
Looking Ahead
“It’s not about replacing humans with machines. Rather, we should view artificial intelligence as a complementary tool that unfolds its full potential in collective decision-making,” says co-author Stefan Herzog, Senior Research Scientist at the Max Planck Institute for Human Development.
Disclaimer
This article is based on research published in the Proceedings of the National Academy of Sciences and summarized in a news release by Medical Xpress. The findings are preliminary and limited to simulated case vignettes. The results may not fully reflect real-world clinical outcomes, and AI should not be considered a substitute for professional medical advice or diagnosis. Always consult a qualified healthcare provider for medical concerns.