In a world-first study, researchers from CSIRO and The University of Queensland (UQ) have uncovered alarming findings regarding the reliability of ChatGPT, a popular large language model (LLM), when it comes to health-related inquiries. The study, presented at the Empirical Methods in Natural Language Processing (EMNLP) conference, highlights potential risks associated with relying on online tools like ChatGPT for crucial health information.

The research explored how ChatGPT responds to health-related questions when supplied with evidence, revealing a concerning trend: the more evidence it is given, the less reliable it becomes. Accuracy fell to as low as 28 percent when the prompt included supporting or contrary evidence, compared with 80 percent in a question-only format.
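The paper's exact prompts are not reproduced here, but the two conditions it compares can be illustrated with a minimal sketch. In the code below, `ask_llm` is a hypothetical stand-in for whatever chat-completion client is used, and the prompt wording is illustrative only; none of this is the study's published code.

```python
from typing import Callable

def question_only_prompt(question: str) -> str:
    # Condition 1 (roughly 80 percent accuracy reported):
    # the model sees only the yes/no health question.
    return f"Answer 'yes' or 'no': {question}"

def evidence_biased_prompt(question: str, evidence: str) -> str:
    # Condition 2 (as low as 28 percent reported): a retrieved passage,
    # which may support or contradict the correct answer, is prepended.
    return (
        f"Consider this passage: {evidence}\n\n"
        f"Based on it, answer 'yes' or 'no': {question}"
    )

def accuracy(examples: list[dict], ask_llm: Callable[[str], str]) -> float:
    # Fraction of questions where the model's yes/no answer
    # matches the ground-truth label ("yes" or "no").
    correct = 0
    for ex in examples:
        answer = ask_llm(ex["prompt"]).strip().lower()
        correct += answer.startswith(ex["ground_truth"])
    return correct / len(examples)
```

Running `accuracy` over the same question set with each prompt builder gives the question-only versus evidence-biased comparison the study reports.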

Dr. Bevan Koopman, Principal Research Scientist at CSIRO and Associate Professor at UQ, expressed the urgency of understanding and addressing the risks associated with seeking health information online. “Despite the documented risks of searching for health information online, many individuals continue to turn to tools like ChatGPT for answers,” Dr. Koopman emphasized. “It’s imperative that we inform the public about these risks and work to improve the accuracy of health-related responses.”

The study, which evaluated 100 health-related questions ranging from common cold remedies to whether vinegar helps remove a stuck fish bone, sheds light on the limitations of ChatGPT as a source of reliable health information. Surprisingly, accuracy declined even when evidence was supplied, challenging the assumption that evidence-based prompts improve accuracy.

“We’re still investigating the reasons behind this phenomenon,” Dr. Koopman remarked. “However, it’s evident that the integration of evidence, whether correct or not, introduces noise that hampers ChatGPT’s ability to provide accurate responses.”

ChatGPT, launched in late 2022, has rapidly become one of the most widely used LLMs, capable of a broad range of language-processing tasks. However, concerns over its reliability, particularly in the health domain, underscore the need for further research and public awareness.

Professor Guido Zuccon, co-author of the study and Director of AI for the Queensland Digital Health Centre (QDHeC), emphasized the importance of understanding the complexities of LLM interactions. “Major search engines are integrating LLMs and search technologies, but we must address the challenges in controlling and optimizing these interactions to ensure the generation of accurate health information,” Professor Zuccon stated.

As the research progresses, the team plans to delve deeper into how the public utilizes health information generated by LLMs, aiming to mitigate potential risks and enhance the reliability of health-related responses. In a digital age where access to information is abundant, ensuring the accuracy of health-related content remains a critical endeavor.
