AI Models Rival Mental Health Professionals in Assessing Suicidal Ideation Responses, Study Finds

Spread the Message

Read Time:2 Minute, 15 Second

A new study by the RAND Corporation reveals that artificial intelligence models are demonstrating a remarkable ability to evaluate appropriate responses to individuals experiencing suicidal thoughts, sometimes even surpassing the performance of mental health professionals.

The research, published in the Journal of Medical Internet Research, assessed the knowledge of three major large language models (LLMs): ChatGPT by OpenAI, Claude by Anthropic, and Gemini by Google. Using the Suicidal Ideation Response Inventory (SIRI-2), a standard assessment tool, researchers presented the LLMs with 24 hypothetical scenarios involving individuals expressing depressive symptoms and suicidal ideation, followed by potential clinician responses.

The findings showed that all three LLMs exhibited a tendency to overrate the appropriateness of clinician responses, indicating a need for improved calibration. However, the overall performance of ChatGPT and Claude was comparable to that of professional counselors, nurses, and psychiatrists from previous studies. Notably, Claude demonstrated the strongest performance, even exceeding scores observed among mental health professionals who had recently completed suicide intervention skills training. Gemini’s performance was equivalent to that of K-12 school staff prior to suicide intervention training.

“In evaluating appropriate interactions with individuals expressing suicidal ideation, we found these large language models can be surprisingly discerning,” said Ryan McBain, the study’s lead author and a senior policy researcher at RAND. “However, the bias of these models to rate responses as more appropriate than they are—at least according to clinical experts—indicates they should be further improved.”

The study highlights the potential of LLMs in mental health applications, particularly in resource-limited communities. However, researchers emphasize the importance of safe design and rigorous testing, stressing that these AI models should not be considered replacements for crisis lines or professional care.

“Our goal is to help policymakers and tech developers recognize both the promise and the limitations of using large language models in mental health,” McBain said. “We are pressure testing a benchmark that could be used by tech platforms building mental health care, which would be especially impactful in communities that have limited resources. But caution is essential.”

The researchers recommend future studies to evaluate how AI tools directly respond to questions from individuals experiencing suicidal ideation or other mental health crises.

Disclaimer: This article reports on a study evaluating AI models’ ability to assess responses to suicidal ideation. It is crucial to understand that AI models are not a substitute for professional mental health care. If you or someone you know is experiencing suicidal thoughts, please seek help immediately. Contact a crisis hotline, mental health professional, or emergency services. This study highlights the potential of AI in mental health support but does not endorse its use as a replacement for human intervention in critical situations.