Key Points
- RAND study examined ChatGPT, Claude and Gemini using 30 suicide‑related questions run 100 times each.
- ChatGPT and Claude performed well on very low‑risk and very high‑risk queries, providing appropriate or non‑harmful answers.
- Gemini showed more variable responses across risk categories.
- All three models were inconsistent on intermediate‑risk questions, sometimes offering safe guidance and other times giving no response.
- ChatGPT and Claude occasionally named poisons associated with high rates of completed suicide; Gemini gave direct high‑risk answers less often but sometimes withheld factual low‑risk information.
- ChatGPT frequently declined to provide therapeutic resources or direct users to mental‑health support.
- Study highlights gaps in AI safety for mental‑health discussions and calls for stronger safeguards.
Study Overview
The RAND Corporation conducted a focused investigation into how three popular large‑language model chatbots—ChatGPT, Claude and Gemini—respond to queries about suicide. Researchers aimed to gauge the safety and reliability of these systems when faced with questions that could range from general information‑seeking to highly dangerous requests.
Methodology
The team compiled a set of thirty suicide‑related prompts, each assigned a risk level by expert clinicians, ranging from very low risk (general information‑seeking) to very high risk. Each chatbot was asked the same set of questions one hundred times, allowing the researchers to assess consistency and content across repeated interactions.
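As a rough illustration of this repeated‑query protocol, the sketch below shows one way such an experiment could be structured in Python. The function names (`run_protocol`, `query_chatbot`), the prompt‑to‑risk mapping, and the tallying approach are hypothetical placeholders, not the RAND team's actual tooling.

```python
import collections
from typing import Callable, Dict, List, Tuple

def run_protocol(
    query_chatbot: Callable[[str, str], str],  # hypothetical wrapper: (model, prompt) -> reply text
    models: List[str],
    prompts: Dict[str, str],                   # prompt text -> clinician-assigned risk level
    repetitions: int = 100,
) -> Dict[Tuple[str, str, str], collections.Counter]:
    """Ask every model every prompt `repetitions` times and tally the raw replies.

    Consistency can then be judged by how many distinct answers each
    (model, risk level, prompt) combination produces across repeated runs.
    """
    tallies: Dict[Tuple[str, str, str], collections.Counter] = collections.defaultdict(collections.Counter)
    for model in models:
        for prompt, risk_level in prompts.items():
            for _ in range(repetitions):
                reply = query_chatbot(model, prompt)
                tallies[(model, risk_level, prompt)][reply] += 1
    return tallies
```

With a suitable `query_chatbot` wrapper for each provider, the resulting tallies could then be compared against clinician judgments of appropriateness, which is the kind of comparison the study reports.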
Key Findings
For very low‑risk questions, ChatGPT and Claude tended to generate responses deemed appropriate by clinicians, while Gemini showed more variability. When confronted with very high‑risk prompts, both ChatGPT and Claude generally avoided providing direct instructions for lethal methods, though they occasionally named poisons associated with high suicide completion rates. Gemini was less likely to give direct answers to high‑risk queries but also sometimes failed to respond to factual low‑risk questions.
Intermediate‑risk questions, such as requests for recommendations for someone experiencing suicidal thoughts, produced the most inconsistency across all three platforms. At times the chatbots responded safely, offering resources and gentle advice; at other times they either did not answer or provided less helpful information.
The study also noted a reluctance, particularly from ChatGPT, to supply therapeutic resources or direct users toward mental‑health support, with many responses declining to address the request directly.
Implications and Recommendations
Researchers concluded that although the chatbots' responses aligned with expert assessments at the extremes of risk, significant variability remained for intermediate‑risk scenarios and between providers. This inconsistency underscores the need for robust safeguards, clearer guidelines, and improved safety mechanisms when AI systems are used to discuss sensitive mental‑health topics.
Source: cnet.com