Fugu-MT Paper Translation (Abstract): Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs

Paper summary: Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs

arxiv url: http://arxiv.org/abs/2509.24857v1
Date: Mon, 29 Sep 2025 14:42:23 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:20.053469
Title: Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
Title (reference translation): Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
Authors: Adrian Arnaiz-Rodriguez, Miguel Baidal, Erik Derner, Jenn Layton Annable, Mark Ball, Mark Ince, Elvira Perez Vallejos, Nuria Oliver
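Abstract summary: The paper introduces a unified taxonomy of six clinically informed mental health crisis categories. It benchmarks three state-of-the-art LLMs on their ability to classify crisis types and generate safe, appropriate responses. It identifies systemic weaknesses in handling indirect or ambiguous risk signals, a reliance on formulaic and inauthentic default replies, and frequent misalignment with user context.
Reference score (independently calculated attention score): 6.0460961868478975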
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The widespread use of chatbots powered by large language models (LLMs) such as ChatGPT and Llama has fundamentally reshaped how people seek information and advice across domains. Increasingly, these chatbots are being used in high-stakes contexts, including emotional support and mental health concerns. While LLMs can offer scalable support, their ability to safely detect and respond to acute mental health crises remains poorly understood. Progress is hampered by the absence of unified crisis taxonomies, robust annotated benchmarks, and empirical evaluations grounded in clinical best practices. In this work, we address these gaps by introducing a unified taxonomy of six clinically-informed mental health crisis categories, curating a diverse evaluation dataset, and establishing an expert-designed protocol for assessing response appropriateness. We systematically benchmark three state-of-the-art LLMs for their ability to classify crisis types and generate safe, appropriate responses. The results reveal that while LLMs are highly consistent and generally reliable in addressing explicit crisis disclosures, significant risks remain. A non-negligible proportion of responses are rated as inappropriate or harmful, with responses generated by an open-weight model exhibiting higher failure rates than those generated by the commercial ones. We also identify systemic weaknesses in handling indirect or ambiguous risk signals, a reliance on formulaic and inauthentic default replies, and frequent misalignment with user context. These findings underscore the urgent need for enhanced safeguards, improved crisis detection, and context-aware interventions in LLM deployments. Our taxonomy, datasets, and evaluation framework lay the groundwork for ongoing research and responsible innovation in AI-driven mental health support, helping to minimize harm and better protect vulnerable users.
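Abstract (reference translation): The spread of chatbots powered by large language models (LLMs) such as ChatGPT and Llama has fundamentally changed how people seek information and advice across domains. These chatbots are increasingly used in high-stakes contexts, including emotional support and mental health concerns. LLMs can provide scalable support, but their ability to safely detect and respond to acute mental health crises is still not well understood. Progress is hindered by the lack of unified crisis taxonomies, robust annotated benchmarks, and empirical evaluations grounded in clinical best practices. This work addresses these gaps by introducing a unified taxonomy of six clinically informed mental health crisis categories, curating a diverse evaluation dataset, and establishing an expert-designed protocol for assessing response appropriateness. We systematically benchmark three state-of-the-art LLMs on their ability to classify crisis types and generate safe, appropriate responses. The results show that LLMs are highly consistent and generally reliable when addressing explicit crisis disclosures, but that significant risks remain. A non-negligible proportion of responses are rated inappropriate or harmful, and responses generated by an open-weight model show higher failure rates than those from the commercial models. We also identify systemic weaknesses in handling indirect or ambiguous risk signals, a reliance on formulaic and inauthentic default replies, and frequent misalignment with user context. These findings highlight the urgent need for stronger safeguards, improved crisis detection, and context-aware interventions in LLM deployments. Our taxonomy, datasets, and evaluation framework lay the groundwork for continued research and responsible innovation in AI-driven mental health support, helping to minimize harm and better protect vulnerable users.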
