Fugu-MT paper translation (abstract): Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

Paper overview: Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

arxiv url: http://arxiv.org/abs/2605.15205v1
Date: Tue, 28 Apr 2026 15:38:31 GMT
Status: Translation complete
Last updated in system: 2026-05-25 12:34:33.824355
Title: Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations
Authors: Nanxu Gong, Zixin Chen, Haotian Li, Zishu Zhao, Jianxun Lian, Huamin Qu, Yanjie Fu, Xing Xie
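Abstract summary: Existing benchmarks typically measure ToM capability improvement through story reading. This paper proposes an interactive ToM evaluation paradigm with both perspective and metric shifts. The results show that improvements on static benchmarks do not always translate to better performance in dynamic HAI interactions.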
Reference score (independently computed attention score): 80.39504840645687
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Improving the Theory of Mind (ToM) capability of Large Language Models (LLMs) is crucial for effective social interactions between these AI models and humans. However, the existing benchmarks often measure ToM capability improvement through story-reading, multiple-choice questions from a third-person perspective, while ignoring the first-person, dynamic, and open-ended nature of human-AI (HAI) interactions. To directly examine how ToM improvement techniques benefit HAI interactions, we first proposed the new paradigm of interactive ToM evaluation with both perspective and metric shifts. Next, following the paradigm, we conducted a systematic study of four representative ToM enhancement techniques using both four real-world datasets and a user study, covering both goal-oriented tasks (e.g., coding, math) and experience-oriented tasks (e.g., counseling). Our findings reveal that improvements on static benchmarks do not always translate to better performance in dynamic HAI interactions. This paper offers critical insights into ToM evaluation, showing the necessity of interaction-based assessments in developing next-generation, socially aware LLMs for HAI symbiosis.
