Fugu-MT Paper Translation (Abstract): Social-R1: Towards Human-like Social Reasoning in LLMs

Paper overview: Social-R1: Towards Human-like Social Reasoning in LLMs

arxiv url: http://arxiv.org/abs/2603.09249v1
Date: Tue, 10 Mar 2026 06:26:24 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:24.099113
Title: Social-R1: Towards Human-like Social Reasoning in LLMs
Authors: Jincenzi Wu, Yuxuan Lei, Jianxun Lian, Yitian Huang, Lexin Zhou, Haotian Li, Xing Xie, Helen Meng,
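Abstract summary: We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. We propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards.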
Reference score (attention score computed independently by this site): 74.32494331695837
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs. Current models often rely on superficial patterns rather than genuine social reasoning. We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. To this end, we introduce ToMBench-Hard, an adversarial benchmark designed to provide hard training examples for social reasoning. Building on this, we propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards. Unlike outcome-based RL, Social-R1 supervises the entire reasoning process, enforcing structural alignment, logical integrity, and information density. Results show that our approach enables a 4B parameter model to surpass much larger counterparts and generalize robustly across eight diverse benchmarks. These findings demonstrate that challenging training cases with trajectory-level alignment offer a path toward efficient and reliable social intelligence.

Related papers