Fugu-MT Paper Translation (Summary): S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test

Paper Overview: S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test

arxiv url: http://arxiv.org/abs/2512.19992v1
Date: Tue, 23 Dec 2025 02:36:56 GMT
Status: Translation complete
Last updated in system: 2025-12-24 19:17:49.718635
Title: S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test
Title (reference translation): S$^3$IT: A Benchmark for Spatial Social Intelligence Testing
Authors: Zhe Sun, Xueyuan Yang, Yujie Lu, Zhenliang Zhang
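Abstract summary: This paper introduces the Spatially Situated Social Intelligence Test (S$^3$IT), a benchmark specifically designed to evaluate embodied social intelligence. An agent must arrange seating in a 3D environment for a group of large language model-driven NPCs. The framework generates a vast and diverse scenario space with controllable difficulty, compelling the agent to acquire preferences through active dialogue, perceive the environment through autonomous exploration, and perform multi-objective optimization within a complex constraint network.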
Reference score (independently computed attention score): 26.79990069295221
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The integration of embodied agents into human environments demands embodied social intelligence: reasoning over both social norms and physical constraints. However, existing evaluations fail to address this integration, as they are limited to either disembodied social reasoning (e.g., in text) or socially-agnostic physical tasks. Both approaches fail to assess an agent's ability to integrate and trade off both physical and social constraints within a realistic, embodied context. To address this challenge, we introduce Spatially Situated Social Intelligence Test (S$^{3}$IT), a benchmark specifically designed to evaluate embodied social intelligence. It is centered on a novel and challenging seat-ordering task, requiring an agent to arrange seating in a 3D environment for a group of large language model-driven (LLM-driven) NPCs with diverse identities, preferences, and intricate interpersonal relationships. Our procedurally extensible framework generates a vast and diverse scenario space with controllable difficulty, compelling the agent to acquire preferences through active dialogue, perceive the environment via autonomous exploration, and perform multi-objective optimization within a complex constraint network. We evaluate state-of-the-art LLMs on S$^{3}$IT and find that they still struggle with this problem, showing an obvious gap compared with the human baseline. Results imply that LLMs have deficiencies in spatial intelligence, yet simultaneously demonstrate their ability to achieve near human-level competence in resolving conflicts that possess explicit textual cues.
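Abstract (reference translation): Integrating embodied agents into human environments requires embodied social intelligence that reasons over both social norms and physical constraints. However, existing evaluations are limited to either social reasoning (e.g., in text) or socially agnostic physical tasks, and so cannot address this integration. Neither approach assesses an agent's ability to integrate and trade off physical and social constraints within a realistic, embodied context. To address this challenge, we introduce S$^{3}$IT, a benchmark for evaluating embodied social intelligence. The agent must arrange seating in a 3D environment for large language model-driven (LLM-driven) NPCs with diverse identities, preferences, and complex interpersonal relationships. Our procedurally extensible framework generates a vast and diverse scenario space with controllable difficulty, prompting the agent to acquire preferences through active dialogue, perceive the environment through autonomous exploration, and perform multi-objective optimization within a complex constraint network. We evaluate state-of-the-art LLMs on S$^{3}$IT and find that they still struggle with this problem, leaving a clear gap relative to the human baseline. The results suggest that LLMs are deficient in spatial intelligence, yet can achieve near human-level competence in resolving conflicts that have explicit textual cues.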
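To make the seat-ordering task described above more concrete, here is a minimal toy sketch of scoring and searching seating arrangements. It is not the benchmark's method or API: the guest names, seat attributes, relationship weights, the weighted-sum combination of a social objective and a seat-preference objective, and the helpers `score` and `best_arrangement` are all hypothetical. It also assumes preferences and the seat layout are already known, whereas in S$^3$IT the agent must obtain them through dialogue and exploration.

```python
from itertools import permutations

# Hypothetical toy layout: four seats in a single row, indexed 0..3.
# Seats 0 and 3 are assumed to be window seats.
SEATS = {0: {"window"}, 1: set(), 2: set(), 3: {"window"}}

# Hypothetical per-guest seat preferences (None means no preference).
SEAT_PREFS = {"Alice": "window", "Bob": None, "Carol": None, "Dave": "window"}

# Hypothetical pairwise relationships: +1 = want to sit together, -1 = keep apart.
RELATIONS = {("Alice", "Bob"): 1, ("Carol", "Dave"): -1, ("Bob", "Dave"): -1}

# Assumed weights; a weighted sum is just one simple way to combine objectives.
W_SOCIAL = 2.0
W_SEAT = 1.0


def adjacent(i, j):
    """Two seats in a single row are adjacent if their indices differ by 1."""
    return abs(i - j) == 1


def score(assignment):
    """Score a guest -> seat mapping; higher is better."""
    social = 0.0
    for (a, b), rel in RELATIONS.items():
        if adjacent(assignment[a], assignment[b]):
            social += rel  # reward friends sitting together, penalize rivals
    seat = sum(
        1.0
        for guest, pref in SEAT_PREFS.items()
        if pref is not None and pref in SEATS[assignment[guest]]
    )
    return W_SOCIAL * social + W_SEAT * seat


def best_arrangement(guests, seats):
    """Exhaustively try every assignment; keeps the first best one found."""
    best, best_score = None, float("-inf")
    for perm in permutations(seats, len(guests)):
        assignment = dict(zip(guests, perm))
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score


if __name__ == "__main__":
    guests = list(SEAT_PREFS)
    best, best_score = best_arrangement(guests, list(SEATS))
    print(best, best_score)
```

In this toy instance the two objectives conflict. Seating both window-preferring guests at the windows forces Carol next to Dave, so the search gives up one window preference to keep the rivals apart. Exhaustive search grows factorially with the number of guests, so larger constraint networks would need heuristic search or a constraint solver instead.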


Related Papers