Fugu-MT Paper Translation (Summary): Grounding Social Perception in Intuitive Physics

Paper summary: Grounding Social Perception in Intuitive Physics

arxiv url: http://arxiv.org/abs/2603.27410v1
Date: Sat, 28 Mar 2026 21:14:49 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.943691
Title: Grounding Social Perception in Intuitive Physics
Authors: Lance Ying, Aydan Y. Huang, Aviv Netanyahu, Andrei Barbu, Boris Katz, Joshua B. Tenenbaum, Tianmin Shu
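Abstract summary: This paper proposes a model that integrates planning, probabilistic planning, and physics simulation to infer agents' goals and relationships from their trajectories. The results suggest that the model may provide a computational account of how people understand physically grounded social scenes.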
Reference score (independently computed attention score): 55.69427097024444
License: http://creativecommons.org/licenses/by/4.0/
Abstract: People infer rich social information from others' actions. These inferences are often constrained by the physical world: what agents can do, what obstacles permit, and how the physical actions of agents causally change the environment and other agents' mental states and behavior. We propose that such rich social perception is not mere visual pattern matching but rather a reasoning process grounded in an integration of intuitive psychology with intuitive physics. To test this hypothesis, we introduce PHASE (PHysically grounded Abstract Social Events), a large dataset of procedurally generated animations depicting physically simulated two-agent interactions on a 2D surface. Each animation follows the style of the Heider and Simmel movie, with systematic variation in environment geometry, object dynamics, agent capacities, goals, and relationships (friendly/adversarial/neutral). We then present a computational model, SIMPLE, a physics-grounded Bayesian inverse planning model that integrates planning, probabilistic planning, and physics simulation to infer agents' goals and relations from their trajectories. Our experiments show that SIMPLE achieved high accuracy and agreement with human judgments across diverse scenarios, while feedforward baseline models (including strong vision-language models) and physics-agnostic inverse planning fell short of human-level performance and did not align with human judgments. These results suggest that our model provides a computational account of how people understand physically grounded social scenes by inverting a generative model of physics and agents.
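The abstract describes SIMPLE only at a high level. As a rough illustration of what "physics-grounded Bayesian inverse planning" can mean, the Python sketch below scores each candidate goal by how well a simulated goal-directed agent, moving under simple point-mass physics, predicts each observed step of a trajectory, then normalizes those scores into a posterior over goals. Everything here is an assumption for illustration and not the authors' SIMPLE implementation: the damped point-mass dynamics in `Physics` and `step`, the PD-controller policy in `plan_force` with hypothetical gains `KP` and `KD`, the Gaussian observation noise `sigma`, the uniform prior, and the hypothetical landmark names and coordinates.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float]

# Hypothetical controller gains for the assumed goal-directed policy.
KP, KD = 1.5, 1.0


@dataclass(frozen=True)
class Physics:
    """Assumed damped point-mass dynamics; all values are illustrative."""
    mass: float = 1.0
    damping: float = 0.5    # linear friction coefficient (1/s)
    max_force: float = 2.0  # cap on the force an agent can exert ("strength")
    dt: float = 0.1         # simulation time step (s)


def plan_force(pos: Vec, vel: Vec, goal: Vec, phys: Physics) -> Vec:
    """Assumed policy: a PD controller toward `goal`, clipped to phys.max_force."""
    fx = KP * (goal[0] - pos[0]) - KD * vel[0]
    fy = KP * (goal[1] - pos[1]) - KD * vel[1]
    norm = math.hypot(fx, fy)
    if norm > phys.max_force:
        fx, fy = fx * phys.max_force / norm, fy * phys.max_force / norm
    return fx, fy


def step(pos: Vec, vel: Vec, force: Vec, phys: Physics) -> tuple[Vec, Vec]:
    """One semi-implicit Euler step: update velocity first, then position."""
    vx = vel[0] + (force[0] / phys.mass - phys.damping * vel[0]) * phys.dt
    vy = vel[1] + (force[1] / phys.mass - phys.damping * vel[1]) * phys.dt
    return (pos[0] + vx * phys.dt, pos[1] + vy * phys.dt), (vx, vy)


def simulate(start: Vec, goal: Vec, phys: Physics, steps: int) -> list[Vec]:
    """Generate a noise-free trajectory of an agent that starts at rest."""
    pos, vel = start, (0.0, 0.0)
    traj = [pos]
    for _ in range(steps):
        pos, vel = step(pos, vel, plan_force(pos, vel, goal, phys), phys)
        traj.append(pos)
    return traj


def log_likelihood(traj: list[Vec], goal: Vec, phys: Physics, sigma: float) -> float:
    """log P(traj | goal) with isotropic 2D Gaussian noise on each predicted position."""
    pos, vel = traj[0], (0.0, 0.0)  # assumes the agent starts at rest
    total = 0.0
    for obs in traj[1:]:
        pred, _ = step(pos, vel, plan_force(pos, vel, goal, phys), phys)
        err2 = (obs[0] - pred[0]) ** 2 + (obs[1] - pred[1]) ** 2
        total += -err2 / (2 * sigma**2) - math.log(2 * math.pi * sigma**2)
        # Condition the next prediction on the observed state; velocity is
        # recovered by finite differences (exact up to rounding for this integrator).
        vel = ((obs[0] - pos[0]) / phys.dt, (obs[1] - pos[1]) / phys.dt)
        pos = obs
    return total


def goal_posterior(traj: list[Vec], goals: dict[str, Vec], phys: Physics,
                   sigma: float = 0.01) -> dict[str, float]:
    """Bayes' rule with a uniform prior over goals, normalized via log-sum-exp."""
    logs = {name: log_likelihood(traj, g, phys, sigma) for name, g in goals.items()}
    top = max(logs.values())
    z = sum(math.exp(v - top) for v in logs.values())
    return {name: math.exp(v - top) / z for name, v in logs.items()}


if __name__ == "__main__":
    phys = Physics()
    goals = {"landmark_A": (5.0, 0.0), "landmark_B": (0.0, 5.0), "landmark_C": (-5.0, 0.0)}
    observed = simulate((0.0, 0.0), goals["landmark_A"], phys, steps=10)
    for name, p in sorted(goal_posterior(observed, goals, phys).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

Because every goal hypothesis is evaluated through the same simulator, the assumed mass, damping, and force cap shape what counts as efficient goal-directed motion. Replacing `step` with a purely kinematic update that ignores those parameters would be loosely analogous to the physics-agnostic inverse-planning baseline mentioned in the abstract. The sketch also omits the inference of relationships (friendly/adversarial/neutral) between two agents, which SIMPLE performs.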


Related papers