Fugu-MT paper translation (abstract): IOI: Decoupling Kinematics and Physics for Interactive World Models

Paper overview: IOI: Decoupling Kinematics and Physics for Interactive World Models

arxiv url: http://arxiv.org/abs/2606.23296v1
Date: Mon, 22 Jun 2026 13:09:34 GMT
Status: translation complete
Last updated in system: 2026-06-24 22:44:18.775606
Title: IOI: Decoupling Kinematics and Physics for Interactive World Models
Authors: Chengyu Bai, Peidong Jia, Tiecheng Guo, Yukai Wang, Rui Ma, Fangyuan Zhao, Chunkai Fan, Xiaobao Wei, Jintao Chen, Hao Wang, Ying Li, Xiaozhu Ju, Jian Tang, Shanghang Zhang
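Abstract summary: We propose IOI, a hybrid interactive world model that integrates analytical kinematic priors with learned physical dynamics. IOI introduces explicit kinematic guidance, computing forward kinematics from action sequences to obtain accurate motion trajectories. Experiments on the RoboTwin benchmark validate IOI across kinematic fidelity, out-of-distribution (OOD) generalization, and policy evaluation.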
Reference score (independently computed attention score): 46.3330122411516
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Developing generalist embodied agents requires interactive environments providing visually realistic feedback and accurate action-conditioned dynamics. Interactive world models address this by simulating such complex dynamics. However, purely data-driven methods struggle to ensure precise control alignment and physically plausible visual feedback due to a lack of explicit structural constraints. To address this, we propose IOI, a hybrid interactive world model integrating analytical kinematic priors with learned physical dynamics. Unlike data-driven approaches prone to spatiotemporal drift, IOI introduces explicit kinematic guidance, computing forward kinematics from action sequences for accurate motion trajectories. These trajectories are rendered into synchronized front, side, and top orthographic projections, eliminating the need for extrinsic camera calibration. A Multi-view Kinematic Aggregation and Injection module fuses these geometric cues and injects them into the video generator, providing geometry-consistent guidance. Conditioning video generation on these deterministic trajectories establishes a synergy between the analytical simulator and the world model. Decoupling deterministic motion into the kinematic prior frees the generator to model stochastic physical interactions. Experiments on the RoboTwin benchmark validate IOI across kinematic fidelity, out-of-distribution (OOD) generalization, and policy evaluation. IOI achieves state-of-the-art simulation performance and robust zero-shot generalization to unseen OOD tasks. Furthermore, IOI serves as a reliable policy evaluator, yielding success rates closely aligning with ground-truth physics simulators. On real-world platforms, policies trained on IOI-synthesized data match those trained on teleoperation demonstrations, solidifying its practical value for embodied policy learning.
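The kinematic guidance step described in the abstract (forward kinematics from an action sequence, then front, side, and top orthographic projections) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-DoF arm, the link lengths, the axis convention (x forward, y left, z up), and all function names below are assumptions made purely for illustration.

```python
import math

def forward_kinematics(yaw, shoulder, elbow, l1=0.3, l2=0.25):
    """Toy 3-DoF arm (hypothetical): return 3D positions of base, elbow, end effector.

    Angles are in radians; l1 and l2 are assumed link lengths in metres.
    """
    # Radial reach and height of each joint inside the arm's vertical plane.
    r1 = l1 * math.cos(shoulder)
    z1 = l1 * math.sin(shoulder)
    r2 = r1 + l2 * math.cos(shoulder + elbow)
    z2 = z1 + l2 * math.sin(shoulder + elbow)

    # Rotate that vertical plane about the z axis by the base yaw.
    def to_xyz(r, z):
        return (r * math.cos(yaw), r * math.sin(yaw), z)

    return [(0.0, 0.0, 0.0), to_xyz(r1, z1), to_xyz(r2, z2)]

def orthographic_views(points):
    """Project 3D points onto three axis-aligned planes by dropping one coordinate."""
    return {
        "front": [(y, z) for x, y, z in points],  # looking along x
        "side": [(x, z) for x, y, z in points],   # looking along y
        "top": [(x, y) for x, y, z in points],    # looking along z
    }

def render_action_sequence(actions):
    """Map each (yaw, shoulder, elbow) action to its three synchronized 2D views."""
    return [orthographic_views(forward_kinematics(*a)) for a in actions]
```

Because each view simply drops one coordinate of the robot-frame positions, no camera pose enters the computation, which is consistent with the abstract's remark that orthographic projections remove the need for extrinsic camera calibration. In the paper these rendered trajectories condition a video generator; that part is not sketched here.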


Related papers