Fugu-MT Paper Translation (Summary): Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics

Paper summary: Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics

arxiv url: http://arxiv.org/abs/2603.10408v1
Date: Wed, 11 Mar 2026 04:44:46 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.782027
Title: Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics
Title (reference translation): Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics
Authors: Tianshuo Xu, Zhifei Chen, Leyi Wu, Hao Lu, Ying-cong Chen
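Abstract summary: We introduce Motion Forcing, a framework designed to stabilize the balance among the competing goals of video generation. Our key insight is to explicitly decouple physical reasoning from visual synthesis. Experiments on autonomous driving benchmarks show that Motion Forcing significantly outperforms state-of-the-art baselines.
Reference score (attention score computed by this site): 37.22501359080204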
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ultimate goal of video generation is to satisfy a fundamental trilemma: achieving high visual quality, maintaining rigorous physical consistency, and enabling precise controllability. While recent models can maintain this balance in simple, isolated scenarios, we observe that this equilibrium is fragile and often breaks down as scene complexity increases (e.g., involving collisions or dense traffic). To address this, we introduce Motion Forcing, a framework designed to stabilize this trilemma even in complex generative tasks. Our key insight is to explicitly decouple physical reasoning from visual synthesis via a hierarchical "Point-Shape-Appearance" paradigm. This approach decomposes generation into verifiable stages: modeling complex dynamics as sparse geometric anchors (Point), expanding them into dynamic depth maps that explicitly resolve 3D geometry (Shape), and finally rendering high-fidelity textures (Appearance). Furthermore, to foster robust physical understanding, we employ a Masked Point Recovery strategy. By randomly masking input anchors during training and enforcing the reconstruction of complete dynamic depth, the model is compelled to move beyond passive pattern matching and learn latent physical laws (e.g., inertia) to infer missing trajectories. Extensive experiments on autonomous driving benchmarks show that Motion Forcing significantly outperforms state-of-the-art baselines, maintaining trilemma stability across complex scenes. Evaluations on physics and robotics further confirm our framework's generality.
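Abstract (reference translation): The ultimate goal of video generation is to satisfy a fundamental trilemma: achieving high visual quality, maintaining strict physical consistency, and enabling precise controllability. Recent models can maintain this balance in simple, isolated scenarios, but the equilibrium is fragile and often collapses as scene complexity increases (for example, with collisions or dense traffic). To address this, we introduce Motion Forcing, a framework designed to stabilize the trilemma even in complex generation tasks. Our key insight is to explicitly separate physical reasoning from visual synthesis through a hierarchical "Point-Shape-Appearance" paradigm. The approach decomposes generation into verifiable stages: it models complex dynamics as sparse geometric anchors (Point), expands them into dynamic depth maps that explicitly resolve 3D geometry (Shape), and finally renders high-fidelity textures (Appearance). In addition, to promote robust physical understanding, we adopt a Masked Point Recovery strategy. By randomly masking input anchors during training and forcing the reconstruction of the complete dynamic depth, the model goes beyond passive pattern matching and learns latent physical laws (for example, inertia) in order to infer the missing trajectories. Extensive experiments on autonomous driving benchmarks show that Motion Forcing significantly outperforms state-of-the-art baselines and maintains trilemma stability in complex scenes. Evaluations on physics and robotics further confirm the generality of our framework.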
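
Illustrative sketch (not from the paper): the abstract describes Masked Point Recovery only at a high level, so the following is a minimal, hypothetical rendering of its input-masking step alone. The function name `mask_anchors`, the `mask_ratio` parameter, and the representation of each anchor as a list of (x, y) points are all assumptions made for illustration; the model, the dynamic depth target, and the loss are not shown.

```python
import random


def mask_anchors(trajectories, mask_ratio=0.5, rng=None):
    """Randomly hide a fraction of anchor trajectories (hypothetical helper).

    trajectories: list of per-anchor tracks, each a list of (x, y) points.
    Returns (visible, mask): hidden tracks are replaced by None, and
    mask[i] is True when anchor i was hidden.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = rng or random.Random()
    n = len(trajectories)
    n_masked = round(n * mask_ratio)
    # Choose which anchors to hide, without replacement.
    masked_ids = set(rng.sample(range(n), n_masked))
    mask = [i in masked_ids for i in range(n)]
    visible = [None if hidden else track
               for track, hidden in zip(trajectories, mask)]
    return visible, mask


# Four toy anchor tracks; half of them are hidden from the model input.
tracks = [[(0, 0), (1, 0)], [(5, 5), (5, 6)], [(2, 2), (3, 3)], [(9, 1), (8, 1)]]
visible, mask = mask_anchors(tracks, mask_ratio=0.5, rng=random.Random(0))
```

In a training loop along these lines, the model would receive `visible` as input while the supervision target would still be derived from the complete scene, which is what would push it to infer the hidden motion rather than copy it.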

Related papers