Fugu-MT Paper Translation (Abstract): EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation

Paper abstract: EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation

arxiv url: http://arxiv.org/abs/2604.01421v1
Date: Wed, 01 Apr 2026 21:43:57 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.071775
Title: EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation
Authors: Abhishek Saroha, Huajian Zeng, Xingxing Zuo, Daniel Cremers, Xi Wang
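Abstract summary: We present EgoFlow, a flow-matching framework that synthesizes realistic and physically plausible trajectories conditioned on multimodal egocentric observations. Our results highlight the promise of flow-based generative modeling for scalable and physically grounded egocentric motion understanding.
Reference score (independently computed attention score): 47.32597153743819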
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding and predicting object motion from egocentric video is fundamental to embodied perception and interaction. However, generating physically consistent 6DoF trajectories remains challenging due to occlusions, fast motion, and the lack of explicit physical reasoning in existing generative models. We present EgoFlow, a flow-matching framework that synthesizes realistic and physically plausible trajectories conditioned on multimodal egocentric observations. EgoFlow employs a hybrid Mamba-Transformer-Perceiver architecture to jointly model temporal dynamics, scene geometry, and semantic intent, while a gradient-guided inference process enforces differentiable physical constraints such as collision avoidance and motion smoothness. This combination yields coherent and controllable motion generation without post-hoc filtering or additional supervision. Experiments on the real-world datasets HD-EPIC, EgoExo4D, and HOT3D show that EgoFlow outperforms diffusion-based and transformer baselines in accuracy, generalization, and physical realism, reducing collision rates by up to 79% and generalizing strongly to unseen scenes. Our results highlight the promise of flow-based generative modeling for scalable and physically grounded egocentric motion understanding.
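The abstract does not spell out how the gradient-guided inference step works, so the following is only a generic sketch of the idea, not the authors' method. It assumes a toy velocity field that points at a fixed target trajectory (standing in for a learned network), 2D positions rather than full 6DoF poses, a single spherical obstacle, and a quadratic penetration penalty. All function names and the `guidance` weight are hypothetical.

```python
import math

def toy_velocity(traj, t, target):
    """Assumed stand-in for a learned velocity field: heads straight for `target`."""
    scale = 1.0 / max(1.0 - t, 1e-3)
    return [[(g - p) * scale for p, g in zip(pt, tgt)]
            for pt, tgt in zip(traj, target)]

def collision_grad(traj, center, radius):
    """Gradient of sum(max(0, radius - dist)^2) with respect to each point."""
    grads = []
    for pt in traj:
        dist = math.dist(pt, center) or 1e-9  # avoid division by zero
        pen = max(0.0, radius - dist)         # penetration depth, 0 outside
        grads.append([-2.0 * pen * (p - c) / dist for p, c in zip(pt, center)])
    return grads

def max_penetration(traj, center, radius):
    """Deepest penetration of any trajectory point into the obstacle."""
    return max(max(0.0, radius - math.dist(pt, center)) for pt in traj)

def sample(x0, target, center, radius, guidance=0.0, steps=50):
    """Euler-integrate the flow from x0, subtracting the scaled cost gradient."""
    traj = [list(pt) for pt in x0]
    dt = 1.0 / steps
    for k in range(steps):
        v = toy_velocity(traj, k * dt, target)
        g = collision_grad(traj, center, radius)
        # Guided update: follow the velocity field, nudged down the cost gradient.
        traj = [[p + dt * (vi - guidance * gi) for p, vi, gi in zip(pt, vpt, gpt)]
                for pt, vpt, gpt in zip(traj, v, g)]
    return traj

# Hypothetical setup: the target path cuts through the obstacle at its midpoint.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
x0 = [(x, 1.0) for x in xs]          # fixed start instead of random noise
target = [(x, 0.05) for x in xs]
center, radius = (0.0, 0.0), 0.3

unguided = sample(x0, target, center, radius)
guided = sample(x0, target, center, radius, guidance=20.0)
```

Comparing `max_penetration(unguided, center, radius)` with `max_penetration(guided, center, radius)` shows the guided sample ending less deep inside the obstacle. Because this toy field pulls hard toward the target near the end of integration, guidance only partly removes the collision here; a smoothness term could be added to the cost in the same way.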
