Fugu-MT Paper Translation (Abstract): Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow

Paper summary: Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow

arxiv url: http://arxiv.org/abs/2509.24099v1
Date: Sun, 28 Sep 2025 22:36:18 GMT
Status: translation complete
Last updated in system: 2025-09-30 22:32:19.630923
Title: Unified Multi-Modal Interactive & Reactive 3D Motion Generation via Rectified Flow
Title (reference translation): Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
Authors: Prerit Gupta, Shourya Verma, Ananth Grama, Aniket Bera
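Abstract summary: We introduce DualFlow, a framework for multi-modal two-person motion generation. It synthesizes motion from diverse inputs, including text, music, and prior motion sequences. It generates temporally coherent and rhythmically synchronized motions, setting the state of the art in multi-modal human motion generation.
Reference score (independently computed attention score): 17.95248351806955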
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating realistic, context-aware two-person motion conditioned on diverse modalities remains a central challenge in computer graphics, animation, and human-computer interaction. We introduce DualFlow, a unified and efficient framework for multi-modal two-person motion generation. DualFlow conditions 3D motion synthesis on diverse inputs, including text, music, and prior motion sequences. Leveraging rectified flow, it achieves deterministic straight-line sampling paths between noise and data, reducing inference time and mitigating the error accumulation common in diffusion-based models. To enhance semantic grounding, DualFlow employs a Retrieval-Augmented Generation (RAG) module that retrieves motion exemplars using music features and LLM-based text decompositions of spatial relations, body movements, and rhythmic patterns. We use a contrastive objective that further strengthens alignment with conditioning signals and introduce a synchronization loss that improves inter-person coordination. Extensive evaluations across text-to-motion, music-to-motion, and multi-modal interactive benchmarks show consistent gains in motion quality, responsiveness, and efficiency. DualFlow produces temporally coherent and rhythmically synchronized motions, setting the state of the art in multi-modal human motion generation.
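Abstract (reference translation): Generating realistic, context-aware two-person motion conditioned on diverse modalities remains a central challenge in computer graphics, animation, and human-computer interaction. We introduce DualFlow, a unified and efficient framework for multi-modal two-person motion generation. DualFlow performs 3D motion synthesis on diverse inputs, including text, music, and prior motion sequences. Using rectified flow, it achieves deterministic straight-line sampling paths between noise and data, shortening inference time and mitigating the error accumulation common to diffusion-based models. To strengthen semantic grounding, DualFlow adopts a Retrieval-Augmented Generation (RAG) module that retrieves motion exemplars using music features and LLM-based text decompositions of spatial relations, body movements, and rhythmic patterns. A contrastive objective further strengthens alignment with the conditioning signals, and a synchronization loss improves inter-person coordination. Extensive evaluations on text-to-motion, music-to-motion, and multi-modal interactive benchmarks show consistent improvements in motion quality, responsiveness, and efficiency. DualFlow generates temporally coherent and rhythmically synchronized motions, setting the state of the art in multi-modal human motion generation.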
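The "deterministic straight-line sampling paths between noise and data" mentioned in the abstract can be illustrated with a small sketch of the general rectified-flow idea. This is not DualFlow's implementation: the function names (`interpolate`, `target_velocity`, `euler_sample`, `toy_velocity`) are hypothetical, the 3-element vector is a stand-in for a motion sequence, noise is assumed to sit at t=0 and data at t=1, and a closed-form velocity field replaces the trained, conditioned network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field: constant along the line."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Assumption: a single hypothetical data point, so the ideal velocity field
# is known in closed form and no training is needed for the illustration.
DATA_POINT = [0.5, -1.0, 2.0]


def toy_velocity(x, t):
    """Ideal velocity for one data point: aim straight at it from (x, t)."""
    return [(d - xi) / (1.0 - t) for d, xi in zip(DATA_POINT, x)]


rng = random.Random(0)  # fixed seed, so the sampling path is reproducible
noise = [rng.gauss(0.0, 1.0) for _ in DATA_POINT]

one_step = euler_sample(toy_velocity, noise, num_steps=1)
ten_steps = euler_sample(toy_velocity, noise, num_steps=10)
```

Because the path is a straight line, a single Euler step reaches the same endpoint as ten steps in this toy setting. That is the intuition behind the reduced inference time the abstract describes. With a learned velocity field the paths are only approximately straight, so the number of steps needed in practice is an empirical question.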

Related papers