Fugu-MT Paper Translation (Summary): MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

Paper summary: MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

arxiv url: http://arxiv.org/abs/2509.26391v1
Date: Tue, 30 Sep 2025 15:26:04 GMT
Status: Translation complete
System update date: 2025-10-01 14:45:00.182133
Title: MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Title (reference translation): MotionRAG: Image Synthesis via Motion Retrieval
Authors: Chenhui Zhu, Yilu Wu, Shuai Wang, Gangshan Wu, Limin Wang
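Abstract summary: MotionRAG is a retrieval-augmented framework that enhances motion realism by adapting motion priors from relevant reference videos. The proposed method achieves significant improvements across multiple domains and various base models, with negligible computational overhead at inference time.
Reference score (independently computed attention score): 44.524568858995586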
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image-to-video generation has made remarkable progress with the advancements in diffusion models, yet generating videos with realistic motion remains highly challenging. This difficulty arises from the complexity of accurately modeling motion, which involves capturing physical constraints, object interactions, and domain-specific dynamics that are not easily generalized across diverse scenarios. To address this, we propose MotionRAG, a retrieval-augmented framework that enhances motion realism by adapting motion priors from relevant reference videos through Context-Aware Motion Adaptation (CAMA). The key technical innovations include: (i) a retrieval-based pipeline extracting high-level motion features using video encoder and specialized resamplers to distill semantic motion representations; (ii) an in-context learning approach for motion adaptation implemented through a causal transformer architecture; (iii) an attention-based motion injection adapter that seamlessly integrates transferred motion features into pretrained video diffusion models. Extensive experiments demonstrate that our method achieves significant improvements across multiple domains and various base models, all with negligible computational overhead during inference. Furthermore, our modular design enables zero-shot generalization to new domains by simply updating the retrieval database without retraining any components. This research enhances the core capability of video generation systems by enabling the effective retrieval and transfer of motion priors, facilitating the synthesis of realistic motion dynamics.
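Abstract (reference translation): Image-to-video generation has advanced remarkably with progress in diffusion models, but generating videos with realistic motion remains very difficult. The difficulty stems from the complexity of modeling motion accurately, which requires capturing physical constraints, object interactions, and domain-specific dynamics that do not generalize easily across scenarios. We therefore propose MotionRAG, a framework that improves motion realism by adapting motion priors from relevant reference videos through Context-Aware Motion Adaptation (CAMA). The key technical innovations are: (i) a retrieval-based pipeline that extracts high-level motion features with a video encoder and specialized resamplers to distill semantic motion representations; (ii) an in-context learning approach to motion adaptation built on a causal transformer architecture; (iii) an attention-based motion injection adapter that seamlessly integrates the transferred motion features into video diffusion models. Extensive experiments show that the proposed method achieves significant improvements across multiple domains and various base models, all with negligible computational overhead at inference time. In addition, the modular design enables zero-shot generalization to new domains simply by updating the retrieval database, without retraining any component. This work strengthens the core capability of video generation systems by enabling effective retrieval and transfer of motion priors, making realistic motion dynamics easier to synthesize.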
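
As a rough illustration of the retrieval step the abstract describes, and of why a new domain can be supported by updating only the retrieval database, here is a minimal Python sketch. It is not the authors' implementation: the cosine-similarity ranking, the `retrieve_top_k` helper, the database layout, and the toy 3-dimensional feature vectors are all assumptions made for illustration. In the paper the features come from a video encoder and resamplers, which are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, database, k=2):
    """Return the ids of the k entries most similar to the query.

    `database` maps a video id to a feature vector (hypothetical layout).
    """
    ranked = sorted(
        database,
        key=lambda vid: cosine(query, database[vid]),
        reverse=True,
    )
    return ranked[:k]


# Toy database: the ids and vectors are made up for illustration.
database = {
    "pouring_water": [0.9, 0.1, 0.0],
    "walking_dog": [0.0, 1.0, 0.1],
    "pouring_juice": [0.8, 0.2, 0.1],
}

# Stand-in for the feature of the input image's expected motion.
query = [1.0, 0.1, 0.0]
top = retrieve_top_k(query, database, k=2)

# Supporting a new domain only adds entries to the database;
# nothing in this sketch is retrained.
database["waving_flag"] = [0.0, 0.1, 1.0]
new_top = retrieve_top_k([0.0, 0.0, 1.0], database, k=1)
```

In this toy setup the retrieved reference ids would then be passed on to the adaptation and injection stages, which the sketch leaves out.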
