Fugu-MT Paper Translation (Abstract): DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

Paper overview: DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

arxiv url: http://arxiv.org/abs/2601.21716v1
Date: Thu, 29 Jan 2026 13:43:17 GMT
Status: Translation complete
Last updated in system: 2026-01-30 16:22:49.860543
Title: DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning
Authors: Mingshuang Luo, Shuang Liang, Zhengkun Rong, Yuxuan Luo, Tianshu Hu, Ruibing Hou, Hong Chang, Yong Li, Yuan Zhang, Mingyuan Gao
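Abstract summary: This work presents DreamActor-M2, a universal animation framework that reimagines motion conditioning as an in-context learning problem. First, it bridges the input modality gap by fusing reference appearance and motion cues into a unified latent space. Second, it introduces a self-bootstrapped data synthesis pipeline that curates pseudo cross-identity training pairs.
Reference score (independently computed attention score): 24.808926786222376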
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Character image animation aims to synthesize high-fidelity videos by transferring motion from a driving sequence to a static reference image. Despite recent advancements, existing methods suffer from two fundamental challenges: (1) suboptimal motion injection strategies that lead to a trade-off between identity preservation and motion consistency, manifesting as a "see-saw", and (2) an over-reliance on explicit pose priors (e.g., skeletons), which inadequately capture intricate dynamics and hinder generalization to arbitrary, non-humanoid characters. To address these challenges, we present DreamActor-M2, a universal animation framework that reimagines motion conditioning as an in-context learning problem. Our approach follows a two-stage paradigm. First, we bridge the input modality gap by fusing reference appearance and motion cues into a unified latent space, enabling the model to jointly reason about spatial identity and temporal dynamics by leveraging the generative prior of foundational models. Second, we introduce a self-bootstrapped data synthesis pipeline that curates pseudo cross-identity training pairs, facilitating a seamless transition from pose-dependent control to direct, end-to-end RGB-driven animation. This strategy significantly enhances generalization across diverse characters and motion scenarios. To facilitate comprehensive evaluation, we further introduce AW Bench, a versatile benchmark encompassing a wide spectrum of character types and motion scenarios. Extensive experiments demonstrate that DreamActor-M2 achieves state-of-the-art performance, delivering superior visual fidelity and robust cross-domain generalization. Project Page: https://grisoon.github.io/DreamActor-M2/

Related papers