Fugu-MT Paper Translation (Abstract): SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

Paper overview: SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

arxiv url: http://arxiv.org/abs/2606.10804v2
Date: Wed, 10 Jun 2026 02:21:47 GMT
Status: Translation complete
Last updated in system: 2026-06-11 14:23:44.396406
Title: SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
Title (reference translation): SCAIL-2: Unifying Controlled Character Animation through End-to-End In-Context Conditioning
Authors: Wenhao Yan, Fengjia Guo, Zhuoyi Yang, Jie Tang
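Abstract summary: Controlled character animation requires transferring motion from a driving sequence to a reference character. SCAIL-2 is a framework that bypasses intermediate representations such as pose skeletons and masked backgrounds, and achieves end-to-end character animation.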
Reference score (independently computed attention score): 12.196865711049561
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Controlled character animation requires transferring motion from a driving sequence to a reference character. Prior works heavily rely on intermediate representations, including pose skeletons to represent motion or masked background to represent environment, which inevitably leads to information loss. To address this, we present SCAIL-2, a framework that bypasses those intermediates and achieves end-to-end character animation. By directly concatenating driving videos to the sequence, the model can obtain all the required visual information from the input video. To address the lack of end-to-end data, we unify sub-tasks of character animation with decoupled conditions and then curate a pipeline to synthesize MotionPair-60K, an end-to-end motion transfer dataset containing heterogeneous tasks of character animation. To achieve the unification, we utilize in-context mask conditioning and mode-specific RoPE as soft guidance beyond textual instructions and raw visual information. To address synthetic discrepancy in detailed regions, we propose Bias-Aware DPO to construct preference items to mitigate the errors. Extensive experiments demonstrate that our method substantially outperforms existing state-of-the-art approaches in various character animation tasks. A large subset of synthetic data as well as model weights will be released at our project page: https://teal024.github.io/SCAIL-2/.
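Abstract (reference translation): Controlled character animation requires transferring motion from a driving sequence to a reference character. Prior work relies heavily on intermediate representations, such as pose skeletons to represent motion or masked backgrounds to represent the environment, which inevitably leads to information loss. This work therefore proposes SCAIL-2, which bypasses these intermediates and achieves end-to-end character animation. By directly concatenating the driving video to the sequence, the model can obtain all the required visual information from the input video. To address the lack of end-to-end data, the sub-tasks of character animation are unified under decoupled conditions, and a pipeline is curated to synthesize MotionPair-60K. To achieve this unification, in-context mask conditioning and mode-specific RoPE are used as soft guidance beyond textual instructions and raw visual information. To address synthetic discrepancies in detailed regions, Bias-Aware DPO is proposed. Extensive experiments show that the method substantially outperforms existing state-of-the-art approaches on a variety of character animation tasks. A large subset of the synthetic data and the model weights will be released on the project page.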
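
Note on the DPO step: the abstract names Bias-Aware DPO but does not describe how its preference items are built, so the sketch below shows only the standard DPO objective for a single preference pair, not the paper's variant. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions and do not come from the paper.

```python
import math


def dpo_loss(policy_logp_preferred, policy_logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    Inputs are log-probabilities of each sample under the trainable
    policy and under a frozen reference model. This is the generic
    objective, NOT the Bias-Aware variant named in the abstract.
    """
    # How much more the policy likes each sample than the reference does.
    preferred_logratio = policy_logp_preferred - ref_logp_preferred
    rejected_logratio = policy_logp_rejected - ref_logp_rejected

    # Scaled margin between the two log-ratios (beta is an assumed value).
    margin = beta * (preferred_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree on both samples, the margin is zero and the loss equals log 2; the loss falls below that value as the policy raises the preferred sample relative to the rejected one.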
