Fugu-MT Paper Translation (Abstract): StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation

Paper overview: StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation

arXiv URL: http://arxiv.org/abs/2507.15064v1
Date: Sun, 20 Jul 2025 17:59:26 GMT
Status: Translation complete
System update date: 2025-07-22 20:51:32.178305
Title: StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation
Title (reference translation): StableAnimator++: Overcoming Pose Misalignment and Face Distortion in Human Image Animation
Authors: Shuyuan Tu, Zhen Xing, Xintong Han, Zhi-Qi Cheng, Qi Dai, Chong Luo, Zuxuan Wu, Yu-Gang Jiang
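Abstract summary: Current diffusion models for human image animation often struggle to maintain identity consistency. The paper introduces StableAnimator++, the first ID-preserving video diffusion framework with learnable pose alignment, and shows how it generates high-quality videos conditioned on a reference image and a pose sequence without post-processing.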
Reference score (attention score computed independently by the site): 98.10527466949338
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current diffusion models for human image animation often struggle to maintain identity (ID) consistency, especially when the reference image and driving video differ significantly in body size or position. We introduce StableAnimator++, the first ID-preserving video diffusion framework with learnable pose alignment, capable of generating high-quality videos conditioned on a reference image and a pose sequence without any post-processing. Building upon a video diffusion model, StableAnimator++ contains carefully designed modules for both training and inference, striving for identity consistency. In particular, StableAnimator++ first uses learnable layers to predict the similarity transformation matrices between the reference image and the driven poses via injecting guidance from Singular Value Decomposition (SVD). These matrices align the driven poses with the reference image, mitigating misalignment to a great extent. StableAnimator++ then computes image and face embeddings using off-the-shelf encoders, refining the face embeddings via a global content-aware Face Encoder. To further maintain ID, we introduce a distribution-aware ID Adapter that counteracts interference caused by temporal layers while preserving ID via distribution alignment. During the inference stage, we propose a novel Hamilton-Jacobi-Bellman (HJB) based face optimization integrated into the denoising process, guiding the diffusion trajectory for enhanced facial fidelity. Experiments on benchmarks show the effectiveness of StableAnimator++ both qualitatively and quantitatively.
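Abstract (reference translation): Current diffusion models for human image animation often struggle to maintain identity (ID) consistency, particularly when the reference image and the driving video differ markedly in body size or position. This paper introduces StableAnimator++, the first ID-preserving video diffusion framework with learnable pose alignment. Built on a video diffusion model, StableAnimator++ includes carefully designed modules for both training and inference that aim at identity consistency. Specifically, StableAnimator++ first uses learnable layers to predict the similarity transformation matrices between the reference image and the driven poses, with guidance injected from Singular Value Decomposition (SVD). These matrices align the driven poses with the reference image, mitigating misalignment to a great extent. StableAnimator++ then computes image and face embeddings with off-the-shelf encoders and refines the face embeddings through a global content-aware Face Encoder. To further maintain ID, the authors propose a distribution-aware ID Adapter that counteracts interference caused by the temporal layers while preserving ID through distribution alignment. At the inference stage, they propose a novel Hamilton-Jacobi-Bellman (HJB) based face optimization that guides the diffusion trajectory toward higher facial fidelity. Experiments on benchmarks show the effectiveness of StableAnimator++ both qualitatively and quantitatively.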
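
For readers unfamiliar with the pose-alignment step, the following is a minimal sketch of what a similarity transformation between two sets of 2D keypoints looks like. It is not the paper's method: StableAnimator++ predicts the matrices with learnable layers under SVD guidance, whereas this sketch uses the classical closed-form least-squares fit, written with complex numbers for the 2D case. The function names `fit_similarity_2d` and `apply_similarity_2d`, and the toy keypoints in the usage note, are hypothetical and chosen only for illustration.

```python
import cmath
import math


def fit_similarity_2d(src, dst):
    """Least-squares scale, rotation and translation mapping src onto dst.

    Illustrative only: a classical closed-form fit, not the learnable
    predictor described in the abstract. Points are (x, y) tuples and the
    two lists must correspond index by index.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need two equally sized lists of at least 2 points")

    # Treat each 2D point as a complex number so that one complex factor
    # `a` encodes both scale (its modulus) and rotation (its argument).
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    p_centered = [z - p_mean for z in p]
    q_centered = [z - q_mean for z in q]

    denom = sum(abs(z) ** 2 for z in p_centered)
    if denom == 0:
        raise ValueError("source points are all identical")

    # Minimises sum |a * p_i - q_i|^2 over the centred points.
    a = sum(z.conjugate() * w for z, w in zip(p_centered, q_centered)) / denom
    b = q_mean - a * p_mean  # translation that maps the centroids
    return abs(a), cmath.phase(a), (b.real, b.imag)


def apply_similarity_2d(points, scale, angle, translation):
    """Apply q = scale * R(angle) * p + translation to each (x, y) point."""
    a = cmath.rect(scale, angle)
    b = complex(*translation)
    moved = []
    for x, y in points:
        z = a * complex(x, y) + b
        moved.append((z.real, z.imag))
    return moved
```

As a usage example, take toy driving-pose keypoints `[(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]`, transform them with a known scale, rotation and shift to stand in for the reference pose, and call `fit_similarity_2d(driving, reference)`; applying the recovered parameters with `apply_similarity_2d` maps the driving keypoints back onto the reference ones. With noisy or non-rigid poses the fit is only a best approximation in the least-squares sense.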

Related papers