Fugu-MT Paper Translation (Abstract): DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model

Paper abstract: DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model

arxiv url: http://arxiv.org/abs/2510.27169v1
Date: Fri, 31 Oct 2025 04:42:08 GMT
Status: Translation complete
System update date: 2025-11-03 17:52:15.977724
Title: DANCER: Dance ANimation via Condition Enhancement and Rendering with diffusion model
Title (reference translation): DANCER: Dance Animation via Condition Enhancement and Rendering with a Diffusion Model
Authors: Yucheng Xing, Jinxing Yin, Xiaodong Liu
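Abstract summary: We propose DANCER, a new framework for realistic single-person dance synthesis based on the most recent stable video diffusion model. We introduce two important modules into the framework to make full use of the two inputs. We collect a large amount of video data from the Internet and build a new dataset, TikTok-3K, to enhance model training.
Reference score (independently computed attention score): 5.78710251788825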
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, diffusion models have shown impressive ability in visual generation tasks. Beyond static images, increasing research attention has been drawn to the generation of realistic videos. Video generation not only places higher demands on quality but also poses the challenge of ensuring video continuity. Among all video generation tasks, human-involved content, such as human dancing, is even more difficult to generate because of the high degrees of freedom associated with human motion. In this paper, we propose a novel framework, named DANCER (Dance ANimation via Condition Enhancement and Rendering with Diffusion Model), for realistic single-person dance synthesis based on the most recent stable video diffusion model. As video generation is generally guided by a reference image and a video sequence, we introduce two important modules into our framework to fully benefit from the two inputs. More specifically, we design an Appearance Enhancement Module (AEM) to focus more on the details of the reference image during generation, and we extend the motion guidance through a Pose Rendering Module (PRM) to capture pose conditions from extra domains. To further improve the generation capability of our model, we also collect a large amount of video data from the Internet and build a novel dataset, TikTok-3K, to enhance model training. The effectiveness of the proposed model has been evaluated through extensive experiments on real-world datasets, where our model outperforms state-of-the-art methods. All data and code will be released upon acceptance.
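Abstract (reference translation): In recent years, diffusion models have shown remarkable ability in visual generation tasks. Beyond static images, more and more research is being devoted to generating realistic videos. Video generation not only demands high quality but also faces the challenge of ensuring video continuity. Among all video generation tasks, human-related content such as human dancing is even harder to generate because of the high degrees of freedom associated with human motion. In this paper, we propose DANCER (Dance Animation via Condition Enhancement and Rendering with Diffusion Model), a framework for realistic single-person dance synthesis based on the most recent stable video diffusion model. Since video generation is generally guided by a reference image and a video sequence, we introduce two important modules into the framework to make full use of the two inputs. More specifically, we design an Appearance Enhancement Module (AEM) to focus on the details of the reference image during generation, and we extend motion guidance through a Pose Rendering Module (PRM) to capture pose conditions from additional domains. We collect a large amount of video data from the Internet and build a new dataset, TikTok-3K, to enhance model training. The effectiveness of the proposed model has been evaluated through extensive experiments on real-world datasets, where it outperforms state-of-the-art methods. All data and code will be released upon acceptance.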

Related papers