DisCo: Disentangled Control for Realistic Human Dance Generation
- URL: http://arxiv.org/abs/2307.00040v3
- Date: Thu, 4 Apr 2024 19:41:09 GMT
- Title: DisCo: Disentangled Control for Realistic Human Dance Generation
- Authors: Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
- Abstract summary: We introduce DISCO, which includes a novel model architecture with disentangled control to improve the compositionality of dance synthesis.
DisCo can generate high-quality human dance images and videos with diverse appearances and flexible motions.
- Score: 125.85046815185866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI has made significant strides in computer vision, particularly in text-driven image/video synthesis (T2I/T2V). Despite the notable advancements, it remains challenging in human-centric content synthesis such as realistic dance generation. Current methodologies, primarily tailored for human motion transfer, encounter difficulties when confronted with real-world dance scenarios (e.g., social media dance), which require generalizing across a wide spectrum of poses and intricate human details. In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources. To address these challenges, we introduce DISCO, which includes a novel model architecture with disentangled control to improve the compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans. Extensive qualitative and quantitative results demonstrate that DisCo can generate high-quality human dance images and videos with diverse appearances and flexible motions. Code is available at https://disco-dance.github.io/.
Related papers
- Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance [48.986552871497]
We introduce a novel two-stage framework that employs scene affordance as an intermediate representation.
By leveraging scene affordance maps, our method overcomes the difficulty in generating human motion under multimodal condition signals.
Our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE.
arXiv Detail & Related papers (2024-03-26T18:41:07Z)
- Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons [73.21855272778616]
We introduce a new task, dataset, and evaluation protocol of compositional human dance generation (cHDG).
We propose a novel zero-shot framework, dubbed MultiDance-Zero, that can synthesize videos consistent with arbitrary multiple persons and background while precisely following the driving poses.
arXiv Detail & Related papers (2024-01-24T10:44:16Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing, which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Adversarial Attention for Human Motion Synthesis [3.9378507882929563]
We present a novel method for controllable human motion synthesis by applying attention-based probabilistic deep adversarial models with end-to-end training.
We show that we can generate synthetic human motion over both short- and long-time horizons through the use of adversarial attention.
arXiv Detail & Related papers (2022-04-25T16:12:42Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, as well as more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Transflower: probabilistic autoregressive dance generation with multimodal attention [31.308435764603658]
We present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context.
Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers.
arXiv Detail & Related papers (2021-06-25T20:14:28Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.