MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
- URL: http://arxiv.org/abs/2512.18181v2
- Date: Mon, 29 Dec 2025 07:11:46 GMT
- Title: MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
- Authors: Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang, Jiahong Wu, Xiangxiang Chu, Hongyan Liu, Jun He
- Abstract summary: MACE-Dance is a music-driven dance video generation framework with cascaded Mixture-of-Experts (MoE). The Motion Expert performs music-to-3D motion generation while enforcing kinematic plausibility and artistic expressiveness. The Appearance Expert carries out motion- and reference-conditioned video synthesis, preserving visual identity with spatiotemporal coherence.
- Score: 20.517753182293095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rise of online dance-video platforms and rapid advances in AI-generated content (AIGC), music-driven dance generation has emerged as a compelling research direction. Despite substantial progress in related domains such as music-driven 3D dance generation, pose-driven image animation, and audio-driven talking-head synthesis, existing methods cannot be directly adapted to this task. Moreover, the limited studies in this area still struggle to jointly achieve high-quality visual appearance and realistic human motion. Accordingly, we present MACE-Dance, a music-driven dance video generation framework with cascaded Mixture-of-Experts (MoE). The Motion Expert performs music-to-3D motion generation while enforcing kinematic plausibility and artistic expressiveness, whereas the Appearance Expert carries out motion- and reference-conditioned video synthesis, preserving visual identity with spatiotemporal coherence. Specifically, the Motion Expert adopts a diffusion model with a BiMamba-Transformer hybrid architecture and a Guidance-Free Training (GFT) strategy, achieving state-of-the-art (SOTA) performance in 3D dance generation. The Appearance Expert employs a decoupled kinematic-aesthetic fine-tuning strategy, achieving state-of-the-art (SOTA) performance in pose-driven image animation. To better benchmark this task, we curate a large-scale and diverse dataset and design a motion-appearance evaluation protocol. Based on this protocol, MACE-Dance also achieves state-of-the-art performance. Project page: https://macedance.github.io/
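The abstract describes a two-stage cascade: the Motion Expert maps music to 3D motion, and the Appearance Expert then renders motion- and reference-conditioned video. The sketch below is only a hedged illustration of that control flow; the class and method names (`MotionExpert`, `AppearanceExpert`, `generate_motion`, `render_video`) and the tensor shapes are hypothetical placeholders, not the authors' actual API or models.

```python
# Minimal, hypothetical sketch of a cascaded motion -> appearance pipeline,
# mirroring the two-expert structure described in the abstract.
# All class/method names and shapes are placeholders, not MACE-Dance code.

from dataclasses import dataclass
import numpy as np


@dataclass
class MotionExpert:
    """Stage 1: music-to-3D-motion generation (e.g., a motion diffusion model)."""

    def generate_motion(self, music_features: np.ndarray, num_frames: int) -> np.ndarray:
        # Placeholder: a real implementation would run a diffusion sampler
        # conditioned on the music features. Here we return random SMPL-like
        # pose parameters of shape (num_frames, num_joints * 3).
        num_joints = 24
        return np.random.randn(num_frames, num_joints * 3)


@dataclass
class AppearanceExpert:
    """Stage 2: motion- and reference-conditioned video synthesis."""

    def render_video(self, motion: np.ndarray, reference_image: np.ndarray) -> np.ndarray:
        # Placeholder: a real implementation would drive a pose-conditioned
        # video generator with the generated motion and the reference identity
        # image. Here we return blank frames matching the reference size.
        num_frames = motion.shape[0]
        h, w, c = reference_image.shape
        return np.zeros((num_frames, h, w, c), dtype=reference_image.dtype)


def cascaded_generation(music_features, reference_image, num_frames=120):
    """Cascade the two experts: motion first, then appearance."""
    motion = MotionExpert().generate_motion(music_features, num_frames)
    video = AppearanceExpert().render_video(motion, reference_image)
    return motion, video


if __name__ == "__main__":
    music = np.random.randn(120, 438)        # dummy per-frame music features
    ref = np.zeros((512, 512, 3), np.uint8)  # dummy reference image
    motion, video = cascaded_generation(music, ref)
    print(motion.shape, video.shape)
```

The point of the cascade is that appearance synthesis never has to infer choreography from audio directly: the intermediate 3D motion acts as the sole kinematic interface between the two experts.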
Related papers
- FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation [9.587067781689331]
Music-to-dance generation aims to translate auditory signals into expressive human motion, with broad applications in virtual reality, choreography, and digital entertainment. We propose FlowerDance, which not only generates refined motion with physical plausibility and artistic expressiveness, but also achieves significant generation efficiency in inference speed and memory utilization.
arXiv Detail & Related papers (2025-11-26T03:53:10Z)
- PAMD: Plausibility-Aware Motion Diffusion Model for Long Dance Generation [51.2555550979386]
Plausibility-Aware Motion Diffusion (PAMD) is a framework for generating dances that are both musically aligned and physically realistic. To provide more effective guidance during generation, we incorporate Prior Motion Guidance (PMG). Experiments show that PAMD significantly improves musical alignment and enhances the physical plausibility of generated motions.
arXiv Detail & Related papers (2025-05-26T14:44:09Z)
- X-Dancer: Expressive Music to Human Dance Video Generation [26.544761204917336]
X-Dancer is a novel zero-shot music-driven image animation pipeline. It creates diverse and long-range lifelike human dance videos from a single static image.
arXiv Detail & Related papers (2025-02-24T18:47:54Z)
- Controllable Dance Generation with Style-Guided Motion Diffusion [49.35282418951445]
Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framework suitable for diversified tasks of dance generation.
arXiv Detail & Related papers (2024-06-12T04:55:14Z)
- MIDGET: Music Conditioned 3D Dance Generation [13.067687949642641]
We introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET, to generate vibrant and high-quality dances that match the music rhythm.
To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the Motion VQ-VAE model to store different human pose codes, 2) employing Motion GPT model to generate pose codes with music and motion ablations, and 3) a simple framework for music feature extraction.
arXiv Detail & Related papers (2024-04-18T10:20:37Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographs from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music (a minimal sketch of both distances appears after this list).
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
- DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer [23.51701359698245]
In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction.
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even trained by existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z)
- Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)
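For the Music-to-Dance Generation with Optimal Transport entry above, the two distances it names can be computed in a few lines with the POT library: an optimal transport (Wasserstein) cost between generated and reference dance features, and a Gromov-Wasserstein cost relating the internal structure of the dance and music feature spaces. This is a hedged illustration only; the feature dimensions, uniform frame weights, and random data below are assumptions for the sketch, not MDOT-Net's actual configuration.

```python
# Illustrative computation of the two distances named in the MDOT-Net summary,
# using the POT library (pip install pot). Shapes and uniform sample weights
# are assumptions for this sketch, not the paper's setup.

import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)

# Dummy per-frame features: generated dance, reference dance, and music.
gen_dance = rng.normal(size=(100, 72))   # 100 frames x 72-D pose features
ref_dance = rng.normal(size=(100, 72))
music = rng.normal(size=(100, 35))       # 100 frames x 35-D audio features

# (1) Optimal transport cost between generated and reference dance
# distributions, as a proxy for authenticity of the generated dance.
M = ot.dist(gen_dance, ref_dance)        # pairwise squared-Euclidean costs
a = ot.unif(len(gen_dance))              # uniform weights on generated frames
b = ot.unif(len(ref_dance))              # uniform weights on reference frames
w_cost = ot.emd2(a, b, M)                # exact OT cost

# (2) Gromov-Wasserstein cost between the dance and music feature spaces,
# measuring how well their intra-modal structures correspond (the two
# modalities live in different spaces, so only internal distances compare).
C_dance = ot.dist(gen_dance, gen_dance)  # intra-dance distance matrix
C_music = ot.dist(music, music)          # intra-music distance matrix
gw_cost = ot.gromov.gromov_wasserstein2(
    C_dance, C_music, a, ot.unif(len(music)), loss_fun="square_loss"
)

print(f"OT cost: {w_cost:.4f}  GW cost: {gw_cost:.4f}")
```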