Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation
- URL: http://arxiv.org/abs/2512.18804v1
- Date: Sun, 21 Dec 2025 16:57:08 GMT
- Title: Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation
- Authors: Guangtao Lyu, Chenghao Xu, Qi Liu, Jiexi Yan, Muli Yang, Fen Fang, Cheng Deng
- Abstract summary: Music to 3D dance generation aims to synthesize realistic and rhythmically synchronized human dance from music. We propose TempoMoE, a hierarchical tempo-aware Mixture-of-Experts module. We show that TempoMoE achieves state-of-the-art results in dance quality and rhythm alignment.
- Score: 62.82943523102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music to 3D dance generation aims to synthesize realistic and rhythmically synchronized human dance from music. While existing methods often rely on additional genre labels to improve dance generation, such labels are typically noisy, coarse, unavailable, or insufficient to capture the diversity of real-world music, which can result in rhythm misalignment or stylistic drift. In contrast, we observe that tempo, a core property reflecting musical rhythm and pace, remains relatively consistent across datasets and genres, typically ranging from 60 to 200 BPM. Based on this finding, we propose TempoMoE, a hierarchical tempo-aware Mixture-of-Experts module that strengthens the diffusion model's rhythm perception. TempoMoE organizes motion experts into tempo-structured groups covering different tempo ranges, with multi-scale beat experts capturing both fine-grained and long-range rhythmic dynamics. A Hierarchical Rhythm-Adaptive Routing scheme dynamically selects and fuses experts based on music features, enabling flexible, rhythm-aligned generation without manual genre labels. Extensive experiments demonstrate that TempoMoE achieves state-of-the-art results in dance quality and rhythm alignment.
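The routing described in the abstract is a two-level decision: first pick among tempo-range expert groups, then fuse beat experts within them. Below is a minimal PyTorch sketch of that idea, assuming pooled music features as the router input; the class name, dimensions, identical FFN experts, and dense softmax fusion are illustrative assumptions, not the paper's implementation (the paper's beat experts additionally operate at multiple temporal scales).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TempoMoESketch(nn.Module):
    """Hypothetical two-level (tempo group -> beat expert) routing."""

    def __init__(self, dim=256, n_groups=4, experts_per_group=3):
        super().__init__()
        self.n_groups = n_groups
        self.experts_per_group = experts_per_group
        # One FFN per (tempo group, beat-expert slot). The paper's beat
        # experts differ in temporal scale; identical FFNs stand in here.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_groups * experts_per_group)
        ])
        # Coarse gate over tempo ranges (e.g. 60-200 BPM split into groups)
        self.group_gate = nn.Linear(dim, n_groups)
        # Fine gate over beat experts inside each group
        self.expert_gate = nn.Linear(dim, experts_per_group)

    def forward(self, music_feat):                            # (B, dim)
        g = F.softmax(self.group_gate(music_feat), dim=-1)    # (B, G)
        e = F.softmax(self.expert_gate(music_feat), dim=-1)   # (B, E)
        out = torch.zeros_like(music_feat)
        for gi in range(self.n_groups):
            for ei in range(self.experts_per_group):
                w = (g[:, gi] * e[:, ei]).unsqueeze(-1)       # (B, 1)
                expert = self.experts[gi * self.experts_per_group + ei]
                out = out + w * expert(music_feat)
        return out

feats = torch.randn(2, 256)           # stand-in for pooled music features
print(TempoMoESketch()(feats).shape)  # torch.Size([2, 256])
```

A real system would likely route sparsely (top-k groups) rather than densely fusing every expert, but the dense form keeps the hierarchy of the two gates visible.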
Related papers
- Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset [8.721362823189077]
Listen to Rhythm, Choose Movements (LRCM) is a multimodal-guided diffusion framework supporting both diverse input modalities and autoregressive dance motion generation. We will release the full dataset and pretrained models publicly upon acceptance.
arXiv Detail & Related papers (2026-01-06T14:59:22Z)
- GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment [16.93446224499017]
Dance-to-music (D2M) generation aims to automatically compose music that is rhythmically and temporally aligned with dance movements. We propose GACA-DiT, a diffusion transformer-based framework with two novel modules for rhythmically consistent and temporally aligned music generation. Experiments on the AIST++ and TikTok datasets demonstrate that GACA-DiT outperforms state-of-the-art methods in both objective metrics and human evaluation.
arXiv Detail & Related papers (2025-10-28T09:26:59Z) - MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation [10.203209816178552]
MotionRAG-Diff is a hybrid framework that integrates Retrieval-Augmented Generation and diffusion-based refinement.<n>Our method introduces three core innovations.<n>It achieves state-of-the-art performance in motion quality, diversity, and music-motion synchronization accuracy.
arXiv Detail & Related papers (2025-06-03T09:12:48Z)
- Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation [22.729568599120846]
We propose Danceba, a novel framework that leverages a gating mechanism to enhance rhythm-aware feature representation. It combines Phase-Based Rhythm Extraction (PRE) to precisely extract rhythmic information from musical phase data, Temporal-Gated Causal Attention (TGCA) to focus on global rhythmic features, and a Parallel Mamba Motion Modeling (PMMM) architecture to separately model upper and lower body motions.
arXiv Detail & Related papers (2025-03-21T17:42:50Z)
- GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation [30.028340528694432]
GCDance is a framework for genre-specific 3D full-body dance generation conditioned on music and descriptive text. We develop a text-based control mechanism that maps input prompts (explicit genre labels or free-form descriptive text) into genre-specific control signals. To balance the objectives of extracting text-genre information and maintaining high-quality generation results, we propose a novel multi-task optimization strategy.
arXiv Detail & Related papers (2025-02-25T15:53:18Z)
- Controllable Dance Generation with Style-Guided Motion Diffusion [49.35282418951445]
Dance plays an important role as an artistic form of expression in human culture, yet its creation remains a challenging task. Most dance generation methods rely solely on music, seldom taking into account intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framework suitable for diversified dance generation tasks.
arXiv Detail & Related papers (2024-06-12T04:55:14Z)
- DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation [89.50310360658791]
We present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation.
This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model.
We demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music.
arXiv Detail & Related papers (2023-08-05T16:18:57Z)
- Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music.
We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music.
Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z)
- Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music (see the sketch after this list).
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
- Rhythm is a Dancer: Music-Driven Motion Synthesis with Global Structure [47.09425316677689]
We present a music-driven motion synthesis framework that generates long-term sequences of human motions synchronized with the input beats.
Our framework enables generation of diverse motions that are controlled by the content of the music, not only by its beat.
arXiv Detail & Related papers (2021-11-23T21:26:31Z)
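As referenced in the MDOT-Net entry above, a Gromov-Wasserstein distance compares the internal distance structure of two distributions, which makes it natural for music-dance correspondence, where the two modalities live in different feature spaces. Below is a minimal sketch using the POT (Python Optimal Transport) library; the random features, uniform marginals, and squared-Euclidean costs are placeholder assumptions for illustration, not MDOT-Net's actual setup.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
music_feat = rng.normal(size=(32, 64))  # 32 music frames, hypothetical features
dance_feat = rng.normal(size=(32, 72))  # 32 pose frames, hypothetical features

# GW compares intra-domain distance matrices, so the music and dance
# features may have different dimensionalities.
C1 = ot.dist(music_feat, music_feat)  # pairwise sq-Euclidean costs (music)
C2 = ot.dist(dance_feat, dance_feat)  # pairwise sq-Euclidean costs (dance)
p = ot.unif(32)                       # uniform weights over music frames
q = ot.unif(32)                       # uniform weights over dance frames

# Scalar GW discrepancy between the two rhythm/structure geometries
gw_dist = ot.gromov.gromov_wasserstein2(C1, C2, p, q, 'square_loss')
print(f"GW discrepancy between music and dance structure: {gw_dist:.4f}")
```

A lower GW value indicates that the temporal structure of the generated dance mirrors that of the input music more closely, which is the intuition behind using it as a correspondence measure.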