MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
- URL: http://arxiv.org/abs/2508.16911v1
- Date: Sat, 23 Aug 2025 05:56:37 GMT
- Title: MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation
- Authors: Prerit Gupta, Jason Alexander Fotso-Puepi, Zhengyuan Li, Jay Mehta, Aniket Bera
- Abstract summary: We introduce Multimodal DuetDance (MDD), a diverse multimodal benchmark dataset designed for text-controlled and music-conditioned 3D duet dance motion generation. Our dataset comprises 620 minutes of high-quality motion capture data performed by professional dancers, synchronized with music, and detailed with over 10K fine-grained natural language descriptions. The annotations capture a rich movement vocabulary, detailing spatial relationships, body movements, and rhythm, making MDD the first dataset to seamlessly integrate human motions, music, and text for duet dance generation.
- Score: 16.657210427678198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Multimodal DuetDance (MDD), a diverse multimodal benchmark dataset designed for text-controlled and music-conditioned 3D duet dance motion generation. Our dataset comprises 620 minutes of high-quality motion capture data performed by professional dancers, synchronized with music, and detailed with over 10K fine-grained natural language descriptions. The annotations capture a rich movement vocabulary, detailing spatial relationships, body movements, and rhythm, making MDD the first dataset to seamlessly integrate human motions, music, and text for duet dance generation. We introduce two novel tasks supported by our dataset: (1) Text-to-Duet, where, given music and a textual prompt, both the leader's and the follower's dance motions are generated; and (2) Text-to-Dance Accompaniment, where, given music, a textual prompt, and the leader's motion, the follower's motion is generated in a cohesive, text-aligned manner. We include baseline evaluations on both tasks to support future research.
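To make the two task definitions above concrete, here is a minimal sketch of their input/output contracts in Python. The class and function names, tensor shapes, and the NumPy representation are illustrative assumptions, not an API from the paper.

```python
# Hypothetical interfaces for the two MDD tasks; names and shapes are assumed.
from __future__ import annotations
from dataclasses import dataclass
import numpy as np

@dataclass
class DuetSample:
    """One synchronized clip from a duet dance dataset (shapes assumed)."""
    music: np.ndarray     # (T_audio, F) acoustic features, e.g. mel frames
    text: str             # fine-grained natural-language description
    leader: np.ndarray    # (T, J, 3) leader joint positions
    follower: np.ndarray  # (T, J, 3) follower joint positions

def text_to_duet(music: np.ndarray, text: str) -> tuple[np.ndarray, np.ndarray]:
    """Task 1 (Text-to-Duet): generate both dancers' motions from music + text."""
    raise NotImplementedError  # a trained generative model would go here

def text_to_dance_accompaniment(music: np.ndarray, text: str,
                                leader: np.ndarray) -> np.ndarray:
    """Task 2 (Text-to-Dance Accompaniment): generate the follower's motion
    given music, a textual prompt, and the leader's motion."""
    raise NotImplementedError  # likewise a model stub
```

The only structural difference between the two tasks is that the accompaniment task receives the leader's motion as an extra conditioning input.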
Related papers
- DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling [70.79846001735547]
We present DuetGen, a framework for generating interactive two-person dances from music. Inspired by the recent advances in motion synthesis, we propose a two-stage solution. We represent both dancers' motions as a unified whole to learn the necessary motion tokens.
arXiv Detail & Related papers (2025-06-23T14:22:50Z)
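As a rough illustration of DuetGen's "unified whole" idea, the hedged sketch below concatenates both dancers' per-frame features and quantizes each unified frame against a codebook. The random codebook stands in for a learned VQ-VAE, and all names and shapes are assumptions rather than the paper's actual design.

```python
# Sketch: treat a two-person dance as a single discrete token stream.
import numpy as np

def unify(leader: np.ndarray, follower: np.ndarray) -> np.ndarray:
    """Concatenate per-frame features of both dancers: (T, D) x 2 -> (T, 2D)."""
    return np.concatenate([leader, follower], axis=-1)

def quantize(frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each unified frame to the index of its nearest codebook entry."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    return d.argmin(axis=1)  # (T,) discrete motion tokens

rng = np.random.default_rng(0)
leader = rng.normal(size=(120, 64))     # 120 frames, 64-dim pose features
follower = rng.normal(size=(120, 64))
codebook = rng.normal(size=(512, 128))  # stand-in for a learned VQ codebook
tokens = quantize(unify(leader, follower), codebook)
print(tokens.shape)  # (120,): one token per frame, covering both dancers
```

A masked-modeling stage would then learn to predict these joint tokens from music.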
- DanceChat: Large Language Model-Guided Music-to-Dance Generation [8.455652926559427]
Music-to-dance generation aims to synthesize human dance motion conditioned on musical input. We introduce DanceChat, a Large Language Model (LLM)-guided music-to-dance generation approach.
arXiv Detail & Related papers (2025-06-12T11:03:47Z)
- GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation [30.028340528694432]
GCDance is a framework for genre-specific 3D full-body dance generation conditioned on music and descriptive text. We develop a text-based control mechanism that maps input prompts, whether explicit genre labels or free-form descriptive text, into genre-specific control signals. To balance the objectives of extracting text-genre information and maintaining high-quality generation results, we propose a novel multi-task optimization strategy.
arXiv Detail & Related papers (2025-02-25T15:53:18Z)
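The control mechanism described above suggests a two-way routing of prompts. Below is a hedged sketch under that reading: explicit genre labels hit a lookup table while free-form text goes through an encoder, with both landing in one control space. The genre list, the toy word-averaging encoder, and the 64-dimensional size are all assumptions, not GCDance's actual components.

```python
# Sketch: map a genre label or free-form prompt to one control embedding.
import numpy as np

rng = np.random.default_rng(0)
GENRES = ["ballet", "hiphop", "jazz", "locking", "waacking"]  # illustrative
genre_table = {g: rng.normal(size=64) for g in GENRES}        # learned in practice
word_table: dict = {}

def encode_text(prompt: str) -> np.ndarray:
    """Toy free-text encoder: average of per-word embeddings."""
    vecs = [word_table.setdefault(w, rng.normal(size=64))
            for w in prompt.split()]
    return np.mean(vecs, axis=0)

def control_signal(prompt: str) -> np.ndarray:
    """Route explicit genre labels through the genre table and anything
    else through the text encoder, sharing one 64-d control space."""
    key = prompt.lower().strip()
    return genre_table[key] if key in genre_table else encode_text(key)

print(control_signal("hiphop").shape)                            # (64,)
print(control_signal("a sharp, percussive street style").shape)  # (64,)
```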
- Controllable Dance Generation with Style-Guided Motion Diffusion [49.35282418951445]
Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framework suitable for diversified dance generation tasks.
arXiv Detail & Related papers (2024-06-12T04:55:14Z)
- Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment [87.20240797625648]
We introduce a novel task within the field of 3D dance generation, termed dance accompaniment.
It requires the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm.
We propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements.
arXiv Detail & Related papers (2024-03-27T17:57:02Z)
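The Duolando summary above spells out an autoregressive decoding scheme, sketched below with a stub in place of the GPT network. The vocabulary size, feature shapes, and sampling details are assumptions; only the conditioning structure (music, leader tokens, follower tokens generated so far) comes from the description.

```python
# Sketch: autoregressive follower-token generation in the Duolando style.
import numpy as np

VOCAB = 512  # assumed size of the follower's motion-token codebook
rng = np.random.default_rng(0)

def next_token_logits(music, leader_tokens, follower_tokens):
    """Stand-in for a GPT forward pass over the concatenated context."""
    return rng.normal(size=VOCAB)  # a trained model returns learned logits

def generate_follower(music, leader_tokens):
    """Each follower token is conditioned on the music, the leader's tokens
    up to the current step, and the follower tokens already generated."""
    follower = []
    for t in range(len(leader_tokens)):
        logits = next_token_logits(music, leader_tokens[: t + 1], follower)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        follower.append(int(rng.choice(VOCAB, p=probs)))  # sample next token
    return follower

music = rng.normal(size=(240, 80))                 # assumed audio features
leader = rng.integers(0, VOCAB, size=60).tolist()  # leader's motion tokens
print(generate_follower(music, leader)[:8])
```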
- LM2D: Lyrics- and Music-Driven Dance Synthesis [28.884929875333846]
LM2D is designed to create dance conditioned on both music and lyrics in one diffusion generation step.
We introduce the first 3D dance-motion dataset that encompasses both music and lyrics, obtained with pose estimation technologies.
The results demonstrate LM2D is able to produce realistic and diverse dance matching both lyrics and music.
arXiv Detail & Related papers (2024-03-14T13:59:04Z)
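"One diffusion generation step" implies sampling without an iterative denoising loop. The sketch below shows only that shape of computation: a single call maps pure noise plus the conditions to a motion estimate. The stub denoiser and all shapes are assumptions, not LM2D's architecture.

```python
# Sketch: single-step conditional sampling (noise in, motion out).
import numpy as np

rng = np.random.default_rng(0)

def denoise(noisy_motion, music, lyrics_emb):
    """Stand-in for a trained network predicting the clean motion x0."""
    return np.zeros_like(noisy_motion)  # a real model returns its x0 estimate

def sample_one_step(music, lyrics_emb, frames=120, dim=64):
    """No denoising loop: one network call turns Gaussian noise into motion."""
    x_T = rng.normal(size=(frames, dim))
    return denoise(x_T, music, lyrics_emb)

motion = sample_one_step(music=rng.normal(size=(240, 80)),
                         lyrics_emb=rng.normal(size=32))
print(motion.shape)  # (120, 64)
```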
- BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics [50.88842027976421]
We propose BOTH57M, a novel multi-modal dataset for two-hand motion generation.
Our dataset includes accurate motion tracking for the human body and hands.
We also provide a strong baseline method, BOTH2Hands, for the novel task.
arXiv Detail & Related papers (2023-12-13T07:30:19Z)
- Music- and Lyrics-driven Dance Synthesis [25.474741654098114]
We introduce JustLMD, a new dataset of 3D dance motion with music and lyrics.
To the best of our knowledge, this is the first dataset with triplet information including dance motion, music, and lyrics.
The proposed JustLMD dataset encompasses 4.6 hours of 3D dance motion in 1867 sequences, accompanied by musical tracks and their corresponding English lyrics.
arXiv Detail & Related papers (2023-09-30T18:27:14Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music, while maintaining performance comparable to that of models conditioned on a single modality.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing, which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.