Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
- URL: http://arxiv.org/abs/2405.09266v3
- Date: Thu, 28 Nov 2024 10:30:14 GMT
- Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
- Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai,
- Abstract summary: We introduce a novel task: generating dance videos directly from images of individuals guided by music.
Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos.
We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment.
- Score: 12.018432669719742
- License:
- Abstract: Generating dance from music is crucial for advancing automated choreography. Current methods typically produce skeleton keypoint sequences instead of dance videos and lack the capability to make specific individuals dance, which reduces their real-world applicability. These methods also require precise keypoint annotations, complicating data collection and limiting the use of self-collected video datasets. To overcome these challenges, we introduce a novel task: generating dance videos directly from images of individuals guided by music. This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations. Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos featuring various dance types and choreographies. The music is analyzed by our specially designed music encoder, which identifies essential features including dance style, movement, and rhythm. DabFusion excels in generating dance videos not only for individuals in the training dataset but also for any previously unseen person. This versatility stems from its approach of generating latent optical flow, which contains all necessary motion information to animate any person in the image. We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment. We propose a 2D Motion-Music Alignment Score (2D-MM Align), which builds on the Beat Alignment Score to more effectively evaluate motion-music alignment for this new task. Experiments show that our DabFusion establishes a solid baseline for this innovative task. Video results can be found on our project page: https://DabFusion.github.io.
Related papers
- Every Image Listens, Every Image Dances: Music-Driven Image Animation [8.085267959520843]
MuseDance is an end-to-end model that animates reference images using both music and text inputs.
Unlike existing approaches, MuseDance eliminates the need for complex motion guidance inputs, such as pose or depth sequences.
We present a new multimodal dataset comprising 2,904 dance videos with corresponding background music and text descriptions.
arXiv Detail & Related papers (2025-01-30T23:38:51Z) - Automatic Dance Video Segmentation for Understanding Choreography [10.053913399613764]
We propose a method to automatically segment a dance video into each movement.
To build our training dataset, we annotate segmentation points to dance videos in the AIST Dance Video Database.
The evaluation study shows that the proposed method can estimate segmentation points with high accuracy.
arXiv Detail & Related papers (2024-05-30T06:19:01Z) - DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance [50.01162760878841]
We present DCM, a new multi-modal 3D dataset that combines camera movement with dance motion and music audio.
This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community.
We propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy.
arXiv Detail & Related papers (2024-03-20T15:24:57Z) - TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities.
arXiv Detail & Related papers (2023-04-05T12:58:33Z) - BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z) - Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic
Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music.
We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes units to a fluent dance coherent to the music.
Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z) - Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographs from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.
arXiv Detail & Related papers (2021-12-03T09:37:26Z) - DanceFormer: Music Conditioned 3D Dance Generation with Parametric
Motion Transformer [23.51701359698245]
In this paper, we reformulate it by a two-stage process, ie, a key pose generation and then an in-between parametric motion curve prediction.
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even trained by existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z) - DanceIt: Music-inspired Dancing Video Synthesis [38.87762996956861]
We propose to reproduce such an inherent capability of the human-being within a computer vision system.
The proposed system consists of three modules.
The generated dancing videos match the content and rhythm of the music.
arXiv Detail & Related papers (2020-09-17T02:29:13Z) - Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.