Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
- URL: http://arxiv.org/abs/2405.09266v2
- Date: Tue, 16 Jul 2024 07:09:27 GMT
- Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
- Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai,
- Abstract summary: We introduce a novel task: generating dance videos directly from images of individuals guided by music.
Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos.
We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment.
- Score: 12.018432669719742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated choreography advances by generating dance from music. Current methods create skeleton keypoint sequences, not full dance videos, and cannot make specific individuals dance, limiting their real-world use. These methods also need precise keypoint annotations, making data collection difficult and restricting the use of self-made video datasets. To overcome these challenges, we introduce a novel task: generating dance videos directly from images of individuals guided by music. This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations. Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos featuring various dance types and choreographies. The music is analyzed by our specially designed music encoder, which identifies essential features including dance style, movement, and rhythm. DabFusion excels in generating dance videos not only for individuals in the training dataset but also for any previously unseen person. This versatility stems from its approach of generating latent optical flow, which contains all necessary motion information to animate any person in the image. We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment. We propose a 2D Motion-Music Alignment Score (2D-MM Align), which builds on the Beat Alignment Score to more effectively evaluate motion-music alignment for this new task. Experiments show that our DabFusion establishes a solid baseline for this innovative task. Video results can be found on our project page: https://DabFusion.github.io.
Related papers
- Automatic Dance Video Segmentation for Understanding Choreography [10.053913399613764]
We propose a method to automatically segment a dance video into each movement.
To build our training dataset, we annotate segmentation points to dance videos in the AIST Dance Video Database.
The evaluation study shows that the proposed method can estimate segmentation points with high accuracy.
arXiv Detail & Related papers (2024-05-30T06:19:01Z) - DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance [50.01162760878841]
We present DCM, a new multi-modal 3D dataset that combines camera movement with dance motion and music audio.
This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community.
We propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy.
arXiv Detail & Related papers (2024-03-20T15:24:57Z) - TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities.
arXiv Detail & Related papers (2023-04-05T12:58:33Z) - Music-Driven Group Choreography [10.501572863039852]
$rm AIOZ-GDANCE$ is a new large-scale dataset for music-driven group dance generation.
We show that naively applying single dance generation technique to creating group dance motion may lead to unsatisfactory results.
We propose a new method that takes an input music sequence and a set of 3D positions of dancers to efficiently produce multiple group-coherent choreographies.
arXiv Detail & Related papers (2023-03-22T06:26:56Z) - BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z) - Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic
Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music.
We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes units to a fluent dance coherent to the music.
Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z) - Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographs from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.
arXiv Detail & Related papers (2021-12-03T09:37:26Z) - DanceFormer: Music Conditioned 3D Dance Generation with Parametric
Motion Transformer [23.51701359698245]
In this paper, we reformulate it by a two-stage process, ie, a key pose generation and then an in-between parametric motion curve prediction.
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even trained by existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z) - DanceIt: Music-inspired Dancing Video Synthesis [38.87762996956861]
We propose to reproduce such an inherent capability of the human-being within a computer vision system.
The proposed system consists of three modules.
The generated dancing videos match the content and rhythm of the music.
arXiv Detail & Related papers (2020-09-17T02:29:13Z) - Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.