Related papers: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

URL: http://arxiv.org/abs/2405.09266v3
Date: Thu, 28 Nov 2024 10:30:14 GMT
Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai
Abstract summary: We introduce a novel task: generating dance videos directly from images of individuals guided by music. Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos. We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment.
Score: 12.018432669719742
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating dance from music is crucial for advancing automated choreography. Current methods typically produce skeleton keypoint sequences instead of dance videos and lack the capability to make specific individuals dance, which reduces their real-world applicability. These methods also require precise keypoint annotations, complicating data collection and limiting the use of self-collected video datasets. To overcome these challenges, we introduce a novel task: generating dance videos directly from images of individuals guided by music. This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations. Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos featuring various dance types and choreographies. The music is analyzed by our specially designed music encoder, which identifies essential features including dance style, movement, and rhythm. DabFusion excels in generating dance videos not only for individuals in the training dataset but also for any previously unseen person. This versatility stems from its approach of generating latent optical flow, which contains all necessary motion information to animate any person in the image. We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment. We propose a 2D Motion-Music Alignment Score (2D-MM Align), which builds on the Beat Alignment Score to more effectively evaluate motion-music alignment for this new task. Experiments show that our DabFusion establishes a solid baseline for this innovative task. Video results can be found on our project page: https://DabFusion.github.io.
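The abstract says 2D-MM Align builds on the Beat Alignment Score but does not define either metric. As a rough illustration only, the sketch below follows one common formulation of the Beat Alignment Score from prior music-to-dance work: each motion beat is matched to its nearest music beat, and the distance is passed through a Gaussian kernel. The function name, the beat inputs (frame indices), and the default sigma are assumptions made for this example; this is not the authors' 2D-MM Align.

```python
import math


def beat_alignment_score(motion_beats, music_beats, sigma=3.0):
    """Mean Gaussian-weighted closeness of each motion beat to its nearest music beat.

    motion_beats, music_beats: beat positions in frames (hypothetical inputs).
    sigma: kernel width in frames (assumed default, not taken from the paper).
    """
    if not motion_beats or not music_beats:
        return 0.0
    total = 0.0
    for t in motion_beats:
        # Distance from this motion beat to the closest music beat.
        nearest = min(abs(t - m) for m in music_beats)
        total += math.exp(-(nearest ** 2) / (2 * sigma ** 2))
    return total / len(motion_beats)
```

Under this formulation the score is 1.0 when every motion beat lands exactly on a music beat and decays toward 0 as the beats drift apart.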

Related papers

ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion [10.21851621470535]
We introduce ChoreoMuse, a diffusion-based framework that uses SMPL format parameters and their variation version as intermediaries between music and video generation. ChoreoMuse supports style-controllable, high-fidelity dance video generation across diverse musical genres and individual dancer characteristics. Our method employs a novel music encoder MotionTune to capture motion cues from audio, ensuring that the generated choreography closely follows the beat and expressive qualities of the input music.
arXiv Detail & Related papers (2025-07-26T07:17:50Z)
How Animals Dance (When You're Not Looking) [50.76342313977405]
We present a framework for generating music-aware animal dance videos. With as few as six input diffusions, our method can produce up to 30 second dance videos.
arXiv Detail & Related papers (2025-05-29T17:58:02Z)
X-Dancer: Expressive Music to Human Dance Video Generation [26.544761204917336]
X-Dancer is a novel zero-shot music-driven image animation pipeline. It creates diverse and long-range lifelike human dance videos from a single static image.
arXiv Detail & Related papers (2025-02-24T18:47:54Z)
Every Image Listens, Every Image Dances: Music-Driven Image Animation [8.085267959520843]
MuseDance is an end-to-end model that animates reference images using both music and text inputs. Unlike existing approaches, MuseDance eliminates the need for complex motion guidance inputs, such as pose or depth sequences. We present a new multimodal dataset comprising 2,904 dance videos with corresponding background music and text descriptions.
arXiv Detail & Related papers (2025-01-30T23:38:51Z)
Controllable Dance Generation with Style-Guided Motion Diffusion [49.35282418951445]
Dance plays an important role as an artistic form and expression in human culture, yet the creation of dance remains a challenging task. Most dance generation methods rely solely on music, seldom taking into consideration intrinsic attributes such as music style or genre. In this work, we introduce Flexible Dance Generation with Style Description Prompts (DGSDP), a diffusion-based framework suitable for diversified tasks of dance generation.
arXiv Detail & Related papers (2024-06-12T04:55:14Z)
Automatic Dance Video Segmentation for Understanding Choreography [10.053913399613764]
We propose a method to automatically segment a dance video into each movement. To build our training dataset, we annotate segmentation points to dance videos in the AIST Dance Video Database. The evaluation study shows that the proposed method can estimate segmentation points with high accuracy.
arXiv Detail & Related papers (2024-05-30T06:19:01Z)
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance [50.01162760878841]
We present DCM, a new multi-modal 3D dataset that combines camera movement with dance motion and music audio. This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community. We propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy.
arXiv Detail & Related papers (2024-03-20T15:24:57Z)
TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
Music-Driven Group Choreography [10.501572863039852]
AIOZ-GDANCE is a new large-scale dataset for music-driven group dance generation. We show that naively applying single-dance generation techniques to create group dance motion may lead to unsatisfactory results. We propose a new method that takes an input music sequence and a set of 3D positions of dancers to efficiently produce multiple group-coherent choreographies.
arXiv Detail & Related papers (2023-03-22T06:26:56Z)
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis. We focus on breakdancing which features acrobatic moves and tangled postures. Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music. We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes units to a fluent dance coherent to the music. Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z)
Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music. We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer [23.51701359698245]
In this paper, we reformulate music-conditioned 3D dance generation as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction. We propose a large-scale music-conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators. Experiments demonstrate that the proposed method, even trained on existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z)
DanceIt: Music-inspired Dancing Video Synthesis [38.87762996956861]
We propose to reproduce such an inherent capability of human beings within a computer vision system. The proposed system consists of three modules. The generated dancing videos match the content and rhythm of the music.
arXiv Detail & Related papers (2020-09-17T02:29:13Z)
Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis. A massive dance motion data set is created from YouTube videos. A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)