DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction
- URL: http://arxiv.org/abs/2411.04646v1
- Date: Thu, 07 Nov 2024 12:11:11 GMT
- Title: DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction
- Authors: Li Zhao, Zhengmin Lu
- Abstract summary: This paper introduces DanceFusion, a novel framework for reconstructing and generating dance movements synchronized to music.
The framework adeptly handles incomplete and noisy skeletal data common in short-form dance videos on social media platforms like TikTok.
Comprehensive evaluations demonstrate that DanceFusion surpasses existing methods, providing state-of-the-art performance in generating dynamic, realistic, and stylistically diverse dance motions.
- Abstract: This paper introduces DanceFusion, a novel framework for reconstructing and generating dance movements synchronized to music, utilizing a Spatio-Temporal Skeleton Diffusion Transformer. The framework adeptly handles incomplete and noisy skeletal data common in short-form dance videos on social media platforms like TikTok. DanceFusion incorporates a hierarchical Transformer-based Variational Autoencoder (VAE) integrated with a diffusion model, significantly enhancing motion realism and accuracy. Our approach introduces sophisticated masking techniques and a unique iterative diffusion process that refines the motion sequences, ensuring high fidelity in both motion generation and synchronization with accompanying audio cues. Comprehensive evaluations demonstrate that DanceFusion surpasses existing methods, providing state-of-the-art performance in generating dynamic, realistic, and stylistically diverse dance motions. Potential applications of this framework extend to content creation, virtual reality, and interactive entertainment, promising substantial advancements in automated dance generation. Visit our project page at https://th-mlab.github.io/DanceFusion/.
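The abstract sketches a concrete pipeline: mask out the unreliable joints of a noisy skeleton sequence, encode the masked sequence with a hierarchical Transformer VAE, then iteratively refine the motion with a diffusion process conditioned on audio. Below is a minimal PyTorch sketch of that flow; all module names, dimensions, and the denoiser interface are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch, assuming generic shapes: pose_seq (B, T, J, 3) plus a
# per-joint validity mask. Masking at the input (instead of dropping
# frames) is what would let the encoder use partially observed skeletons.
import torch
import torch.nn as nn

class SkeletonTransformerVAE(nn.Module):
    def __init__(self, n_joints=24, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_joints * 3, d_model)  # flatten (x, y, z) per joint
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.to_mu = nn.Linear(d_model, d_model)
        self.to_logvar = nn.Linear(d_model, d_model)

    def forward(self, pose_seq, joint_mask):
        # joint_mask: (B, T, J), 1 = observed joint, 0 = missing/noisy
        x = pose_seq * joint_mask.unsqueeze(-1)         # zero out unreliable joints
        h = self.encoder(self.embed(x.flatten(2)))      # (B, T, d_model)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

@torch.no_grad()
def refine_motion(denoiser, z_noisy, audio_feats, n_steps=50):
    """Generic DDPM-style refinement: each step asks the (assumed) denoiser
    for a cleaner latent given the timestep and audio conditioning."""
    z = z_noisy
    for t in reversed(range(n_steps)):
        t_idx = torch.full((z.size(0),), t, device=z.device, dtype=torch.long)
        z = denoiser(z, t_idx, audio_feats)             # one denoising step
    return z
```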
Related papers
- DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis [49.614150163184064]
Dance camera movements involve both continuous sequences of variable lengths and sudden changes to simulate the switching of multiple cameras.
We propose to integrate cinematography knowledge by formulating this task as a three-stage process: keyframe detection, keyframe synthesis, and tween function prediction.
Following this formulation, we design a novel end-to-end dance camera framework, DanceCamAnimator, which imitates human animation procedures and shows strong keyframe-based controllability with variable lengths.
arXiv Detail & Related papers (2024-09-23T11:20:44Z)
- Bidirectional Autoregressive Diffusion Model for Dance Generation [26.449135437337034]
We propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation.
A bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions.
To make the generated dance motion smoother, a local information decoder is built for local motion enhancement; a toy version of the bidirectional constraint is sketched after this entry.
arXiv Detail & Related papers (2024-02-06T19:42:18Z)
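A toy illustration of the bidirectional idea above: encode the generated motion and its time-reversal, and penalize disagreement so the dance reads as coherent in both directions. The encoder interface and the loss are assumptions, not BADM's actual implementation.

```python
# Hypothetical bidirectional consistency term: compare the encoding of a
# motion sequence with the time-aligned encoding of its reversal.
import torch
import torch.nn as nn

def bidirectional_consistency_loss(encoder: nn.Module, motion: torch.Tensor):
    # motion: (B, T, D). Encode the sequence and its time-reversal.
    fwd = encoder(motion)                           # (B, T, H)
    bwd = encoder(torch.flip(motion, dims=[1]))     # encode reversed sequence
    # Flip the backward encoding back to forward time before comparing.
    return nn.functional.mse_loss(fwd, torch.flip(bwd, dims=[1]))
```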
- DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation [89.50310360658791]
We present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation.
This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model.
We demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music; the two-stage cascade is sketched after this entry.
arXiv Detail & Related papers (2023-08-05T16:18:57Z)
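A minimal sketch of the cascade pattern DiffDance describes: a base diffusion model generates coarse, low-frame-rate motion from music embeddings, and a super-resolution diffusion model refines it to the target frame rate. The sampler signatures, sequence lengths, and the repeat-based upsampling are illustrative assumptions.

```python
# Two-stage cascade sketch: coarse music-conditioned generation, then
# diffusion-based temporal super-resolution of the coarse motion.
import torch

def generate_dance(base_model, sr_model, music_emb, coarse_len=60, scale=4):
    # Stage 1: music-conditioned diffusion at coarse temporal resolution.
    coarse = base_model.sample(cond=music_emb, length=coarse_len)   # (B, 60, D)
    # Stage 2: super-resolution diffusion conditioned on the upsampled motion.
    upsampled = torch.repeat_interleave(coarse, scale, dim=1)       # (B, 240, D)
    return sr_model.sample(cond=upsampled, length=coarse_len * scale)
```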
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining comparable performance with the two single modalities.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- Dance Style Transfer with Cross-modal Transformer [17.216186480300756]
CycleDance is a dance style transfer system that transforms a motion clip in one dance style into a motion clip in another style.
Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context.
arXiv Detail & Related papers (2022-08-19T15:48:30Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographs from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music; standard forms of both distances are recalled after this entry.
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
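For reference, the standard definitions of the two distances named above; MDOT-Net's particular ground costs c, d_X, d_Y are not specified in this summary, so they are left abstract.

```latex
% Wasserstein (optimal transport) distance between the generated dance
% distribution \mu and a reference distribution \nu, over couplings \Pi:
\[
  W(\mu,\nu) \;=\; \inf_{\gamma \in \Pi(\mu,\nu)} \int c(x,y)\, d\gamma(x,y)
\]
% Gromov-Wasserstein distance, comparing the metric structure of the dance
% space (X, d_X) with that of the music space (Y, d_Y):
\[
  GW(\mu,\nu) \;=\; \inf_{\gamma \in \Pi(\mu,\nu)}
  \iint \bigl| d_X(x,x') - d_Y(y,y') \bigr|^{2} \, d\gamma(x,y)\, d\gamma(x',y')
\]
```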
- Transflower: probabilistic autoregressive dance generation with multimodal attention [31.308435764603658]
We present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context; a rough sampling loop is sketched after this entry.
We also introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies and including both professional and casual dancers.
arXiv Detail & Related papers (2021-06-25T20:14:28Z)
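A rough sketch of the autoregressive sampling loop implied above: at each timestep, a conditional normalizing flow maps Gaussian noise to the next pose, given a window of recent poses and the current music features. The flow.inverse interface, window size, and feature shapes are assumptions.

```python
# Hypothetical autoregressive sampling with a conditional normalizing flow.
import torch

@torch.no_grad()
def sample_dance(flow, music_feats, seed_poses, horizon=200, window=10):
    # seed_poses: (B, window, D) warm-up motion; music_feats: (B, horizon, M)
    poses = [seed_poses[:, i] for i in range(seed_poses.size(1))]
    for t in range(horizon):
        context = torch.stack(poses[-window:], dim=1)          # (B, window, D)
        cond = torch.cat([context.flatten(1), music_feats[:, t]], dim=-1)
        noise = torch.randn_like(poses[-1])                    # base sample
        poses.append(flow.inverse(noise, cond))                # noise -> pose
    return torch.stack(poses[window:], dim=1)                  # (B, horizon, D)
```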
- Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)