Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
- URL: http://arxiv.org/abs/2101.08779v2
- Date: Tue, 2 Feb 2021 05:23:59 GMT
- Title: Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
- Authors: Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa
- Abstract summary: We present a transformer-based learning framework for 3D dance generation conditioned on music.
We also propose a new dataset of paired 3D motion and music called AIST++, which we reconstruct from the AIST multi-view dance videos.
- Score: 28.623222697548456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a transformer-based learning framework for 3D dance
generation conditioned on music. We carefully design our network architecture
and empirically study the key factors for obtaining qualitatively pleasing results.
The critical components include a deep cross-modal transformer, which learns the
correlation between music and dance motion well, and a full-attention mechanism
with future-N supervision, which is essential for producing long-range,
non-freezing motion. In addition, we propose a new dataset
of paired 3D motion and music called AIST++, which we reconstruct from the AIST
multi-view dance videos. This dataset contains 1.1M frames of 3D dance motion
in 1408 sequences, covering 10 genres of dance choreography, and is accompanied by
multi-view camera parameters. To our knowledge, it is the largest dataset of this
kind. Extensive experiments on AIST++ demonstrate that our method produces much
better results than state-of-the-art methods, both qualitatively and
quantitatively.
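
The abstract names two key components, a deep cross-modal transformer and full attention with future-N supervision, without spelling out how they fit together. The following is a minimal PyTorch-style sketch of that idea under stated assumptions: the class and function names, the feature sizes (e.g. motion_dim=219), and the MSE objective are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch, assuming PyTorch; names, sizes, and the loss are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalDanceTransformer(nn.Module):
    """Toy cross-modal transformer: music and seed-motion tokens are concatenated
    along time and processed with full (non-causal) self-attention, then the
    trailing positions are decoded into the next N motion frames."""

    def __init__(self, dim=512, heads=8, layers=6, motion_dim=219, n_future=20):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, motion_dim)  # motion_dim is an assumed pose size
        self.n_future = n_future

    def forward(self, music_feats, motion_feats):
        # music_feats: (B, T_music, dim), motion_feats: (B, T_seed, dim)
        # Full attention: no causal mask, so every token sees the whole context.
        x = torch.cat([music_feats, motion_feats], dim=1)
        x = self.encoder(x)
        # Decode the last n_future positions into future motion frames.
        return self.head(x[:, -self.n_future:])

def future_n_loss(model, music_feats, motion_feats, future_motion):
    """Future-N supervision: predict N future frames from one seed window and
    supervise all of them at once, which discourages frozen, repetitive motion."""
    pred = model(music_feats, motion_feats)  # (B, n_future, motion_dim)
    return F.mse_loss(pred, future_motion)
```

Under this setup, training supervises all N future frames from each seed window, while generation would typically feed predictions back autoregressively; the abstract credits this combination with producing long-range, non-freezing motion.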
Related papers
- DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis [49.614150163184064]
Dance camera movements involve both continuous sequences of variable lengths and sudden changes to simulate the switching of multiple cameras.
We propose to integrate cinematography knowledge by formulating this task as a three-stage process: camera keyframe detection, keyframe synthesis, and tween function prediction.
Following this formulation, we design a novel end-to-end dance camera framework, DanceCamAnimator, which imitates human animation procedures and offers strong keyframe-based controllability over variable-length sequences.
arXiv Detail & Related papers (2024-09-23T11:20:44Z)
- MIDGET: Music Conditioned 3D Dance Generation [13.067687949642641]
We introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET, to generate vibrant and high-quality dances that match the music rhythm.
To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the Motion VQ-VAE model to store different human pose codes, 2) a Motion GPT model that generates pose codes from music and motion inputs, and 3) a simple framework for music feature extraction (a generic sketch of a VQ codebook lookup appears after this list).
arXiv Detail & Related papers (2024-04-18T10:20:37Z)
- DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance [50.01162760878841]
We present DCM, a new multi-modal 3D dataset that combines camera movement with dance motion and music audio.
This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community.
We propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy.
arXiv Detail & Related papers (2024-03-20T15:24:57Z)
- QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation [6.060426136203966]
We propose a Quaternion-Enhanced Attention Network (QEAN) for visual dance synthesis from a quaternion perspective.
First, SPE embeds position information into self-attention in a rotational manner, leading to better learning of features of movement sequences and audio sequences.
Second, QRA represents and fuses 3D motion features and audio features in the form of a series of quaternions, enabling the model to better learn the temporal coordination of music and dance.
arXiv Detail & Related papers (2024-03-18T09:58:43Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music, while maintaining performance comparable to models conditioned on either single modality.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition [13.289339907084424]
We propose a Hierarchical Dance Video Recognition framework (HDVR).
HDVR estimates 2D pose sequences, tracks dancers, and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging parameters.
From the estimated 3D pose sequence, HDVR extracts body part movements and, from these, infers the dance genre.
arXiv Detail & Related papers (2021-09-19T16:59:37Z)
- DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer [23.51701359698245]
In this paper, we reformulate music-conditioned dance generation as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction.
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even trained by existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z)
- Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)
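
The MIDGET entry above relies on a Motion VQ-VAE memory codebook of pose codes. As a point of reference only, here is a minimal, generic vector-quantization codebook lookup in PyTorch; the class name and sizes are assumptions, and the straight-through trick is a standard VQ-VAE ingredient rather than anything taken from that paper's code.

```python
# Generic VQ-VAE codebook lookup, assuming PyTorch; illustrative, not MIDGET's code.
import torch
import torch.nn as nn

class PoseCodebook(nn.Module):
    """Quantizes continuous pose features to their nearest learned code vectors;
    the resulting index sequence can then be modelled by an autoregressive prior
    (e.g. a GPT-style model conditioned on music)."""

    def __init__(self, num_codes=1024, code_dim=256):
        super().__init__()
        self.codes = nn.Embedding(num_codes, code_dim)

    def forward(self, z):
        # z: (B, T, code_dim) continuous pose features from a motion encoder.
        flat = z.reshape(-1, z.size(-1))               # (B*T, code_dim)
        dists = torch.cdist(flat, self.codes.weight)   # (B*T, num_codes)
        idx = dists.argmin(dim=-1).view(z.shape[:-1])  # (B, T) discrete pose codes
        quantized = self.codes(idx)                    # (B, T, code_dim)
        # Straight-through estimator: gradients bypass the non-differentiable argmin.
        quantized = z + (quantized - z).detach()
        return quantized, idx
```

A full VQ-VAE would also add codebook and commitment losses; they are omitted here for brevity.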
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.