MAGMA: Music Aligned Generative Motion Autodecoder
- URL: http://arxiv.org/abs/2309.01202v1
- Date: Sun, 3 Sep 2023 15:21:47 GMT
- Title: MAGMA: Music Aligned Generative Motion Autodecoder
- Authors: Sohan Anisetty, Amit Raj, James Hays
- Abstract summary: We introduce a 2-step approach for generating dance using a Vector Quantized-Variational Autoencoder (VQ-VAE).
We also evaluate the importance of music representations by comparing naive music feature extraction using Librosa to deep audio representations generated by state-of-the-art audio compression algorithms.
Our proposed approach achieves state-of-the-art results on music-to-motion generation benchmarks and enables the real-time generation of considerably longer motion sequences.
- Score: 15.825872274297735
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mapping music to dance is a challenging problem that requires spatial and temporal coherence along with continual synchronization with the music's progression. Taking inspiration from large language models, we introduce a 2-step approach for generating dance: a Vector Quantized-Variational Autoencoder (VQ-VAE) distills motion into primitives, and a Transformer decoder is trained to learn the correct sequencing of these primitives. We also evaluate the importance of music representations by comparing naive music feature extraction using Librosa to deep audio representations generated by state-of-the-art audio compression algorithms. Additionally, we train variations of the motion generator using relative and absolute positional encodings to determine their effect on generated motion quality at arbitrarily long sequence lengths. Our proposed approach achieves state-of-the-art results on music-to-motion generation benchmarks and enables real-time generation of considerably longer motion sequences, seamless chaining of multiple motion sequences, and easy customization of motion sequences to meet style requirements.
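As a rough illustration of the 2-step approach, the sketch below (in PyTorch) pairs a VQ-VAE-style quantizer that maps motion features to discrete primitives with a Transformer decoder that learns their sequencing conditioned on music. All module names, dimensions, and the codebook size are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of the 2-step approach, assuming PyTorch. Module names,
# dimensions, and codebook size are illustrative guesses, not the paper's values.
import torch
import torch.nn as nn

class MotionQuantizer(nn.Module):
    """Step 1: VQ-VAE bottleneck that distills motion features into discrete primitives."""
    def __init__(self, dim=256, codebook_size=512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):  # z: (batch, time, dim) motion-encoder output
        w = self.codebook.weight
        # Squared L2 distance to every codebook entry, then nearest-neighbor lookup.
        dists = z.pow(2).sum(-1, keepdim=True) - 2 * z @ w.T + w.pow(2).sum(-1)
        idx = dists.argmin(dim=-1)          # (batch, time) primitive indices
        return idx, self.codebook(idx)      # indices + quantized vectors

class PrimitiveSequencer(nn.Module):
    """Step 2: Transformer decoder that learns primitive sequencing, conditioned on music."""
    def __init__(self, dim=256, codebook_size=512, n_layers=4, n_heads=8):
        super().__init__()
        self.token_emb = nn.Embedding(codebook_size, dim)
        layer = nn.TransformerDecoderLayer(dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, tokens, music_feats):
        # tokens: (batch, time) primitive indices; music_feats: (batch, time, dim)
        x = self.token_emb(tokens)
        t = x.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.decoder(x, music_feats, tgt_mask=causal)  # cross-attends to music
        return self.head(h)                                # next-primitive logits
```

In this layout the sequencer operates purely over discrete primitive indices, which is consistent with the chaining and long-sequence generation the abstract describes.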
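The music-representation comparison can be pictured with a "naive" Librosa pipeline like the one below. The specific feature set (MFCCs, chroma, onset strength) is an assumption for illustration; in the learned setting, per-frame latents from a neural audio codec would stand in for this output.

```python
# A sketch of "naive" Librosa feature extraction; the exact feature set is an
# assumption (MFCCs + chroma + onset strength), chosen only for illustration.
import librosa
import numpy as np

def naive_music_features(path, hop_length=512):
    y, sr = librosa.load(path)  # default sr=22050
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, hop_length=hop_length)
    chroma = librosa.feature.chroma_cens(y=y, sr=sr, hop_length=hop_length)
    onset = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    # All three are computed frame-synchronously; stack into (time, feature_dim).
    return np.concatenate([mfcc, chroma, onset[None, :]], axis=0).T
```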
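The positional-encoding comparison contrasts two standard forms, sketched below; neither is the paper's exact implementation. Relative offsets are clipped to a fixed range, which is the usual reason relative encodings extrapolate better to lengths beyond those seen in training.

```python
# Textbook forms of the two positional-encoding variants compared in the
# abstract; neither is the paper's exact code.
import math
import torch

def absolute_sinusoidal_pe(seq_len, dim):
    """Absolute: each position gets a fixed vector; quality can degrade once
    generation runs past the longest position seen in training."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    freq = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                     * (-math.log(10000.0) / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe

def relative_position_indices(seq_len, max_dist=128):
    """Relative: attention sees clipped pairwise offsets, so arbitrarily long
    sequences reuse the same learned offset table."""
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]                 # (seq_len, seq_len) offsets
    return rel.clamp(-max_dist, max_dist) + max_dist  # indices into a bias table
```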
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation [73.54398908446906]
We introduce a novel motion generator design that uses a learning-based inversion network for GAN.
Our method supports style transfer with simple fine-tuning when the encoder is paired with a pretrained StyleGAN generator.
arXiv Detail & Related papers (2023-08-31T17:59:33Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining performance comparable to conditioning on either single modality.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- Music-driven Dance Regeneration with Controllable Key Pose Constraints [17.05495504855978]
We propose a novel framework for music-driven dance motion synthesis with controllable key pose constraints.
Our model involves two single-modal transformer encoders for music and motion representations and a cross-modal transformer decoder for dance motion generation.
arXiv Detail & Related papers (2022-07-08T04:26:45Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just One Transformer VAE [36.9033909878202]
Transformers and variational autoencoders (VAEs) have been extensively employed for symbolic (e.g., MIDI) domain music generation.
In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths.
Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based prior art on numerous widely-used metrics for style transfer tasks.
arXiv Detail & Related papers (2021-05-10T03:44:03Z)
- DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer [23.51701359698245]
In this paper, we reformulate music-conditioned dance generation as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction.
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even trained by existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z)
- Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.