Quantized GAN for Complex Music Generation from Dance Videos
- URL: http://arxiv.org/abs/2204.00604v1
- Date: Fri, 1 Apr 2022 17:53:39 GMT
- Title: Quantized GAN for Complex Music Generation from Dance Videos
- Authors: Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan and Sergey Tulyakov
- Abstract summary: We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
- Score: 48.196705493763986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal
framework that generates complex musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input,
and learns to generate music samples that plausibly accompany the corresponding
input. Unlike most existing conditional music generation works that generate
specific types of mono-instrumental sounds using symbolic audio representations
(e.g., MIDI), and that heavily rely on pre-defined musical synthesizers, in
this work we generate dance music in complex styles (e.g., pop, breakdancing,
etc.) by employing a Vector Quantized (VQ) audio representation, and leverage
both its generality and the high abstraction capacity of its symbolic and
continuous counterparts. By performing an extensive set of experiments on
multiple datasets, and following a comprehensive evaluation protocol, we assess
the generative quality of our approach against several alternatives. The
quantitative results, which measure the music consistency, beats
correspondence, and music diversity, clearly demonstrate the effectiveness of
our proposed method. Last but not least, we curate a challenging dance-music
dataset of in-the-wild TikTok videos, which we use to further demonstrate the
efficacy of our approach in real-world applications, and which we hope will
serve as a starting point for relevant future research.
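To make the high-level idea more concrete, below is a minimal, hypothetical sketch of adversarially generating a Vector Quantized audio representation from pooled visual and motion features. All module names, feature sizes, sequence lengths, and the Gumbel-softmax relaxation are illustrative assumptions for this sketch, not the D2M-GAN implementation.

```python
# Hypothetical sketch (not the authors' code): a generator maps pooled visual and
# motion features to a short sequence of discrete VQ code selections, and a
# discriminator scores code sequences. All sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeGenerator(nn.Module):
    def __init__(self, visual_dim=512, motion_dim=128, hidden=256,
                 codebook_size=512, seq_len=64):
        super().__init__()
        self.seq_len, self.codebook_size = seq_len, codebook_size
        self.backbone = nn.Sequential(
            nn.Linear(visual_dim + motion_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One logit vector per output timestep over the VQ codebook.
        self.head = nn.Linear(hidden, seq_len * codebook_size)

    def forward(self, visual_feat, motion_feat):
        h = self.backbone(torch.cat([visual_feat, motion_feat], dim=-1))
        logits = self.head(h).view(-1, self.seq_len, self.codebook_size)
        # Straight-through Gumbel-softmax keeps the discrete code choice differentiable.
        return F.gumbel_softmax(logits, tau=1.0, hard=True)

class CodeDiscriminator(nn.Module):
    def __init__(self, codebook_size=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(codebook_size, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, codes):
        # Score each timestep, then average into one real/fake score per clip.
        return self.net(codes).mean(dim=1)

if __name__ == "__main__":
    G, D = CodeGenerator(), CodeDiscriminator()
    visual = torch.randn(2, 512)  # e.g. pooled video-frame features (assumed)
    motion = torch.randn(2, 128)  # e.g. pooled body-motion features (assumed)
    fake_codes = G(visual, motion)                # (2, 64, 512) one-hot code picks
    print(fake_codes.shape, D(fake_codes).shape)  # -> (2, 64, 512), (2, 1)
```

A full system would additionally need a VQ codebook and decoder (e.g., from a pretrained VQ-VAE) to turn the selected codes into a waveform; that part is outside this sketch.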
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165] (arXiv, 2024-10-16)
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221] (arXiv, 2024-04-09)
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
- QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation [6.060426136203966] (arXiv, 2024-03-18)
We propose a Quaternion-Enhanced Attention Network (QEAN) for visual dance synthesis from a quaternion perspective.
First, SPE embeds position information into self-attention in a rotational manner, leading to better learning of features of movement sequences and audio sequences.
Second, QRA represents and fuses 3D motion features and audio features in the form of a series of quaternions, enabling the model to better learn the temporal coordination of music and dance.
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model [32.801213106782335] (arXiv, 2023-11-02)
We develop a generative music AI framework, Video2Music, that can match a provided video.
In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion.
- Simple and Controllable Music Generation [94.61958781346176] (arXiv, 2023-06-08)
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns (a minimal sketch of one such interleaving pattern appears at the end of this page).
- Long-Term Rhythmic Video Soundtracker [37.082768654951465] (arXiv, 2023-05-02)
We present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms.
We extend our model's applicability from dances to multiple sports scenarios such as floor exercise and figure skating.
Our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence.
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418] (arXiv, 2022-07-20)
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing, which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
- Music-to-Dance Generation with Optimal Transport [48.92483627635586] (arXiv, 2021-12-03)
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.
This list is automatically generated from the titles and abstracts of the papers on this site.
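The MusicGen entry above mentions interleaving tokens from several parallel codebook streams so a single-stage LM can model them. As a rough illustration, the sketch below implements one commonly cited scheme, a "delay" pattern in which codebook k is shifted right by k steps; the PAD value and the function names are assumptions for this example, not MusicGen's API.

```python
# Minimal sketch of a "delay" codebook-interleaving pattern, assuming K parallel
# codebook streams of length T; stream k is shifted right by k steps and padded,
# so one autoregressive step can emit one token per codebook.
from typing import List

PAD = -1  # placeholder id for positions with no code yet (assumption)

def delay_interleave(codes: List[List[int]]) -> List[List[int]]:
    """codes[k][t] is the t-th token of codebook k; returns a (K, T + K - 1) grid."""
    k_streams, t_len = len(codes), len(codes[0])
    grid = [[PAD] * (t_len + k_streams - 1) for _ in range(k_streams)]
    for k, stream in enumerate(codes):
        for t, tok in enumerate(stream):
            grid[k][t + k] = tok  # codebook k is delayed by k steps
    return grid

def delay_deinterleave(grid: List[List[int]]) -> List[List[int]]:
    """Inverse of delay_interleave: recover the original (K, T) code streams."""
    k_streams = len(grid)
    t_len = len(grid[0]) - (k_streams - 1)
    return [[grid[k][t + k] for t in range(t_len)] for k in range(k_streams)]

if __name__ == "__main__":
    codes = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # K=3 codebooks, T=4
    grid = delay_interleave(codes)
    assert delay_deinterleave(grid) == codes
    for row in grid:
        print(row)
    # [1, 2, 3, 4, -1, -1]
    # [-1, 5, 6, 7, 8, -1]
    # [-1, -1, 9, 10, 11, 12]
```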