Learning Music-Dance Representations through Explicit-Implicit Rhythm
Synchronization
- URL: http://arxiv.org/abs/2207.03190v2
- Date: Thu, 10 Aug 2023 08:06:05 GMT
- Title: Learning Music-Dance Representations through Explicit-Implicit Rhythm
Synchronization
- Authors: Jiashuo Yu, Junfu Pu, Ying Cheng, Rui Feng, Ying Shan
- Abstract summary: Music-dance representation can be applied to three downstream tasks: (a) dance classification, (b) music-dance retrieval, and (c) music-dance retargeting.
We derive dance rhythms from visual appearance and motion cues, inspired by music rhythm analysis. The visual rhythms are then temporally aligned with their music counterparts, which are extracted from the amplitude of sound intensity.
- Score: 22.279424952432677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although audio-visual representation has been proven applicable in many
downstream tasks, the representation of dancing videos, which is more specific
and always accompanied by music with complex auditory content, remains
challenging and largely uninvestigated. Considering the intrinsic alignment between
a dancer's cadent movements and the music rhythm, we introduce MuDaR, a novel
Music-Dance Representation learning framework that synchronizes music and dance
rhythms in both explicit and implicit ways. Specifically, we derive dance rhythms
from visual appearance and motion cues, inspired by music rhythm analysis. The
visual rhythms are then temporally aligned with their music counterparts, which are
extracted from the amplitude of sound intensity. Meanwhile, we exploit the implicit
coherence of rhythms in the audio and visual streams through contrastive learning:
the model learns a joint embedding by predicting the temporal consistency between
audio-visual pairs. The resulting music-dance representation, together with the
ability to detect audio and visual rhythms, can further be applied to three
downstream tasks: (a) dance classification, (b) music-dance retrieval, and
(c) music-dance retargeting. Extensive experiments demonstrate that our proposed
framework outperforms other self-supervised methods by a large margin.
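The explicit branch hinges on rhythm extraction: music rhythm is taken from the amplitude of sound intensity, dance rhythm from visual motion cues, and the two onset sequences are then aligned in time. The sketch below illustrates amplitude-based rhythm (onset) detection in plain NumPy; the frame length, hop size, and peak threshold are illustrative assumptions, not the settings used by MuDaR.

```python
import numpy as np

def amplitude_rhythm(signal: np.ndarray, frame_len: int = 1024, hop: int = 512,
                     threshold: float = 0.1) -> np.ndarray:
    """Toy amplitude-based rhythm detector: returns frame indices of onset peaks.

    Frame length, hop size, and threshold are illustrative assumptions,
    not the values used in the paper.
    """
    # Short-time RMS energy as a proxy for sound intensity.
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    # Onset strength: positive change of the intensity envelope.
    onset = np.maximum(0.0, np.diff(rms, prepend=rms[:1]))
    if onset.max() > 0:
        onset /= onset.max()
    # Peak picking: local maxima above a fixed threshold mark rhythm points.
    peaks = [
        t for t in range(1, len(onset) - 1)
        if onset[t] > threshold and onset[t] >= onset[t - 1] and onset[t] > onset[t + 1]
    ]
    return np.asarray(peaks, dtype=int)

# Usage: detect rhythm frames in a synthetic periodic-burst signal at 16 kHz.
sr = 16000
t = np.linspace(0, 10, 10 * sr, endpoint=False)
bursts = (np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 2 * t) > 0.99)).astype(np.float32)
print(amplitude_rhythm(bursts))
```

A dance-rhythm counterpart would apply the same envelope-and-peak idea to a motion magnitude curve (e.g. optical flow or joint velocities) instead of audio energy, after which the two peak sequences can be compared for alignment.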
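The implicit branch learns the joint embedding contrastively, by predicting which audio and dance clips are temporally consistent. Below is a hedged sketch of a symmetric InfoNCE-style objective in PyTorch that treats the matching audio/video clips in a batch as positives and all other pairings as negatives; the embedding dimension, batch size, and temperature are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def audio_visual_infonce(audio_emb: torch.Tensor,
                         video_emb: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of temporally aligned audio/dance clips.

    audio_emb, video_emb: (batch, dim) embeddings from the two encoders.
    The i-th audio and i-th video clip are assumed to be synchronized
    (positive pair); every other pairing in the batch acts as a negative.
    """
    audio_emb = F.normalize(audio_emb, dim=-1)
    video_emb = F.normalize(video_emb, dim=-1)
    logits = audio_emb @ video_emb.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(audio_emb.size(0), device=logits.device)
    loss_a2v = F.cross_entropy(logits, targets)               # audio retrieves its video
    loss_v2a = F.cross_entropy(logits.t(), targets)           # video retrieves its audio
    return 0.5 * (loss_a2v + loss_v2a)

# Usage with random stand-in embeddings (a real model would produce these
# from music spectrograms and dance-video frames or skeletons).
audio = torch.randn(8, 128)
video = torch.randn(8, 128)
print(audio_visual_infonce(audio, video).item())
```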
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music.
We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes units to a fluent dance coherent to the music.
Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z)
- Dual Learning Music Composition and Dance Choreography [57.55406449959893]
Music and dance have always co-existed as pillars of human activities, contributing immensely to cultural, social, and entertainment functions.
Recent research works have studied generative models for dance sequences conditioned on music.
We propose a novel extension, where we jointly model both tasks in a dual learning approach.
arXiv Detail & Related papers (2022-01-28T09:20:28Z)
- Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographs from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
- DanceIt: Music-inspired Dancing Video Synthesis [38.87762996956861]
We propose to reproduce such an inherent capability of human beings within a computer vision system.
The proposed system consists of three modules.
The generated dancing videos match the content and rhythm of the music.
arXiv Detail & Related papers (2020-09-17T02:29:13Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Music2Dance: DanceNet for Music-driven Dance Generation [11.73506542921528]
We propose a novel autoregressive generative model, DanceNet, that takes the style, rhythm, and melody of music as control signals.
We capture several synchronized music-dance pairs performed by professional dancers and build a high-quality music-dance pair dataset.
arXiv Detail & Related papers (2020-02-02T17:18:31Z)