Long-Term Rhythmic Video Soundtracker
- URL: http://arxiv.org/abs/2305.01319v2
- Date: Tue, 30 May 2023 11:04:25 GMT
- Title: Long-Term Rhythmic Video Soundtracker
- Authors: Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao
- Abstract summary: We present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms.
We extend our model's applicability from dance to multiple sports scenarios such as floor exercise and figure skating.
Our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence.
- Score: 37.082768654951465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of generating musical soundtracks in sync with
rhythmic visual cues. Most existing works rely on pre-defined music
representations, which limits their generative flexibility and complexity.
Other methods that directly generate video-conditioned waveforms suffer from
limited scenarios, short lengths, and unstable generation quality.
To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel
framework to synthesize long-term conditional waveforms. Specifically, our
framework consists of a latent conditional diffusion probabilistic model to
perform waveform synthesis. Furthermore, we propose a series of context-aware
conditioning encoders that incorporate temporal information for long-term
generation. Notably, we extend our model's applicability from dance
to multiple sports scenarios such as floor exercise and figure skating. To
perform comprehensive evaluations, we establish a benchmark for rhythmic video
soundtracks including the pre-processed dataset, improved evaluation metrics,
and robust generative baselines. Extensive experiments show that our model
generates long-term soundtracks with state-of-the-art musical quality and
rhythmic correspondence. Code is available at
https://github.com/OpenGVLab/LORIS.
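For intuition, here is a minimal sketch of how a latent conditional diffusion sampler of the kind the abstract describes might look. It is an illustration under assumptions, not the LORIS code: the ConditionalDenoiser module, the latent and conditioning dimensions, and the linear noise schedule are invented stand-ins, and the conditioning vector is presumed to come from the paper's context-aware visual encoders.

```python
# Minimal sketch of conditional latent diffusion sampling (not the authors'
# implementation). All module names, dimensions, and schedules are assumptions.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy denoiser: predicts noise from a latent, a timestep, and a visual condition."""
    def __init__(self, latent_dim=128, cond_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, t, cond):
        # Broadcast the timestep as one extra scalar feature per sample.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([z, cond, t_feat], dim=-1))

@torch.no_grad()
def sample(model, cond, steps=1000, latent_dim=128):
    """Standard DDPM ancestral sampling over audio latents, given video features."""
    betas = torch.linspace(1e-4, 0.02, steps)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(cond.shape[0], latent_dim)  # start from pure noise
    for t in reversed(range(steps)):
        eps = model(z, torch.full((cond.shape[0],), t), cond)
        # Posterior mean of the reverse step (standard DDPM update rule).
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)
    return z  # decode with a pretrained audio autoencoder to obtain a waveform
```

Sampling in a compact latent space and decoding afterwards is what makes minutes-long waveform generation tractable compared with denoising raw audio sample by sample.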
Related papers
- MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation [19.878013881045817]
MusiConGen is a temporally-conditioned Transformer-based text-to-music model.
It integrates automatically-extracted rhythm and chords as conditioning signals.
We show that MusiConGen can generate realistic backing track music that aligns well with the specified conditions.
arXiv Detail & Related papers (2024-07-21T05:27:53Z)
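To make the MusiConGen entry's conditioning idea concrete, the sketch below extracts beat positions and beat-synchronous chroma as rhythm and chord-like signals. It is a stand-in under assumptions: librosa's beat tracker and chroma features substitute for whatever automatic extractors MusiConGen actually uses.

```python
# Illustrative extraction of rhythm and chord-like conditioning signals.
# librosa stands in here for MusiConGen's actual feature pipeline.
import librosa
import numpy as np

def extract_conditions(audio_path: str):
    y, sr = librosa.load(audio_path, sr=22050, mono=True)

    # Rhythm condition: tempo estimate and beat positions (frame indices).
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

    # Chord-like condition: a 12-dim chroma vector aggregated per beat,
    # a crude proxy for the underlying chord progression.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    beat_chroma = librosa.util.sync(chroma, beat_frames, aggregate=np.median)

    return tempo, beat_frames, beat_chroma.T  # (n_beats, 12) chroma per beat
```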
- VideoComposer: Compositional Video Synthesis with Motion Controllability [52.4714732331632]
VideoComposer allows users to flexibly compose a video with textual conditions, spatial conditions and, more importantly, temporal conditions.
We introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics.
In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs.
arXiv Detail & Related papers (2023-06-03T06:29:02Z)
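The motion-vector control signal in the VideoComposer entry can be approximated cheaply, as in the sketch below. One hedge up front: VideoComposer reads block motion vectors directly from the compressed bitstream, whereas this stand-in computes dense optical flow with OpenCV, which carries similar temporal-dynamics information.

```python
# Hedged sketch of deriving a motion control signal from video frames.
# Dense Farneback optical flow serves as an accessible proxy for the
# block-level motion vectors stored by video codecs.
import cv2
import numpy as np

def motion_signal(video_path: str, max_frames: int = 64):
    cap = cv2.VideoCapture(video_path)
    flows, prev = [], None
    while len(flows) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # 2-channel (dx, dy) flow field per pixel between adjacent frames.
            flow = cv2.calcOpticalFlowFarneback(
                prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            flows.append(flow)
        prev = gray
    cap.release()
    if not flows:
        raise ValueError("video too short to compute motion")
    return np.stack(flows)  # (T, H, W, 2) guidance for temporal dynamics
```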
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to effectively infinite, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Deep Performer: Score-to-Audio Music Performance Synthesis [30.95307878579825]
Deep Performer is a novel system for score-to-audio music performance synthesis.
Unlike speech, music often contains polyphony and long notes.
We show that our proposed model can synthesize music with clear polyphony and harmonic structures.
arXiv Detail & Related papers (2022-02-12T10:36:52Z)
- Strumming to the Beat: Audio-Conditioned Contrastive Video Textures [112.6140796961121]
We introduce a non-parametric approach for infinite video texture synthesis using a representation learned via contrastive learning.
We take inspiration from Video Textures, which showed that plausible new videos could be generated from a single one by stitching its frames together in a novel yet consistent order.
Our model outperforms baselines on human perceptual scores, can handle a diverse range of input videos, and can combine semantic and audio-visual cues in order to synthesize videos that synchronize well with an audio signal.
arXiv Detail & Related papers (2021-04-06T17:24:57Z)
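The frame-stitching idea in the video-textures entry can be mimicked with a few lines of similarity search. This is a toy sketch, not the paper's method: it uses raw cosine similarity and softmax sampling, whereas the paper learns the transition similarity with a contrastive objective and additionally aligns transitions to an audio signal.

```python
# Toy sketch of non-parametric video-texture synthesis: replay source frames
# in a new order, jumping to frames that resemble the natural successor.
import numpy as np

def synthesize_order(embeddings, length, start=0, temperature=0.05, seed=0):
    """embeddings: (N, D) per-frame features from the single source video."""
    rng = np.random.default_rng(seed)
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T  # pairwise cosine similarity between frames
    order, cur = [start], start
    for _ in range(length - 1):
        nxt = min(cur + 1, len(emb) - 1)   # the "natural" successor frame
        probs = np.exp(sim[nxt] / temperature)
        probs[cur] = 0.0                   # never freeze on the same frame
        probs /= probs.sum()
        cur = int(rng.choice(len(probs), p=probs))
        order.append(cur)
    return order  # indices into the source video, played back in this order
```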
- Lets Play Music: Audio-driven Performance Video Generation [58.77609661515749]
We propose a new task named Audio-driven Performance Video Generation (APVG).
APVG aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip.
arXiv Detail & Related papers (2020-11-05T03:13:46Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user specified symbolic scenario combined with a previous music context.
Our model can generate long melodies by treating 8-beat note sequences as basic units, while sharing a consistent rhythmic pattern structure with a given reference song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)