Foley Music: Learning to Generate Music from Videos
- URL: http://arxiv.org/abs/2007.10984v1
- Date: Tue, 21 Jul 2020 17:59:06 GMT
- Title: Foley Music: Learning to Generate Music from Videos
- Authors: Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio
Torralba
- Abstract summary: Foley Music is a system that can synthesize plausible music for a silent video clip about people playing musical instruments.
We first identify two key intermediate representations for a successful video to music generator: body keypoints from videos and MIDI events from audio recordings.
We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
- Score: 115.41099127291216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Foley Music, a system that can synthesize
plausible music for a silent video clip about people playing musical
instruments. We first identify two key intermediate representations for a
successful video to music generator: body keypoints from videos and MIDI events
from audio recordings. We then formulate music generation from videos as a
motion-to-MIDI translation problem. We present a Graph-Transformer framework
that can accurately predict MIDI event sequences in accordance with the body
movements. The MIDI events can then be converted to realistic music using an
off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our
models on videos containing a variety of music performances. Experimental
results show that our model outperforms several existing systems in generating
music that is pleasant to listen to. More importantly, the MIDI representations
are fully interpretable and transparent, thus enabling us to perform music
editing flexibly. We encourage the readers to watch the demo video with audio
turned on to experience the results.
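The last stage of this pipeline, rendering predicted MIDI events to audio with an off-the-shelf synthesizer and editing the symbolic representation before synthesis, can be illustrated with a short sketch. The snippet below is not the authors' code: it assumes the pretty_midi library, and the `predicted_events` list is a hypothetical stand-in for the output of the motion-to-MIDI model.

```python
# Minimal sketch of MIDI-to-audio rendering and symbolic editing,
# assuming pretty_midi and hypothetical model output.
import numpy as np
import pretty_midi
from scipy.io import wavfile

# Hypothetical model output: (pitch, onset_sec, offset_sec) triples.
predicted_events = [(60, 0.0, 0.5), (64, 0.5, 1.0), (67, 1.0, 1.5)]

pm = pretty_midi.PrettyMIDI()
violin = pretty_midi.Instrument(program=40)  # GM program 40 = violin
for pitch, onset, offset in predicted_events:
    violin.notes.append(
        pretty_midi.Note(velocity=90, pitch=pitch, start=onset, end=offset))
pm.instruments.append(violin)

# "Flexible editing": because MIDI is symbolic, transposing the whole
# clip up a major third is a one-line change before synthesis.
for note in violin.notes:
    note.pitch += 4

# Render to audio. synthesize() uses simple sine waves; pm.fluidsynth()
# gives more realistic timbres if a SoundFont is installed.
audio = pm.synthesize(fs=44100)
wavfile.write('foley_music_demo.wav', 44100, audio.astype(np.float32))
```

Because the intermediate representation is symbolic, the transposition edit changes the rendered music without re-running the video model, which is the interpretability and editing flexibility the abstract refers to.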
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos [32.741262543860934]
We present a framework for learning to generate background music from video inputs.
We develop a generative video-music Transformer with a novel semantic video-music alignment scheme.
A new temporal video encoder architecture allows us to efficiently process videos consisting of many densely sampled frames.
arXiv Detail & Related papers (2024-09-11T17:56:48Z)
- Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z)
- VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling [71.01050359126141]
We propose VidMuse, a framework for generating music aligned with video inputs.
VidMuse produces high-fidelity music that is both acoustically and semantically aligned with the video.
arXiv Detail & Related papers (2024-06-06T17:58:11Z)
- Diff-BGM: A Diffusion Model for Video Background Music Generation [16.94631443719866]
We propose a high-quality music-video dataset with detailed annotation and shot detection to provide multi-modal information about the video and music.
We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video.
We propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process.
arXiv Detail & Related papers (2024-05-20T09:48:36Z)
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model [32.801213106782335]
We develop a generative music AI framework, Video2Music, that can generate music matching a provided video.
In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion.
arXiv Detail & Related papers (2023-11-02T03:33:00Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Let's Play Music: Audio-driven Performance Video Generation [58.77609661515749]
We propose a new task named Audio-driven Performance Video Generation (APVG).
APVG aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip.
arXiv Detail & Related papers (2020-11-05T03:13:46Z)
- Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music [0.25559196081940677]
This paper proposes a bi-directional LSTM model with an attention mechanism capable of generating music of a similar type from MIDI data; a minimal sketch of such an architecture follows this entry.
The music generated by the model follows the theme/style of the music the model is trained on.
arXiv Detail & Related papers (2020-11-02T06:43:28Z)
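For readers unfamiliar with the architecture named above, here is a minimal PyTorch sketch of a bi-directional LSTM with additive attention over MIDI-event tokens. It is an illustration under stated assumptions, not the paper's code: the vocabulary size, dimensions, and next-token prediction head are all made up for the example.

```python
# Minimal sketch: bi-directional LSTM + additive attention over a
# sequence of MIDI-event tokens, predicting the next token.
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size=388, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # scores each timestep
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, tokens):                        # tokens: (B, T)
        h, _ = self.lstm(self.embed(tokens))          # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        context = (weights * h).sum(dim=1)            # (B, 2H)
        return self.out(context)                      # next-token logits

model = BiLSTMAttention()
logits = model(torch.randint(0, 388, (2, 64)))  # batch of 2 sequences
print(logits.shape)                             # torch.Size([2, 388])
```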
- Audeo: Audio Generation for a Silent Performance Video [17.705770346082023]
We present a novel system that takes as input video frames of a musician playing the piano and generates the music for that video.
Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association of sounds with visual events.
arXiv Detail & Related papers (2020-06-23T00:58:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.