Diff-BGM: A Diffusion Model for Video Background Music Generation
- URL: http://arxiv.org/abs/2405.11913v1
- Date: Mon, 20 May 2024 09:48:36 GMT
- Title: Diff-BGM: A Diffusion Model for Video Background Music Generation
- Authors: Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu,
- Abstract summary: We propose a high-quality music-video dataset with detailed annotation and shot detection to provide multi-modal information about the video and music.
We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video.
We propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process.
- Score: 16.94631443719866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909 with detailed annotation and shot detection to provide multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video with retrieval precision metrics. Finally, we propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process, i.e., uses dynamic video features to control music rhythm and semantic features to control the melody and atmosphere. We propose to align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of our proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.
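The segment-aware cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, hypothetically, that each music step and each video frame carries a segment index (for example, a shot number), and that a music step may only attend to video frames in the same segment. The names `segment_ids` and `segment_cross_attention` are invented for this illustration.

```python
import math


def segment_ids(boundaries, length):
    """Map each time step to the index of the segment that contains it.

    `boundaries` holds the start index of every segment after the first,
    e.g. [4, 7] splits 10 steps into [0..3], [4..6], [7..9].
    """
    ids, seg = [], 0
    for t in range(length):
        while seg < len(boundaries) and t >= boundaries[seg]:
            seg += 1
        ids.append(seg)
    return ids


def segment_cross_attention(queries, keys, values, q_seg, k_seg):
    """Scaled dot-product cross-attention restricted to matching segments.

    Assumption: queries come from the music sequence, keys/values from the
    video sequence, and q_seg/k_seg give each element's segment index.
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for q, qs in zip(queries, q_seg):
        # Keys outside the query's segment get a score of -inf (masked out).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if ks == qs else -math.inf
            for k, ks in zip(keys, k_seg)
        ]
        peak = max(scores)
        if peak == -math.inf:
            # No video frame shares this segment: emit a zero vector.
            outputs.append([0.0] * out_dim)
            continue
        # Numerically stable softmax over the unmasked scores.
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(out_dim)
        ])
    return outputs


# Usage: four music steps in two segments, one video frame per segment.
music = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
video_keys = [[1.0, 0.0], [0.0, 1.0]]
video_values = [[1.0, 0.0], [0.0, 1.0]]
attended = segment_cross_attention(
    music, video_keys, video_values,
    q_seg=segment_ids([2], 4), k_seg=[0, 1],
)
```

Because each segment here contains a single video frame, every music step simply receives that frame's value vector; with several frames per segment the output becomes a softmax-weighted mix of the frames inside the segment only.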
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos [32.741262543860934]
We present a framework for learning to generate background music from video inputs.
We develop a generative video-music Transformer with a novel semantic video-music alignment scheme.
A new temporal video encoder architecture allows us to efficiently process videos consisting of many densely sampled frames.
arXiv Detail & Related papers (2024-09-11T17:56:48Z)
- VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling [71.01050359126141]
We propose VidMuse, a framework for generating music aligned with video inputs.
VidMuse produces high-fidelity music that is both acoustically and semantically aligned with the video.
arXiv Detail & Related papers (2024-06-06T17:58:11Z)
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model [32.801213106782335]
We develop a generative music AI framework, Video2Music, that can match a provided video.
In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion.
arXiv Detail & Related papers (2023-11-02T03:33:00Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- V2Meow: Meowing to the Visual Beat via Video-to-Music Generation [47.076283429992664]
V2Meow is a video-to-music generation system capable of producing high-quality music audio for a diverse range of video input types.
It synthesizes high-fidelity music audio waveforms solely by conditioning on pre-trained general-purpose visual features extracted from video frames.
arXiv Detail & Related papers (2023-05-11T06:26:41Z)
- Video Background Music Generation: Dataset, Method and Evaluation [31.15901120245794]
We introduce a complete recipe including dataset, benchmark model, and evaluation metric for video background music generation.
We present SymMV, a video and symbolic music dataset with various musical annotations.
We also propose a benchmark video background music generation framework named V-MusProd.
arXiv Detail & Related papers (2022-11-21T08:39:48Z)
- Lets Play Music: Audio-driven Performance Video Generation [58.77609661515749]
We propose a new task named Audio-driven Performance Video Generation (APVG).
APVG aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip.
arXiv Detail & Related papers (2020-11-05T03:13:46Z)
- Foley Music: Learning to Generate Music from Videos [115.41099127291216]
Foley Music is a system that can synthesize plausible music for a silent video clip about people playing musical instruments.
We first identify two key intermediate representations for a successful video to music generator: body keypoints from videos and MIDI events from audio recordings.
We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
arXiv Detail & Related papers (2020-07-21T17:59:06Z)