Melody Is All You Need For Music Generation
- URL: http://arxiv.org/abs/2409.20196v2
- Date: Thu, 3 Oct 2024 07:39:20 GMT
- Title: Melody Is All You Need For Music Generation
- Authors: Shaopeng Wei, Manzhen Wei, Haoyu Wang, Yu Zhao, Gang Kou
- Abstract summary: We present the Melody Guided Music Generation (MMGen) model, the first approach to use melody to guide music generation.
Specifically, we first align the melody with audio waveforms and their associated descriptions using the multimodal alignment module.
This allows MMGen to generate music that matches the style of the provided audio while also producing music that reflects the content of the given text description.
- Score: 10.366088659024685
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present the Melody Guided Music Generation (MMGen) model, the first approach to use melody to guide music generation; despite a simple method and very limited resources, it achieves strong performance. Specifically, we first align the melody with audio waveforms and their associated descriptions using the multimodal alignment module. Subsequently, we condition the diffusion module on the learned melody representations. This allows MMGen to generate music that matches the style of the provided audio while also producing music that reflects the content of the given text description. To address the scarcity of high-quality data, we construct a multi-modal dataset, MusicSet, which includes melody, text, and audio, and will be made publicly available. We conduct extensive experiments demonstrating the superiority of the proposed model in terms of both experimental metrics and actual performance quality.
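The abstract describes a two-stage pipeline: a multimodal alignment module that embeds melody, audio, and text into a shared space, followed by a diffusion module conditioned on the learned melody representation. A minimal toy sketch of that flow is below; all names (`encode`, `denoise_step`, the projection matrices) are illustrative assumptions, not the authors' actual architecture or API.

```python
# Hypothetical sketch of the two-stage pipeline summarized in the abstract:
# Stage 1 aligns melody and audio in a shared embedding space;
# Stage 2 conditions a (toy) diffusion denoising step on the melody embedding.
# None of these names come from the paper; they are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy encoder: linear projection followed by L2 normalization."""
    z = x @ W
    return z / np.linalg.norm(z)

def alignment_loss(z_melody, z_audio):
    """Contrastive-style objective: 1 - cosine similarity between a paired
    melody/audio embedding (both are unit vectors, so this lies in [0, 2])."""
    return 1.0 - float(z_melody @ z_audio)

def denoise_step(x_noisy, z_melody, W_cond):
    """Toy conditioned denoising step: the predicted noise depends on both
    the noisy sample and the projected melody condition."""
    eps_hat = np.tanh(x_noisy + z_melody @ W_cond)
    return x_noisy - 0.1 * eps_hat

# Stage 1: align a melody feature and an audio feature in a shared space.
W_m, W_a = rng.normal(size=(32, 16)), rng.normal(size=(64, 16))
melody, audio = rng.normal(size=32), rng.normal(size=64)
z_m, z_a = encode(melody, W_m), encode(audio, W_a)
loss = alignment_loss(z_m, z_a)   # would be minimized over paired data

# Stage 2: condition the diffusion module on the learned melody embedding.
W_cond = rng.normal(size=(16, 48))
x = rng.normal(size=48)            # noisy audio latent
x = denoise_step(x, z_m, W_cond)   # one melody-conditioned denoising step
```

In the real model the encoders would be trained networks and the diffusion module would run many conditioned steps; the sketch only shows how the melody embedding enters both stages.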
Related papers
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
- Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model [32.801213106782335]
We develop a generative music AI framework, Video2Music, that can generate music to match a provided video.
In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion.
arXiv Detail & Related papers (2023-11-02T03:33:00Z)
- Content-based Controls For Music Large Language Modeling [6.17674772485321]
Coco-Mulla is a content-based control method for music large language modeling.
It uses a parameter-efficient fine-tuning (PEFT) method tailored for Transformer-based audio models.
Our approach achieves high-quality music generation with low-resource semi-supervised learning.
arXiv Detail & Related papers (2023-10-26T05:24:38Z)
- InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models [42.2977676825086]
In this paper, we develop InstructME, an Instruction guided Music Editing and remixing framework based on latent diffusion models.
Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing.
Our proposed method significantly surpasses preceding systems in music quality, text relevance and harmony.
arXiv Detail & Related papers (2023-08-28T07:11:42Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
- MusicLM: Generating Music From Text [24.465880798449735]
We introduce MusicLM, a model generating high-fidelity music from text descriptions.
MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task.
Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description.
arXiv Detail & Related papers (2023-01-26T18:58:53Z)
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.