Melody Is All You Need For Music Generation
- URL: http://arxiv.org/abs/2409.20196v3
- Date: Mon, 25 Nov 2024 13:43:20 GMT
- Title: Melody Is All You Need For Music Generation
- Authors: Shaopeng Wei, Manzhen Wei, Haoyu Wang, Yu Zhao, Gang Kou
- Abstract summary: We present the Melody Guided Music Generation (MG2) model, a novel approach using melody to guide the text-to-music generation.
The proposed MG2 model surpasses current open-source text-to-music generation models, utilizing fewer than 1/3 of the parameters and less than 1/200 of the training data.
- Score: 10.366088659024685
- License:
- Abstract: We present the Melody Guided Music Generation (MG2) model, a novel approach using melody to guide text-to-music generation that, despite a simple method and extremely limited resources, achieves excellent performance. Specifically, we first align the text with audio waveforms and their associated melodies using the newly proposed Contrastive Language-Music Pretraining, enabling the learned text representation to be fused with implicit melody information. Subsequently, we condition the retrieval-augmented diffusion module on both the text prompt and the retrieved melody. This allows MG2 to generate music that reflects the content of the given text description while keeping the intrinsic harmony under the guidance of explicit melody information. We conducted extensive experiments on two public datasets: MusicCaps and MusicBench. The experimental results demonstrate that the proposed MG2 model surpasses current open-source text-to-music generation models, utilizing fewer than 1/3 of the parameters and less than 1/200 of the training data compared to state-of-the-art counterparts. Furthermore, we carried out comprehensive human evaluations to explore the potential applications of MG2 in real-world scenarios.
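To make the alignment step concrete, below is a minimal sketch of how a CLIP-style contrastive objective over text, audio, and melody embeddings could look. The function names (`info_nce`, `clmp_loss`), the pairwise pairing scheme, and the temperature value are assumptions; the abstract only states that text is aligned with waveforms and their melodies, so this is not the paper's exact formulation.

```python
# Hedged sketch of a contrastive "language-music" pretraining loss, assuming a
# CLIP-style symmetric InfoNCE objective applied to text-audio and text-melody
# embedding pairs. Encoders are not shown; random tensors stand in for them.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of embeddings of shape (batch, dim)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # matching items share an index
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def clmp_loss(text_emb: torch.Tensor, audio_emb: torch.Tensor, melody_emb: torch.Tensor) -> torch.Tensor:
    """Assumed three-way alignment: text<->audio plus text<->melody, so that the
    learned text representation absorbs implicit melody information."""
    return info_nce(text_emb, audio_emb) + info_nce(text_emb, melody_emb)


if __name__ == "__main__":
    batch, dim = 8, 512
    loss = clmp_loss(torch.randn(batch, dim),
                     torch.randn(batch, dim),
                     torch.randn(batch, dim))
    print(loss.item())
```

The resulting text encoder could then serve as the conditioning input to the retrieval-augmented diffusion module described in the abstract, alongside the retrieved melody.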
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation [7.102743887290909]
This paper bridges the singability gap with a novel approach to generating singable lyrics by jointly Learning wOrding And Formatting during Melody-to-Lyric training.
After general-domain pretraining, our proposed model acquires length awareness first from a large text-only lyric corpus.
Then, we introduce a new objective informed by musicological research on the relationship between melody and lyrics during melody-to-lyric training, which enables the model to learn the fine-grained format requirements of the melody.
arXiv Detail & Related papers (2023-07-05T09:42:47Z) - Controllable Lyrics-to-Melody Generation [14.15838552524433]
We propose a controllable lyrics-to-melody generation network, ConL2M, which is able to generate realistic melodies from lyrics in user-desired musical style.
Our work contains three main novelties: 1) To model the dependencies of music attributes across multiple sequences, inter-branch memory fusion (Memofu) is proposed to enable information flow between the multi-branch stacked LSTM architecture; 2) Reference style embedding (RSE) is proposed to improve the quality of generation as well as control the musical style of generated melodies; 3) Sequence-level statistical loss (SeqLoss) is proposed to help the model learn sequence-level features of the generated melodies.
arXiv Detail & Related papers (2023-06-05T06:14:08Z) - Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z) - Unsupervised Melody-Guided Lyrics Generation [84.22469652275714]
We propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data.
We leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process.
arXiv Detail & Related papers (2023-05-12T20:57:20Z) - Noise2Music: Text-conditioned Music Generation with Diffusion Models [73.74580231353684]
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
We find that the generated audio faithfully reflects key elements of the text prompt such as genre, tempo, instruments, mood, and era.
Pretrained large language models play a key role in this story -- they are used to generate paired text for the audio of the training set and to extract embeddings of the text prompts ingested by the diffusion models.
arXiv Detail & Related papers (2023-02-08T07:27:27Z) - Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z) - TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method [92.36505210982648]
TeleMelody is a two-stage lyric-to-melody generation system with a music template.
It generates melodies with higher quality, better controllability, and less requirement on paired lyric-melody data.
arXiv Detail & Related papers (2021-09-20T15:19:33Z)