ByteComposer: a Human-like Melody Composition Method based on Language Model Agent
- URL: http://arxiv.org/abs/2402.17785v2
- Date: Thu, 7 Mar 2024 00:32:27 GMT
- Title: ByteComposer: a Human-like Melody Composition Method based on Language Model Agent
- Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu
- Abstract summary: Large Language Models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks.
We propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps.
We conduct extensive experiments on GPT-4 and several open-source large language models, which substantiate our framework's effectiveness.
- Score: 11.792129708566598
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have shown encouraging progress in multimodal
understanding and generation tasks. However, how to design a human-aligned and
interpretable melody composition system remains under-explored. To address this
problem, we propose ByteComposer, an agent framework that emulates a human's
creative pipeline in four separate steps: "Conception Analysis - Draft
Composition - Self-Evaluation and Modification - Aesthetic Selection". This
framework seamlessly blends the interactive and knowledge-understanding
features of LLMs with existing symbolic music generation models, thereby
achieving a melody composition agent comparable to human creators. We conduct
extensive experiments on GPT-4 and several open-source large language models,
which substantiate our framework's effectiveness. Furthermore, professional
music composers were engaged in multi-dimensional evaluations, and the results
demonstrate that, across various facets of music composition, the ByteComposer
agent attains the level of a novice melody composer.
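The four-step pipeline named in the abstract maps naturally onto an agent loop. Below is a minimal Python sketch of such a loop; the `llm`, `compose`, `critique`, and `score` callables, the prompts, and the stop signal are hypothetical placeholders for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ComposerAgent:
    """Hypothetical sketch of a four-step melody-composition agent loop."""
    llm: Callable[[str], str]            # chat-style LLM, e.g. GPT-4 (assumed interface)
    compose: Callable[[str], str]        # symbolic music generator: plan -> melody (e.g. ABC text)
    critique: Callable[[str, str], str]  # LLM self-evaluation: (plan, melody) -> feedback
    score: Callable[[str], float]        # aesthetic scoring function (assumed)

    def run(self, brief: str, n_drafts: int = 4, max_revisions: int = 2) -> str:
        # 1) Conception Analysis: turn the user's brief into a concrete musical plan.
        plan = self.llm(f"Analyze this composition request and produce a musical plan: {brief}")

        candidates: List[str] = []
        for _ in range(n_drafts):
            # 2) Draft Composition: delegate note-level generation to a symbolic music model.
            melody = self.compose(plan)
            # 3) Self-Evaluation and Modification: critique the draft and revise it a few times.
            for _ in range(max_revisions):
                feedback = self.critique(plan, melody)
                if "OK" in feedback:  # assumed stop signal from the critic
                    break
                melody = self.compose(f"{plan}\nRevise according to: {feedback}")
            candidates.append(melody)

        # 4) Aesthetic Selection: keep the highest-scoring candidate.
        return max(candidates, key=self.score)
```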
Related papers
- Agent-Driven Large Language Models for Mandarin Lyric Generation [2.2221991003992967]
In tonal contour languages like Mandarin, pitch contours are influenced by both melody and tone, leading to variations in lyric-melody fit.
Our study confirms that lyricists and melody writers consider this fit during their composition process.
In this research, we developed a multi-agent system that decomposes the melody-to-lyric task into sub-tasks, with each agent controlling rhyme, syllable count, lyric-melody alignment, and consistency.
arXiv Detail & Related papers (2024-10-02T12:01:32Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires the ability to understand and generate information with long-range dependencies and harmony constraints.
Current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context Learning and Chain-of-Thought.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Interactive Melody Generation System for Enhancing the Creativity of Musicians [0.0]
This study proposes a system designed to emulate the process of collaborative composition among humans.
By integrating multiple Recurrent Neural Network (RNN) models, the system provides an experience akin to collaborating with several composers.
arXiv Detail & Related papers (2024-03-06T01:33:48Z)
- SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation [88.33522730306674]
SongComposer can understand and generate melodies and lyrics in symbolic song representations.
We resort to symbolic song representation, a mature and efficient format that humans designed for music.
With extensive experiments, SongComposer demonstrates superior performance in lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation.
arXiv Detail & Related papers (2024-02-27T16:15:28Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as the backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that the proposed approach can generate coherent, novel, complex, and harmonious symphonies compared to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Expressive Communication: A Common Framework for Evaluating Developments in Generative Models and Steering Interfaces [1.2891210250935146]
This study investigates how developments in both models and user interfaces are important for empowering co-creation.
In an evaluation study with 26 composers creating 100+ pieces of music and listeners providing 1000+ head-to-head comparisons, we find that more expressive models and more steerable interfaces are important.
arXiv Detail & Related papers (2021-11-29T20:57:55Z)
- Music Composition with Deep Learning: A Review [1.7188280334580197]
We analyze the ability of current Deep Learning models to generate music with creativity.
We compare these models to the music composition process from a theoretical point of view.
arXiv Detail & Related papers (2021-08-27T13:53:53Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)