CoComposer: LLM Multi-agent Collaborative Music Composition
- URL: http://arxiv.org/abs/2509.00132v1
- Date: Fri, 29 Aug 2025 14:15:12 GMT
- Title: CoComposer: LLM Multi-agent Collaborative Music Composition
- Authors: Peiwen Xing, Aske Plaat, Niki van Stein
- Abstract summary: CoComposer is a multi-agent system of five collaborating agents, each with a task based on the traditional music composition workflow. We find that CoComposer outperforms existing multi-agent LLM-based systems in music quality, and outperforms a single-agent system in production complexity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing AI music composition tools are limited in generation duration, musical quality, and controllability. We introduce CoComposer, a multi-agent system that consists of five collaborating agents, each with a task based on the traditional music composition workflow. Using the AudioBox-Aesthetics system, we experimentally evaluate CoComposer on four compositional criteria. We test with three LLMs (GPT-4o, DeepSeek-V3-0324, Gemini-2.5-Flash), and find (1) that CoComposer outperforms existing multi-agent LLM-based systems in music quality, and (2) that it outperforms a single-agent system in production complexity. Compared to the non-LLM MusicLM, CoComposer has better interpretability and editability, although MusicLM still produces better music.
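The abstract describes a sequential pipeline of five collaborating agents without naming them. As an illustration only, such a workflow could be sketched as below; the role names, ordering, and `stub_llm` placeholder are all assumptions, not the paper's actual design:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """One collaborator in the pipeline; `act` revises the shared draft."""
    role: str
    act: Callable[[str], str]

def stub_llm(role: str) -> Callable[[str], str]:
    # Placeholder for a real LLM call (e.g., GPT-4o, DeepSeek-V3-0324,
    # or Gemini-2.5-Flash, the three models evaluated in the paper).
    return lambda draft: draft + f"\n[{role}] pass complete"

# Hypothetical roles loosely mirroring a traditional composition workflow;
# the paper's five agents may be named and ordered differently.
ROLES = ["melody", "harmony", "rhythm", "arrangement", "review"]
PIPELINE = [Agent(role, stub_llm(role)) for role in ROLES]

def compose(brief: str) -> str:
    draft = brief
    for agent in PIPELINE:  # each agent refines the previous agent's output
        draft = agent.act(draft)
    return draft
```

In a real system each role would carry its own prompt and the stub would be replaced by an API call; the sketch only shows the collaborative hand-off structure.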
Related papers
- AutoMV: An Automatic Multi-Agent System for Music Video Generation [49.29602419334139]
AutoMV is a multi-agent system that generates full music videos (MVs) directly from a song. A benchmark was applied to compare commercial products, AutoMV, and human-directed MVs with expert human raters.
arXiv Detail & Related papers (2025-12-13T05:53:50Z) - LeVo: High-Quality Song Generation with Multi-Preference Alignment [49.94713419553945]
We introduce LeVo, an LM-based framework consisting of LeLM and a music codec. LeVo can model two types of tokens in parallel: mixed tokens, which represent the combined audio of vocals and accompaniment to achieve vocal-instrument harmony, and dual-track tokens, which separately encode vocals and accompaniment. Experimental results demonstrate that LeVo consistently outperforms existing methods on both objective and subjective metrics.
arXiv Detail & Related papers (2025-06-09T07:57:24Z) - FilmComposer: LLM-Driven Music Production for Silent Film Clips [7.730834771348827]
We implement music production for silent film clips using an LLM-driven method. FilmComposer is the first to combine large generative models with a multi-agent approach. MusicPro-7k includes 7,418 film clips with music, descriptions, rhythm spots, and main melodies.
arXiv Detail & Related papers (2025-03-11T08:05:11Z) - DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning [32.19855680723024]
DeepResonance is a multimodal music understanding model fine-tuned via multi-way instruction tuning. We construct datasets designed to enable DeepResonance to integrate both visual and textual music feature content. Our model achieves state-of-the-art performance across six music understanding tasks.
arXiv Detail & Related papers (2025-02-18T08:09:42Z) - ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems [80.69865295743149]
This work studies using LLM-based agents to design collaborative AI systems autonomously. Based on ComfyBench, we develop ComfyAgent, a framework that empowers agents to autonomously design collaborative AI systems by generating workflows. While ComfyAgent achieves a resolve rate comparable to o1-preview and significantly surpasses other agents on ComfyBench, it has resolved only 15% of creative tasks.
arXiv Detail & Related papers (2024-09-02T17:44:10Z) - ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.
Current LLMs easily fail at this task, generating ill-written music even when equipped with modern techniques like in-context learning and chain-of-thought prompting.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z) - SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition [82.38021790213752]
SongComposer is a music-specialized large language model (LLM). It integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations. It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. We will release SongCompose, a large-scale training dataset containing paired lyrics and melodies in Chinese and English.
arXiv Detail & Related papers (2024-02-27T16:15:28Z) - ChatMusician: Understanding and Generating Music Intrinsically with LLM [81.48629006702409]
ChatMusician is an open-source Large Language Model (LLM) that integrates intrinsic musical abilities.
It can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers.
Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc.
arXiv Detail & Related papers (2024-02-25T17:19:41Z) - ByteComposer: a Human-like Melody Composition Method based on Language Model Agent [11.792129708566598]
Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks.
We propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps.
We conduct extensive experiments on GPT-4 and several open-source large language models, which substantiate our framework's effectiveness.
arXiv Detail & Related papers (2024-02-24T04:35:07Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
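MusicGen's single-stage design rests on how the parallel codebook streams are interleaved so that one transformer can predict them all. A simplified sketch of the published "delay" interleaving pattern follows; the `PAD` sentinel value is my own choice, and real implementations operate on token tensors rather than Python lists:

```python
PAD = -1  # hypothetical padding sentinel for positions with no token yet

def delay_interleave(streams):
    """Interleave K equal-length codebook streams with the delay pattern.

    streams: K lists of length T. Returns a list of T + K - 1 K-tuples,
    where codebook k is shifted right by k steps, so at decoding step t
    the model predicts codebook k's token for time t - k.
    """
    K, T = len(streams), len(streams[0])
    out = []
    for t in range(T + K - 1):
        step = tuple(
            streams[k][t - k] if 0 <= t - k < T else PAD
            for k in range(K)
        )
        out.append(step)
    return out
```

The delay keeps each coarse codebook one step ahead of the finer ones, which is what lets a single-stage LM avoid a separate model per codebook level.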
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - ComMU: Dataset for Combinatorial Music Generation [20.762884001498627]
Combinatorial music generation creates short samples of music with rich musical metadata, and combines them to produce a complete piece of music.
ComMU is the first symbolic music dataset consisting of short music samples and their corresponding 12 musical metadata.
Our results show that we can generate diverse, high-quality music with metadata alone, and that our unique metadata, such as track-role and extended chord quality, improve the capacity of automatic composition.
arXiv Detail & Related papers (2022-11-17T07:25:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.