CoComposer: LLM Multi-agent Collaborative Music Composition
- URL: http://arxiv.org/abs/2509.00132v1
- Date: Fri, 29 Aug 2025 14:15:12 GMT
- Title: CoComposer: LLM Multi-agent Collaborative Music Composition
- Authors: Peiwen Xing, Aske Plaat, Niki van Stein
- Abstract summary: CoComposer is a multi-agent system of five collaborating agents, each with a task based on the traditional music composition workflow. We find that CoComposer outperforms existing multi-agent LLM-based systems in music quality, and outperforms a single-agent system in production complexity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing AI music composition tools are limited in generation duration, musical quality, and controllability. We introduce CoComposer, a multi-agent system that consists of five collaborating agents, each with a task based on the traditional music composition workflow. Using the AudioBox-Aesthetics system, we experimentally evaluate CoComposer on four compositional criteria. We test with three LLMs (GPT-4o, DeepSeek-V3-0324, Gemini-2.5-Flash), and find (1) that CoComposer outperforms existing multi-agent LLM-based systems in music quality, and (2) that it outperforms a single-agent system in production complexity. Compared to the non-LLM MusicLM, CoComposer has better interpretability and editability, although MusicLM still produces better music.
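The abstract describes a sequential pipeline of five collaborating agents without naming them. As an illustration only, such a workflow could be sketched as below; the role names, ordering, and `stub_llm` placeholder are all assumptions, not the paper's actual design:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """One collaborator in the pipeline; `act` revises the shared draft."""
    role: str
    act: Callable[[str], str]

def stub_llm(role: str) -> Callable[[str], str]:
    # Placeholder for a real LLM call (e.g., GPT-4o, DeepSeek-V3-0324,
    # or Gemini-2.5-Flash, the three models evaluated in the paper).
    return lambda draft: draft + f"\n[{role}] pass complete"

# Hypothetical roles loosely mirroring a traditional composition workflow;
# the paper's five agents may be named and ordered differently.
ROLES = ["melody", "harmony", "rhythm", "arrangement", "review"]
PIPELINE = [Agent(role, stub_llm(role)) for role in ROLES]

def compose(brief: str) -> str:
    draft = brief
    for agent in PIPELINE:  # each agent refines the previous agent's output
        draft = agent.act(draft)
    return draft
```

In a real system each role would carry its own prompt and the stub would be replaced by an API call; the sketch only shows the collaborative hand-off structure.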
Related papers
- AutoMV: An Automatic Multi-Agent System for Music Video Generation [49.29602419334139]
AutoMV is a multi-agent system that generates full music videos (MVs) directly from a song. A benchmark was applied to compare commercial products, AutoMV, and human-directed MVs with expert human raters.
arXiv Detail & Related papers (2025-12-13T05:53:50Z) - LeVo: High-Quality Song Generation with Multi-Preference Alignment [49.94713419553945]
We introduce LeVo, an LM-based framework consisting of LeLM and a music codec. LeVo can model two types of tokens in parallel: mixed tokens, which represent the combined audio of vocals and accompaniment to achieve vocal-instrument harmony, and dual-track tokens, which separately encode vocals and accompaniment. Experimental results demonstrate that LeVo consistently outperforms existing methods on both objective and subjective metrics.
arXiv Detail & Related papers (2025-06-09T07:57:24Z) - FilmComposer: LLM-Driven Music Production for Silent Film Clips [7.730834771348827]
We implement music production for silent film clips using an LLM-driven method. FilmComposer is the first to combine large generative models with a multi-agent approach. MusicPro-7k includes 7,418 film clips with music, descriptions, rhythm spots, and main melodies.
arXiv Detail & Related papers (2025-03-11T08:05:11Z) - DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning [32.19855680723024]
DeepResonance is a multimodal music understanding model fine-tuned via multi-way instruction tuning. We construct datasets designed to enable DeepResonance to integrate both visual and textual music feature content. Our model achieves state-of-the-art performance across six music understanding tasks.
arXiv Detail & Related papers (2025-02-18T08:09:42Z) - ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems [80.69865295743149]
This work studies using LLM-based agents to design collaborative AI systems autonomously. Based on ComfyBench, we develop ComfyAgent, a framework that empowers agents to autonomously design collaborative AI systems by generating workflows. While ComfyAgent achieves a resolve rate comparable to o1-preview and significantly surpasses other agents on ComfyBench, it has resolved only 15% of creative tasks.
arXiv Detail & Related papers (2024-09-02T17:44:10Z) - ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.
Current LLMs easily fail at this task, generating ill-written music even when equipped with modern techniques like in-context learning and chain-of-thought prompting.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z) - SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition [82.38021790213752]
SongComposer is a music-specialized large language model (LLM). It integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations. It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. We will release SongCompose, a large-scale training dataset containing paired lyrics and melodies in Chinese and English.
arXiv Detail & Related papers (2024-02-27T16:15:28Z) - ChatMusician: Understanding and Generating Music Intrinsically with LLM [81.48629006702409]
ChatMusician is an open-source Large Language Model (LLM) that integrates intrinsic musical abilities.
It can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers.
Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc.
arXiv Detail & Related papers (2024-02-25T17:19:41Z) - ByteComposer: a Human-like Melody Composition Method based on Language Model Agent [11.792129708566598]
Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks.
We propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps.
We conduct extensive experiments on GPT-4 and several open-source large language models, which substantiate our framework's effectiveness.
arXiv Detail & Related papers (2024-02-24T04:35:07Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
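MusicGen's single-stage design rests on how the parallel codebook streams are interleaved so that one transformer can predict them all. A simplified sketch of the published "delay" interleaving pattern follows; the `PAD` sentinel value is my own choice, and real implementations operate on token tensors rather than Python lists:

```python
PAD = -1  # hypothetical padding sentinel for positions with no token yet

def delay_interleave(streams):
    """Interleave K equal-length codebook streams with the delay pattern.

    streams: K lists of length T. Returns a list of T + K - 1 K-tuples,
    where codebook k is shifted right by k steps, so at decoding step t
    the model predicts codebook k's token for time t - k.
    """
    K, T = len(streams), len(streams[0])
    out = []
    for t in range(T + K - 1):
        step = tuple(
            streams[k][t - k] if 0 <= t - k < T else PAD
            for k in range(K)
        )
        out.append(step)
    return out
```

The delay keeps each coarse codebook one step ahead of the finer ones, which is what lets a single-stage LM avoid a separate model per codebook level.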
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - ComMU: Dataset for Combinatorial Music Generation [20.762884001498627]
Combinatorial music generation creates short samples of music with rich musical metadata, and combines them to produce a complete piece of music.
ComMU is the first symbolic music dataset consisting of short music samples and their corresponding 12 musical metadata.
Our results show that we can generate diverse, high-quality music with metadata alone, and that our unique metadata, such as track-role and extended chord quality, improve the capacity of automatic composition.
arXiv Detail & Related papers (2022-11-17T07:25:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.