How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation
- URL: http://arxiv.org/abs/2601.22764v1
- Date: Fri, 30 Jan 2026 09:44:01 GMT
- Title: How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation
- Authors: Deepak Kumar, Emmanouil Karystinaios, Gerhard Widmer, Markus Schedl
- Abstract summary: Music often shares notable parallels with language, motivating the use of pretrained large language models (LLMs) for symbolic music understanding and generation. We present a comparative study of finetuning strategies for ABC-based generation and understanding, comparing an off-the-shelf instruction-tuned backbone to domain-adapted variants. We highlight the domain adaptation vs. preserving prior information tradeoff as well as the distinct behaviour of metrics used to measure the domain adaptation for symbolic music.
- Score: 15.849579727945153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music often shares notable parallels with language, motivating the use of pretrained large language models (LLMs) for symbolic music understanding and generation. Despite growing interest, the practical effectiveness of adapting instruction-tuned LLMs to symbolic music remains insufficiently characterized. We present a controlled comparative study of finetuning strategies for ABC-based generation and understanding, comparing an off-the-shelf instruction-tuned backbone to domain-adapted variants and a music-specialized LLM baseline. Across multiple symbolic music corpora and evaluation signals, we provide some insights into adaptation choices for symbolic music applications. We highlight the domain adaptation vs. preserving prior information tradeoff as well as the distinct behaviour of metrics used to measure the domain adaptation for symbolic music.
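Since the study works with ABC notation, a plain-text encoding of scores, the sketch below illustrates one plausible way a supervised finetuning pair and a preference pair for ABC-based generation could be formatted for an instruction-tuned LLM. The instruction wording, the tune, the field names, and the rejected continuation are illustrative assumptions, not the paper's actual data or pipeline.

```python
# Hedged sketch (assumptions throughout): formatting an ABC-notation continuation
# task as (a) a chat-style supervised finetuning pair and (b) a preference pair.

sft_example = {
    "instruction": "Continue this tune in ABC notation, keeping the key and metre.",
    "input": (
        "X:1\n"             # reference number
        "T:Example Reel\n"  # title
        "M:4/4\n"           # metre
        "L:1/8\n"           # default note length
        "K:D\n"             # key
        "|:D2FA dAFA|BEED E2FG|"
    ),
    "output": "AFDF A2dB|AFED D4:|",
}

def to_chat_messages(example: dict) -> list:
    """Convert the pair into the chat format instruction-tuned backbones expect."""
    return [
        {"role": "user", "content": example["instruction"] + "\n\n" + example["input"]},
        {"role": "assistant", "content": example["output"]},
    ]

# A preference-based variant (e.g. DPO-style) would instead pair a preferred and a
# less-preferred continuation for the same prompt (the rejected tune is made up):
preference_example = {
    "prompt": sft_example["instruction"] + "\n\n" + sft_example["input"],
    "chosen": sft_example["output"],
    "rejected": "z8|z8:|",  # a degenerate all-rests continuation
}

if __name__ == "__main__":
    for message in to_chat_messages(sft_example):
        print(message["role"], "->", message["content"].replace("\n", " / "))
```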
Related papers
- Neurosymbolic LoRA: Why and When to Tune Weights vs. Rewrite Prompts [60.59428237500969]
Large language models (LLMs) can be adapted either through numerical updates that alter model parameters or symbolic manipulations that work on discrete prompts or logical constraints. We introduce a neurosymbolic LoRA framework that combines numerical and symbolic updates. Our findings highlight the value of interleaving numerical and symbolic updates to unlock a new level of versatility in language model fine-tuning.
arXiv Detail & Related papers (2026-01-19T04:24:49Z) - ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following [8.668922435342054]
We propose ABC-Eval, the first open-source benchmark dedicated to evaluating understanding and instruction-following capabilities on text-based ABC notation scores. It comprises 1,086 test samples spanning 10 sub-tasks, covering scenarios from basic musical syntax comprehension to complex sequence-level reasoning. We evaluate seven state-of-the-art LLMs on ABC-Eval, and the results reveal notable limitations in existing models' symbolic music processing capabilities.
arXiv Detail & Related papers (2025-09-27T14:56:20Z) - Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning [69.78158549955384]
We introduce a novel approach that treats core music theory rules, such as those governing beats and intervals, as programmatic functions. This approach generates verifiable sheet music questions in both textual and visual modalities. Evaluation results on SSMR-Bench highlight the key role reasoning plays in interpreting sheet music.
arXiv Detail & Related papers (2025-09-04T09:42:17Z) - MUST-RAG: MUSical Text Question Answering with Retrieval Augmented Generation [6.903890310699392]
MusT-RAG is a comprehensive framework based on Retrieval Augmented Generation (RAG). MusWikiDB is a music-specialized vector database for the retrieval stage. Our experiment demonstrates that MusT-RAG significantly outperforms traditional fine-tuning approaches in enhancing LLMs' music domain adaptation capabilities.
arXiv Detail & Related papers (2025-07-31T08:31:05Z) - Large Language Models' Internal Perception of Symbolic Music [3.9901365062418317]
Large language models (LLMs) excel at modeling relationships between strings in natural language. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts.
arXiv Detail & Related papers (2025-07-17T05:48:45Z) - LeVo: High-Quality Song Generation with Multi-Preference Alignment [47.965028296133426]
We introduce LeVo, a language-model-based framework consisting of LeLM and Music Codec. LeVo supports parallel modeling of two types of tokens, including mixed tokens, which represent the combined audio of vocals and accompaniment. It employs two decoder-only transformers and a modular extension training strategy to prevent interference between different token types.
arXiv Detail & Related papers (2025-06-09T07:57:24Z) - Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation [31.825105824490464]
Symbolic Music, akin to language, can be encoded in discrete symbols.
Recent research has extended the application of large language models (LLMs) to the symbolic music domain.
This study conducts a thorough investigation of LLMs' capability and limitations in symbolic music processing.
arXiv Detail & Related papers (2024-07-31T11:29:46Z) - ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.
Current LLMs easily fail at this task, generating ill-written music even when equipped with modern techniques like In-Context Learning and Chain-of-Thought prompting.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition [82.38021790213752]
SongComposer is a music-specialized large language model (LLM). It integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations. It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. We will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.
arXiv Detail & Related papers (2024-02-27T16:15:28Z) - ALCAP: Alignment-Augmented Music Captioner [34.85003676798762]
We introduce a method to learn multimodal alignment between audio and lyrics through contrastive learning.
This not only recognizes and emphasizes the synergy between audio and lyrics but also paves the way for models to achieve deeper cross-modal coherence.
arXiv Detail & Related papers (2022-12-21T10:20:54Z) - Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both the self-augmented positives and negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z) - MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)