Related papers: Large Language Models' Internal Perception of Symbolic Music

URL: http://arxiv.org/abs/2507.12808v1
Date: Thu, 17 Jul 2025 05:48:45 GMT
Title: Large Language Models' Internal Perception of Symbolic Music
Authors: Andrew Shin, Kunitake Kaneko
Abstract summary: Large language models (LLMs) excel at modeling relationships between strings in natural language. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) excel at modeling relationships between strings in natural language and have shown promise in extending to other symbolic domains like coding or mathematics. However, the extent to which they implicitly model symbolic music remains underexplored. This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts describing combinations of genres and styles, and evaluating their utility through recognition and generation tasks. We produce a dataset of LLM-generated MIDI files without relying on explicit musical training. We then train neural networks entirely on this LLM-generated MIDI dataset and perform genre and style classification as well as melody completion, benchmarking their performance against established models. Our results demonstrate that LLMs can infer rudimentary musical structures and temporal relationships from text, highlighting both their potential to implicitly encode musical patterns and their limitations due to a lack of explicit musical context, shedding light on their generative capabilities for symbolic music.
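The abstract does not say how the LLM's text output becomes MIDI files. As an illustration only, assume a hypothetical format in which the model emits one note per line as `pitch,start_beats,duration_beats`. The Python sketch below parses that text, skips malformed lines, and serializes the notes into a single-track Standard MIDI File using only the standard library. The note format, the helper names `parse_notes`, `encode_vlq`, and `notes_to_midi`, and the 480 ticks-per-quarter resolution are all assumptions, not the authors' pipeline.

```python
import struct

# Assumed resolution: 480 ticks per quarter note (a common choice, not from the paper).
TICKS_PER_QUARTER = 480


def parse_notes(text):
    """Parse hypothetical 'pitch,start_beats,duration_beats' lines, skipping malformed ones."""
    notes = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # LLM output may contain prose or broken lines
        try:
            pitch, start, dur = int(parts[0]), float(parts[1]), float(parts[2])
        except ValueError:
            continue
        if 0 <= pitch <= 127 and start >= 0 and dur > 0:
            notes.append((pitch, start, dur))
    return notes


def encode_vlq(value):
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def notes_to_midi(notes, velocity=80, channel=0):
    """Serialize notes into a format-0, single-track Standard MIDI File."""
    events = []
    for pitch, start, dur in notes:
        on = round(start * TICKS_PER_QUARTER)
        # Guarantee at least one tick of duration so the note-off follows the note-on.
        off = max(on + 1, round((start + dur) * TICKS_PER_QUARTER))
        # Second tuple field sorts note-offs (0) before note-ons (1) at the same tick.
        events.append((on, 1, bytes([0x90 | channel, pitch, velocity])))
        events.append((off, 0, bytes([0x80 | channel, pitch, 0])))
    events.sort(key=lambda e: (e[0], e[1]))

    track = bytearray()
    prev_tick = 0
    for tick, _, message in events:
        track += encode_vlq(tick - prev_tick) + message  # delta time, then the event
        prev_tick = tick
    track += encode_vlq(0) + b"\xff\x2f\x00"  # End of Track meta event

    # Header chunk: length 6, format 0, one track, ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_QUARTER)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

For example, `notes_to_midi(parse_notes("60,0,1\n64,1,1\n67,2,2"))` returns the bytes of a three-note C major arpeggio that can be written to a `.mid` file. Skipping malformed lines rather than raising reflects the assumption that raw LLM output is not guaranteed to follow the requested format.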

Related papers

MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models
MusiXQA is the first comprehensive dataset for evaluating and advancing MLLMs in music sheet understanding. We develop Phi-3-MusiX, an MLLM fine-tuned on our dataset, achieving significant performance gains over GPT-based methods.
arXiv (2025-06-28T20:46:47Z)
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation
Symbolic music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) to the symbolic music domain. This study conducts a thorough investigation of LLMs' capabilities and limitations in symbolic music processing.
arXiv (2024-07-31T11:29:46Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv (2024-06-07T06:38:59Z)
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Music composition is a complex task that requires the ability to understand and generate information with long-range dependencies and harmony constraints. Current LLMs easily fail at this task, generating ill-written music even when equipped with modern techniques like in-context learning and chain-of-thought prompting. We propose ComposerX, an agent-based symbolic music generation framework.
arXiv (2024-04-28T06:17:42Z)
MuPT: A Generative Symbolic Music Pretrained Transformer
We explore the application of large language models (LLMs) to the pre-training of music. To address the challenges associated with misaligned measures from different tracks during generation, we propose Synchronized Multi-Track ABC Notation (SMT-ABC Notation). Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv (2024-04-09T15:35:52Z)
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
SongComposer is a music-specialized large language model (LLM). It integrates the capability of simultaneously composing melodies into LLMs by leveraging three key innovations. It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. We will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.
arXiv (2024-02-27T16:15:28Z)
Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
This survey reviews NLP methods applied to symbolic music generation and information retrieval studies. We first propose an overview of representations of symbolic music adapted from natural language sequential representations. We describe these models, in particular deep learning models, through different prisms, highlighting music-specialized mechanisms.
arXiv (2024-02-27T12:48:01Z)
ChatMusician: Understanding and Generating Music Intrinsically with LLM
ChatMusician is an open-source large language model (LLM) that integrates intrinsic musical abilities. It can understand and generate music with a pure text tokenizer, without any external multimodal neural structures or tokenizers. Our model is capable of composing well-structured, full-length music conditioned on texts, chords, melodies, motifs, musical forms, etc.
arXiv (2024-02-25T17:19:41Z)
Embeddings as representation for symbolic music
A representation technique that encodes music in a way that captures musical meaning would improve the results of any model trained for computer music tasks. In this paper, we experiment with embeddings to represent musical notes from three different variations of a dataset and analyze whether the model can capture useful musical patterns.
arXiv (2020-05-19T13:04:02Z)