Related papers: Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music

URL: http://arxiv.org/abs/2509.24603v1
Date: Mon, 29 Sep 2025 11:10:57 GMT
Title: Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music
Authors: Tianle Wang, Sirui Zhang, Xinyi Tong, Peiyang Yu, Jishang Chen, Liangke Zhao, Xinpu Gao, Yves Zhu, Tiezheng Ge, Bo Zheng, Duo Xu, Yang Liu, Xin Jin, Feng Yu, Songchun Zhu
Abstract summary: This paper presents an unsupervised machine learning algorithm that identifies recurring patterns -- referred to as "music-words" -- from symbolic music data. We formulate the task of music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework.
Score: 50.87225308217594
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents an unsupervised machine learning algorithm that identifies recurring patterns -- referred to as "music-words" -- from symbolic music data. These patterns are fundamental to musical structure and reflect the cognitive processes involved in composition. However, extracting these patterns remains challenging because of the inherent semantic ambiguity in musical interpretation. We formulate the task of music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework: 1. Developing a music-word dictionary; 2. Reconstructing the music data. When evaluated against human expert annotations, the algorithm achieved an Intersection over Union (IoU) score of 0.61. Our findings indicate that minimizing code length effectively addresses semantic ambiguity, suggesting that human optimization of encoding systems shapes musical semantics. This approach enables computers to extract "basic building blocks" from music data, facilitating structural analysis and sparse encoding. The method has two primary applications. First, in AI music, it supports downstream tasks such as music generation, classification, style transfer, and improvisation. Second, in musicology, it provides a tool for analyzing compositional patterns and offers insights into the principle of minimal encoding across diverse musical styles and composers.
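The abstract reports an IoU of 0.61 against expert annotations but does not say how the overlap is measured. As a rough illustration only, the sketch below computes IoU for two segments treated as half-open time intervals; the function name `interval_iou` and the interval representation are assumptions made for this example, not the paper's evaluation protocol.

```python
def interval_iou(pred, gold):
    """IoU of two half-open intervals (start, end), e.g. measured in beats.

    Hypothetical helper: the paper may define overlap differently
    (for instance over sets of notes rather than time spans).
    """
    # Length of the overlap, clamped at zero when the intervals are disjoint.
    inter = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    # Union length = sum of the two lengths minus the shared part.
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


# A predicted pattern spanning beats 0-4 against an annotation spanning beats 2-6:
# the overlap is 2 beats and the union is 6 beats.
score = interval_iou((0, 4), (2, 6))
```

Under this reading, a score of 1.0 means the predicted and annotated segments coincide exactly, and 0.0 means they do not overlap at all.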

Related papers

MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding [46.89003337712407]
We propose MuseTok, a tokenization method for symbolic music. MuseTok employs the residual vector quantized-variational autoencoder (RQ-VAE) on bar-wise music segments within a Transformer-based encoder-decoder framework. For comprehensive evaluation, we apply MuseTok to music generation and semantic understanding tasks, including melody extraction, chord recognition, and emotion recognition.
arXiv Detail & Related papers (2025-10-18T00:04:48Z)
Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning [69.78158549955384]
We introduce a novel approach that treats core music theory rules, such as those governing beats and intervals, as programmatic functions. This approach generates verifiable sheet music questions in both textual and visual modalities. Evaluation results on SSMR-Bench highlight the key role reasoning plays in interpreting sheet music.
arXiv Detail & Related papers (2025-09-04T09:42:17Z)
Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning [10.558648773612191]
We propose a novel Hierarchical Two-stage Contrastive Learning (HTCL) method that models similarity from the semantic perspective to the user perspective hierarchically. We devise a scalable audio encoder and leverage a pre-trained BERT model as the text encoder to learn audio-text semantics via large-scale contrastive pre-training.
arXiv Detail & Related papers (2025-05-29T09:50:07Z)
Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections. Our work employs explainable descriptors for difficulty estimation in symbolic music representations. Our approach, evaluated on piano repertoire categorized into 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation [1.9188864062289432]
Subword tokenization has been widely successful in text-based natural language processing tasks with Transformer-based models. We apply subword tokenization on post-musical tokenization schemes and find that it enables the generation of longer songs at the same time. Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition.
arXiv Detail & Related papers (2023-04-18T12:46:12Z)
Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings [0.8701566919381222]
We present a novel music segmentation method, pitchclass2vec, based on symbolic chord annotations. Our algorithm is based on a long short-term memory (LSTM) neural network and outperforms the state-of-the-art techniques based on symbolic chord annotations in the field.
arXiv Detail & Related papers (2023-03-24T10:23:15Z)
In-depth analysis of music structure as a text network [7.735597173716555]
We focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language. We aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music.
arXiv Detail & Related papers (2023-03-21T08:39:56Z)
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from the symbolic data. MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
Music Harmony Generation, through Deep Learning and Using a Multi-Objective Evolutionary Algorithm [0.0]
This paper introduces a genetic multi-objective evolutionary optimization algorithm for the generation of polyphonic music. One objective is adherence to the rules of music; together with two further objectives, the scores given by music experts and by ordinary listeners, it drives the evolutionary cycle toward the best solution. The results show that the proposed method is able to generate difficult and pleasant pieces with desired styles and lengths, with harmonic sounds that follow the grammar while also attracting the listener.
arXiv Detail & Related papers (2021-02-16T05:05:54Z)
dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains. This will also provide a means for evaluating algorithms specifically designed for music. The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music. We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.