Barwise Section Boundary Detection in Symbolic Music Using Convolutional Neural Networks
- URL: http://arxiv.org/abs/2509.16566v1
- Date: Sat, 20 Sep 2025 07:52:08 GMT
- Authors: Omar Eldeeb, Martin Malandro
- Abstract summary: We introduce a human-annotated MIDI dataset for section boundary detection. We then train a deep learning model to classify the presence of section boundaries within a fixed-length musical window. Our model achieves an F1 score of 0.77, improving over the analogous audio-based supervised learning approach.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current methods for Music Structure Analysis (MSA) focus primarily on audio data. While symbolic music can be synthesized into audio and analyzed using existing MSA techniques, such an approach does not exploit symbolic music's rich explicit representation of pitch, timing, and instrumentation. A key subproblem of MSA is section boundary detection: determining whether a given point in time marks the transition between musical sections. In this paper, we study automatic section boundary detection for symbolic music. First, we introduce a human-annotated MIDI dataset for section boundary detection, consisting of metadata from 6134 MIDI files that we manually curated from the Lakh MIDI dataset. Second, we train a deep learning model to classify the presence of section boundaries within a fixed-length musical window. Our data representation involves a novel encoding scheme based on synthesized overtones to encode arbitrary MIDI instrumentations into 3-channel piano rolls. Our model achieves an F1 score of 0.77, improving over the analogous audio-based supervised learning approach and the unsupervised block-matching segmentation (CBM) audio approach by 0.22 and 0.31, respectively. We release our dataset, code, and models.
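The abstract's overtone-based encoding can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' released code: each note deposits energy not only at its own pitch but also at the pitches of its first few harmonics (roughly 12·log2(k) semitones above the fundamental for the k-th harmonic), with amplitude decaying as 1/k. The function name, channel count, and decay law are assumptions for illustration.

```python
import numpy as np

def overtone_piano_roll(notes, n_pitches=128, n_frames=64, n_harmonics=4):
    """Hypothetical sketch of an overtone-augmented piano roll: each note
    (MIDI pitch, start frame, end frame) also writes energy at its harmonic
    pitches, with amplitude 1/k for the k-th harmonic."""
    roll = np.zeros((n_pitches, n_frames), dtype=np.float32)
    for pitch, start, end in notes:
        for k in range(1, n_harmonics + 1):
            # k-th harmonic sits about 12*log2(k) semitones above the fundamental
            h = pitch + int(round(12 * np.log2(k)))
            if h < n_pitches:
                roll[h, start:end] += 1.0 / k
    return roll

# a C4 and an E4, partially overlapping in time
roll = overtone_piano_roll([(60, 0, 16), (64, 8, 24)])
```

In the paper this idea feeds a 3-channel representation; the sketch above shows only a single channel to keep the harmonic-stacking idea visible.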
Related papers
- PianoVAM: A Multimodal Piano Performance Dataset [56.318475235705954]
PianoVAM is a comprehensive piano performance dataset that includes videos, audio, MIDI, hand landmarks, fingering labels, and rich metadata. The dataset was recorded using a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice sessions. Hand landmarks and fingering labels were extracted using a pretrained hand pose estimation model and a semi-automated fingering annotation algorithm.
arXiv Detail & Related papers (2025-09-10T17:35:58Z)
- RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection [17.45655063331199]
RUMAA is a transformer-based framework for music performance analysis. It unifies score-to-performance alignment, score-informed transcription, and mistake detection in a near end-to-end manner.
arXiv Detail & Related papers (2025-07-16T12:13:13Z)
- Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling [1.3812010983144802]
We introduce an extensive new dataset of MIDI files, created by transcribing audio recordings of piano performances into their constituent notes. The data pipeline we use is multi-stage, employing a language model to autonomously crawl and score audio recordings from the internet. The resulting dataset contains over one million distinct MIDI files, comprising roughly 100,000 hours of transcribed audio.
arXiv Detail & Related papers (2025-04-21T12:59:40Z)
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
First, we introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on the detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
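The summary does not spell out MusicGen's interleaving scheme; one widely used option for flattening parallel codebook streams into a single-stage LM input is a delay pattern, sketched below with hypothetical inputs. The function name and padding token are assumptions for illustration.

```python
def delay_interleave(codebooks, pad=-1):
    """Hedged sketch of a 'delay' token-interleaving pattern: codebook
    stream k is shifted right by k steps, so at each output frame a
    single-stage language model sees one token per stream, with padding
    where a shifted stream has no token."""
    n_streams = len(codebooks)
    length = len(codebooks[0])
    frames = []
    for t in range(length + n_streams - 1):
        frame = []
        for k, stream in enumerate(codebooks):
            idx = t - k  # stream k is delayed by k steps
            frame.append(stream[idx] if 0 <= idx < length else pad)
        frames.append(frame)
    return frames

# two codebook streams of three tokens each
frames = delay_interleave([[1, 2, 3], [4, 5, 6]])
```

The delay lets the model condition stream k at time t on stream k-1's token for the same time step, at the cost of a few extra frames of padding.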
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Symbolic Music Structure Analysis with Graph Representations and Changepoint Detection Methods [1.1677169430445211]
We propose three methods to segment symbolic music by its form or structure: Norm, G-PELT and G-Window.
We find that encoding symbolic music with graph representations and computing the novelty of the resulting adjacency matrices captures the structure of symbolic music pieces well.
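The novelty computation over an adjacency (self-similarity) matrix can be sketched minimally. This is not the paper's Norm, G-PELT, or G-Window method, but the standard checkerboard-kernel novelty idea the summary alludes to; all names and parameters here are assumptions.

```python
import numpy as np

def novelty_curve(features, kernel_size=4):
    """Slide a checkerboard kernel along the diagonal of a cosine
    self-similarity matrix built from per-bar feature vectors; peaks in
    the resulting curve suggest section boundaries."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    ssm = f @ f.T  # self-similarity (adjacency) matrix
    n = ssm.shape[0]
    half = kernel_size // 2
    # +1 on within-segment quadrants, -1 on cross-segment quadrants
    kernel = np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]),
                     np.ones((half, half)))
    nov = np.zeros(n)
    for i in range(half, n - half):
        nov[i] = np.sum(kernel * ssm[i - half:i + half, i - half:i + half])
    return nov

# two homogeneous 4-bar segments; the boundary sits at bar index 4
bars = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4)
nov = novelty_curve(bars)
```

The curve is high where the window straddles two dissimilar blocks and low inside a homogeneous section, so peak picking on it yields candidate boundaries.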
arXiv Detail & Related papers (2023-03-24T09:45:11Z)
- Multi-instrument Music Synthesis with Spectrogram Diffusion [19.81982315173444]
We focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in real time.
We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter.
We find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.
arXiv Detail & Related papers (2022-06-11T03:26:15Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies when compared with human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Differential Music: Automated Music Generation Using LSTM Networks with Representation Based on Melodic and Harmonic Intervals [0.0]
This paper presents a generative AI model for automated music composition with LSTM networks.
It takes a novel approach to encoding musical information, based on movement in music rather than absolute pitch.
Experimental results show promise, as the generated pieces sound musical and tonal.
arXiv Detail & Related papers (2021-08-23T23:51:08Z)
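The movement-based encoding described in the Differential Music summary can be sketched in a few lines. This is a hypothetical illustration of interval encoding in general, not the paper's exact scheme; the function names are assumptions.

```python
def to_intervals(pitches):
    """Encode a melody as successive semitone movements (intervals)
    rather than absolute MIDI pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def from_intervals(start, intervals):
    """Recover absolute pitches from a starting pitch, showing the
    encoding is invertible up to transposition."""
    pitches = [start]
    for step in intervals:
        pitches.append(pitches[-1] + step)
    return pitches

# C4 E4 G4 E4 becomes +4, +3, -3 semitones
iv = to_intervals([60, 64, 67, 64])
```

A practical consequence is transposition invariance: the same melody starting on any pitch maps to the same interval sequence, which shrinks the effective vocabulary a model must learn.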
This list is automatically generated from the titles and abstracts of the papers on this site.