EngravingGNN: A Hybrid Graph Neural Network for End-to-End Piano Score Engraving
- URL: http://arxiv.org/abs/2509.19412v1
- Date: Tue, 23 Sep 2025 14:48:35 GMT
- Title: EngravingGNN: A Hybrid Graph Neural Network for End-to-End Piano Score Engraving
- Authors: Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer,
- Abstract summary: We propose a unified graph neural network (GNN) framework that targets the case of piano music and quantized symbolic input.<n>Our method employs a multi-task GNN to jointly predict voice connections, staff assignments, pitch spelling, key signature, stem direction, octave shifts, and signs.<n>A dedicated postprocessing pipeline generates print-ready MusicXML/MEI outputs.
- Score: 11.904295041273201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper focuses on automatic music engraving, i.e., the creation of a humanly-readable musical score from musical content. This step is fundamental for all applications that include a human player, but it remains a mostly unexplored topic in symbolic music processing. In this work, we formalize the problem as a collection of interdependent subtasks, and propose a unified graph neural network (GNN) framework that targets the case of piano music and quantized symbolic input. Our method employs a multi-task GNN to jointly predict voice connections, staff assignments, pitch spelling, key signature, stem direction, octave shifts, and clef signs. A dedicated postprocessing pipeline generates print-ready MusicXML/MEI outputs. Comprehensive evaluation on two diverse piano corpora (J-Pop and DCML Romantic) demonstrates that our unified model achieves good accuracy across all subtasks, compared to existing systems that only specialize in specific subtasks. These results indicate that a shared GNN encoder with lightweight task-specific decoders in a multi-task setting offers a scalable and effective solution for automatic music engraving.
Related papers
- PianoVAM: A Multimodal Piano Performance Dataset [56.318475235705954]
PianoVAM is a comprehensive piano performance dataset that includes videos, audio, MIDI, hand landmarks, fingering labels, and rich metadata.<n>The dataset was recorded using a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice sessions.<n>Hand landmarks and fingering labels were extracted using a pretrained hand pose estimation model and a semi-automated fingering annotation algorithm.
arXiv Detail & Related papers (2025-09-10T17:35:58Z) - GraphMuse: A Library for Symbolic Music Graph Processing [3.997809845676912]
GraphMuse is a graph processing framework and library that facilitates efficient music graph processing.
New neighbor sampling technique specifically targeted toward meaningful behavior in musical scores is central to our contribution.
Our hope is that GraphMuse will lead to a boost in, and standardization of, symbolic music processing based on graph representations.
arXiv Detail & Related papers (2024-07-17T15:54:09Z) - Cluster and Separate: a GNN Approach to Voice and Staff Prediction for Score Engraving [5.572472212662453]
This paper approaches the problem of separating the notes from a quantized symbolic music piece (e.g., a MIDI file) into multiple voices and staves.
We propose an end-to-end system based on graph neural networks that notes that belong to the same chord and connect them with edges if they are part of a voice.
arXiv Detail & Related papers (2024-07-15T14:36:13Z) - N-Gram Unsupervised Compoundation and Feature Injection for Better
Symbolic Music Understanding [27.554853901252084]
Music sequences exhibit strong correlations between adjacent elements, making them prime candidates for N-gram techniques from Natural Language Processing (NLP)
In this paper, we propose a novel method, NG-Midiformer, for understanding symbolic music sequences that leverages the N-gram approach.
arXiv Detail & Related papers (2023-12-13T06:08:37Z) - Video-Mined Task Graphs for Keystep Recognition in Instructional Videos [71.16703750980143]
Procedural activity understanding requires perceiving human actions in terms of a broader task.
We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps.
We show the impact: more reliable zero-shot keystep localization and improved video representation learning.
arXiv Detail & Related papers (2023-07-17T18:19:36Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with GET'' standing for GEnerate music Tracks''
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z) - Cadence Detection in Symbolic Classical Music using Graph Neural
Networks [7.817685358710508]
We present a graph representation of symbolic scores as an intermediate means to solve the cadence detection task.
We approach cadence detection as an imbalanced node classification problem using a Graph Convolutional Network.
Our experiments suggest that graph convolution can learn non-local features that assist in cadence detection, freeing us from the need of having to devise specialized features that encode non-local context.
arXiv Detail & Related papers (2022-08-31T12:39:57Z) - A Novel Multi-Task Learning Method for Symbolic Music Emotion
Recognition [76.65908232134203]
Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z) - Differential Music: Automated Music Generation Using LSTM Networks with
Representation Based on Melodic and Harmonic Intervals [0.0]
This paper presents a generative AI model for automated music composition with LSTM networks.
It takes a novel approach at encoding musical information which is based on movement in music rather than absolute pitch.
Experimental results show promise as they sound musical and tonal.
arXiv Detail & Related papers (2021-08-23T23:51:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.