Related papers: Structure-Aware Piano Accompaniment via Style Planning and Dataset-Aligned Pattern Retrieval

URL: http://arxiv.org/abs/2602.15074v1
Date: Mon, 16 Feb 2026 03:54:34 GMT
Title: Structure-Aware Piano Accompaniment via Style Planning and Dataset-Aligned Pattern Retrieval
Authors: Wanyu Zang, Yang Yu, Meng Yu
Abstract summary: We introduce a structure-aware approach for symbolic piano accompaniment. A transformer predicts an interpretable, per-measure style plan conditioned on section/phrase structure and functional harmony. A retriever selects and reharmonizes human-performed piano patterns from a corpus.
Score: 8.505620355469725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a structure-aware approach for symbolic piano accompaniment that decouples high-level planning from note-level realization. A lightweight transformer predicts an interpretable, per-measure style plan conditioned on section/phrase structure and functional harmony, and a retriever then selects and reharmonizes human-performed piano patterns from a corpus. We formulate retrieval as pattern matching under an explicit energy with terms for harmonic feasibility, structural-role compatibility, voice-leading continuity, style preferences, and repetition control. Given a structured lead sheet and optional keyword prompts, the system generates piano-accompaniment MIDI. In our experiments, transformer style-planner-guided retrieval produces diverse long-form accompaniments with strong style realization. We further analyze planner ablations and quantify inter-style isolation. Experimental results demonstrate the effectiveness of our inference-time approach for piano accompaniment generation.
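The energy-based retrieval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Pattern` fields, the `WEIGHTS` values, and the concrete form of each cost term are assumptions chosen only to show how five weighted terms (harmonic feasibility, structural role, voice leading, style preference, repetition control) might combine into one score that a retriever minimizes per measure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pattern:
    """Hypothetical corpus entry; all fields are illustrative assumptions."""
    pattern_id: str
    pitch_classes: frozenset  # pitch classes (0-11) the pattern uses
    role: str                 # structural role, e.g. "verse" or "chorus"
    style: str                # style label, e.g. "arpeggio" or "block"
    top_pitch: int            # highest MIDI pitch, a crude voice-leading proxy


# Assumed weights; the paper's actual weighting is not given in the abstract.
WEIGHTS = {"harmony": 4.0, "role": 2.0, "voice": 1.0, "style": 2.0, "repeat": 1.5}


def energy(p: Pattern, chord: frozenset, role: str, style: str,
           prev: Optional[Pattern] = None) -> float:
    """Weighted sum of five cost terms; lower means a better match."""
    # Harmonic feasibility: fraction of pattern pitch classes outside the chord.
    harmony = len(p.pitch_classes - chord) / max(len(p.pitch_classes), 1)
    # Structural-role compatibility and style preference: simple mismatch costs.
    role_cost = 0.0 if p.role == role else 1.0
    style_cost = 0.0 if p.style == style else 1.0
    # Voice-leading continuity: top-voice jump from the previous pattern, in octaves.
    voice = abs(p.top_pitch - prev.top_pitch) / 12.0 if prev is not None else 0.0
    # Repetition control: penalize reusing the immediately preceding pattern.
    repeat = 1.0 if prev is not None and prev.pattern_id == p.pattern_id else 0.0
    return (WEIGHTS["harmony"] * harmony + WEIGHTS["role"] * role_cost
            + WEIGHTS["voice"] * voice + WEIGHTS["style"] * style_cost
            + WEIGHTS["repeat"] * repeat)


def retrieve(corpus: Sequence[Pattern], chord: frozenset, role: str, style: str,
             prev: Optional[Pattern] = None) -> Pattern:
    """Return the corpus pattern with the lowest energy for one measure."""
    return min(corpus, key=lambda p: energy(p, chord, role, style, prev))


# Toy corpus and a single-measure query (C major chord, verse, arpeggio style).
corpus = [
    Pattern("a", frozenset({0, 4, 7}), "verse", "arpeggio", 72),
    Pattern("b", frozenset({0, 4, 7}), "chorus", "block", 76),
    Pattern("c", frozenset({2, 5, 9}), "verse", "arpeggio", 74),
]
c_major = frozenset({0, 4, 7})
first = retrieve(corpus, c_major, "verse", "arpeggio")
```

In this sketch the style and role arguments stand in for the per-measure plan that the abstract attributes to the transformer planner. A greedy per-measure minimum is used here for brevity; the abstract does not say how the energy is optimized over a full piece, and the reharmonization step is omitted.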

Related papers

Etude: Piano Cover Generation with a Three-Stage Approach -- Extract, strucTUralize, and DEcode [0.0]
Piano cover generation aims to automatically transform a pop song into a piano arrangement. Existing models often fail to maintain structural consistency with the original song. Rhythmic information is crucial, as it defines structural similarity. Our model produces covers that preserve proper song structure, enhance fluency and musical dynamics, and support highly controllable generation.
arXiv Detail & Related papers (2025-09-20T04:06:43Z)
Scaling Self-Supervised Representation Learning for Symbolic Piano Performance [52.661197827466886]
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. We use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings.
arXiv Detail & Related papers (2025-06-30T14:00:14Z)
Synthesizing Composite Hierarchical Structure from Symbolic Music Corpora [32.18458296933001]
We propose a unified, hierarchical meta-representation of musical structure called the structural temporal graph (STG). For a single piece, the STG is a data structure that defines a hierarchy of progressively finer structural musical features and the temporal relationships between them.
arXiv Detail & Related papers (2025-02-21T02:32:29Z)
Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling [9.489311894706765]
In this paper, we introduce a novel system that leverages prior modelling over disentangled style factors to address these challenges. Our key design is the use of vector quantization and a unique multi-stream Transformer to model the long-term flow of the orchestration style. We show that our system achieves superior coherence, structure, and overall arrangement quality compared to existing baselines.
arXiv Detail & Related papers (2023-10-25T03:30:37Z)
Compositional Foundation Models for Hierarchical Planning [52.18904315515153]
We propose a foundation model that combines expert foundation models, each trained individually on language, vision, and action data, to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control through an inverse dynamics model that infers actions from generated videos.
arXiv Detail & Related papers (2023-09-15T17:44:05Z)
Melody Infilling with User-Provided Structural Context [37.55332319528369]
This paper proposes a novel Transformer-based model for music score infilling. We show that the proposed model can harness the structural information effectively and generate higher-quality pop-style melodies.
arXiv Detail & Related papers (2022-10-06T11:37:04Z)
Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach [36.49582705724548]
We devise a two-stage Transformer-based framework that Composes a lead sheet first, and then Embellishes it with accompaniment and expressive touches. Our objective and subjective experiments show that Compose & Embellish shrinks the gap in structureness between a current state of the art and real performances by half, and improves other musical aspects such as richness and coherence as well.
arXiv Detail & Related papers (2022-09-17T01:20:59Z)
Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance [88.0355290619761]
This work focuses on the separation of unknown musical instruments. We propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories. Our framework exhibits strong adaptation ability on the novel musical categories and outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-03-25T09:42:11Z)
Bending Graphs: Hierarchical Shape Matching using Gated Optimal Transport [80.64516377977183]
Shape matching has been a long-studied problem for the computer graphics and vision community. We investigate a hierarchical learning design, to which we incorporate local patch-level information and global shape-level structures. We propose a novel optimal transport solver by recurrently updating features on non-confident nodes to learn globally consistent correspondences between the shapes.
arXiv Detail & Related papers (2022-02-03T11:41:46Z)
Towards Multi-Scale Style Control for Expressive Speech Synthesis [60.08928435252417]
The proposed method employs a multi-scale reference encoder to extract both the global-scale utterance-level and the local-scale quasi-phoneme-level style features of the target speech. During training time, the multi-scale style model could be jointly trained with the speech synthesis model in an end-to-end fashion.
arXiv Detail & Related papers (2021-04-08T05:50:09Z)
Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user-specified symbolic scenario combined with a previous music context. Our model can generate long melodies by treating 8-beat note sequences as basic units, and the generated melodies share a consistent rhythm pattern structure with another specified song. Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.