Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach
- URL: http://arxiv.org/abs/2209.08212v1
- Date: Sat, 17 Sep 2022 01:20:59 GMT
- Title: Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach
- Authors: Shih-Lun Wu, Yi-Hsuan Yang
- Abstract summary: We devise a two-stage Transformer-based framework that Composes a lead sheet first, and then Embellishes it with accompaniment and expressive touches.
Our objective and subjective experiments show that Compose & Embellish shrinks the gap in structureness between a current state of the art and real performances by half, and improves other musical aspects such as richness and coherence as well.
- Score: 36.49582705724548
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Even with strong sequence models like Transformers, generating expressive
piano performances with long-range musical structures remains challenging.
Meanwhile, methods to compose well-structured melodies or lead sheets (melody +
chords), i.e., simpler forms of music, gained more success. Observing the
above, we devise a two-stage Transformer-based framework that Composes a lead
sheet first, and then Embellishes it with accompaniment and expressive touches.
Such a factorization also enables pretraining on non-piano data. Our objective
and subjective experiments show that Compose & Embellish shrinks the gap in
structureness between a current state of the art and real performances by half,
and improves other musical aspects such as richness and coherence as well.
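As a concrete illustration of the two-stage factorization, generation can be viewed as two chained autoregressive samplers: a Compose model emits lead-sheet tokens, and an Embellish model, prompted with that lead sheet, emits the full performance. The Python sketch below is a minimal rendering of that flow under assumed token vocabularies and stand-in models; it is not the authors' released implementation.

```python
import random

# Hypothetical end-of-sequence markers; the paper uses REMI-like event tokens.
LEAD_SHEET_EOS = "<eos_lead>"
PERFORMANCE_EOS = "<eos_perf>"

def two_stage_generate(compose_model, embellish_model, prime, max_len=512):
    # Stage 1 (Compose): autoregressively generate a lead sheet
    # (melody + chords) from a short priming sequence.
    lead_sheet = list(prime)
    while len(lead_sheet) < max_len:
        token = compose_model(lead_sheet)
        lead_sheet.append(token)
        if token == LEAD_SHEET_EOS:
            break
    # Stage 2 (Embellish): condition on the finished lead sheet and
    # generate accompaniment plus expressive attributes (velocity, tempo).
    performance = list(lead_sheet)
    while len(performance) < 2 * max_len:
        token = embellish_model(performance)
        performance.append(token)
        if token == PERFORMANCE_EOS:
            break
    return lead_sheet, performance

# Toy usage with a stand-in "model" that emits random tokens.
vocab = ["Bar", "Chord-Cmaj", "Note-C4", LEAD_SHEET_EOS, PERFORMANCE_EOS]
toy_model = lambda context: random.choice(vocab)
lead, perf = two_stage_generate(toy_model, toy_model, prime=["Bar"])
```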
Related papers
- Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder [15.668253435545921]
This paper addresses the challenge of generating classical piano performances from scratch, aiming to emulate the dual roles of composer and pianist.
We introduce the Expressive Compound Word representation, which effectively captures both the metrical structure and expressive nuances of classical performances.
We propose the Expressive Music Variational AutoEncoder (XMVAE), a model featuring two branches: a Vector Quantized Variational AutoEncoder (VQ-VAE) branch that generates score-related content, and a vanilla VAE branch that produces expressive details, fulfilling the role of Pianist.
arXiv Detail & Related papers (2025-07-02T10:54:23Z)
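The two-branch design described above, a vector-quantized branch for score-like content plus a continuous VAE branch for expressive details, can be sketched as below. All layer sizes, module names, and the straight-through trick are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class TwoBranchAutoencoder(nn.Module):
    """Illustrative skeleton of the XMVAE idea: a VQ branch for
    score-related content plus a vanilla VAE branch for expressive
    details. Dimensions and layers are made up for illustration."""
    def __init__(self, dim=64, codes=128):
        super().__init__()
        self.enc_score = nn.Linear(dim, dim)
        self.codebook = nn.Embedding(codes, dim)   # VQ-VAE branch
        self.enc_expr = nn.Linear(dim, 2 * dim)    # vanilla VAE branch
        self.dec = nn.Linear(2 * dim, dim)

    def forward(self, x):
        # VQ branch: snap the score encoding to its nearest codebook entry.
        z_s = self.enc_score(x)
        dists = torch.cdist(z_s, self.codebook.weight)   # (batch, codes)
        z_q = self.codebook(dists.argmin(dim=-1))
        z_q = z_s + (z_q - z_s).detach()                 # straight-through
        # VAE branch: reparameterized continuous code for expressiveness.
        mu, logvar = self.enc_expr(x).chunk(2, dim=-1)
        z_e = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(torch.cat([z_q, z_e], dim=-1)), mu, logvar

x = torch.randn(8, 64)
recon, mu, logvar = TwoBranchAutoencoder()(x)
```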
- Scaling Self-Supervised Representation Learning for Symbolic Piano Performance [52.661197827466886]
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions.
We use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings.
arXiv Detail & Related papers (2025-06-30T14:00:14Z)
- From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training [4.7205815347741185]
We investigate how general music knowledge learned from a broad corpus can enhance the mastery of specific composer styles.
First, we pre-train a REMI-based music generation model on a large corpus of pop, folk, and classical music.
Then, we fine-tune it on a small, human-verified dataset from four renowned composers, namely Bach, Mozart, Beethoven, and Chopin.
arXiv Detail & Related papers (2025-06-20T22:20:59Z)
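The pre-train-then-fine-tune recipe above amounts to running the same next-token training loop twice, first on the broad corpus and then on the small composer set. A toy PyTorch sketch follows; the model, data generator, step counts, and learning rates are all made up for illustration.

```python
import torch
import torch.nn as nn

def train(model, dataset, steps, lr):
    """Generic next-token training loop (cross-entropy), shared by the
    pre-training and fine-tuning phases."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = dataset()                        # (batch, seq) token ids
        loss = loss_fn(model(x).flatten(0, 1), y.flatten())
        opt.zero_grad()
        loss.backward()
        opt.step()

# Hypothetical two-phase recipe: broad corpus first, then a small,
# human-verified composer set at a lower learning rate.
model = nn.Sequential(nn.Embedding(512, 64), nn.Linear(64, 512))
fake_batch = lambda: (torch.randint(0, 512, (4, 32)),) * 2
train(model, fake_batch, steps=10, lr=3e-4)     # "generality": pop/folk/classical
train(model, fake_batch, steps=5, lr=3e-5)      # "mastery": Bach, Mozart, ...
```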
- YuE: Scaling Open Foundation Models for Long-Form Music Generation [134.54174498094565]
YuE is a family of open foundation models based on the LLaMA2 architecture.
It generates up to five minutes of music while maintaining lyrical alignment, coherent musical structure, and engaging vocal melodies with appropriate accompaniment.
arXiv Detail & Related papers (2025-03-11T17:26:50Z)
- ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement [6.873190001575463]
ImprovNet is a transformer-based architecture that generates expressive and controllable musical improvisations.
It can perform cross-genre and intra-genre improvisations, harmonize melodies with genre-specific styles, and execute short prompt continuation and infilling tasks.
arXiv Detail & Related papers (2025-02-06T21:45:38Z)
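The iterative corruption-refinement idea can be pictured as repeatedly noising a sequence and asking a model to repair it. The sketch below is a toy rendering; the corruption scheme, refiner, and loop count are assumptions, not ImprovNet's actual procedure.

```python
import random

def corrupt(seq, rate, vocab):
    """Randomly replace a fraction of tokens (one possible corruption)."""
    return [random.choice(vocab) if random.random() < rate else t for t in seq]

def refine(model, seq):
    """Placeholder refiner: the model re-predicts every position given
    the corrupted context."""
    return [model(seq, i) for i in range(len(seq))]

def iterative_improvise(model, melody, vocab, rounds=4, rate=0.3):
    seq = list(melody)
    for _ in range(rounds):          # corruption -> refinement cycles
        seq = refine(model, corrupt(seq, rate, vocab))
    return seq

# Toy usage: a "model" that keeps the current token (identity refiner).
pitch_vocab = list(range(12))
out = iterative_improvise(lambda s, i: s[i], [0, 4, 7, 0], pitch_vocab)
```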
- PerTok: Expressive Encoding and Modeling of Symbolic Musical Ideas and Variations [0.3683202928838613]
Cadenza is a new multi-stage generative framework for predicting expressive variations of symbolic musical ideas.
The proposed framework comprises two sequential stages: 1) Composer and 2) Performer.
Our framework is designed, researched and implemented with the objective of providing inspiration for musicians.
arXiv Detail & Related papers (2024-10-02T22:11:31Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
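One plausible reading of the counterfactual loss is a margin objective: generated music should be more likely under its true control signal than under a mismatched one. The form below is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def counterfactual_loss(log_probs_true, log_probs_cf, margin=1.0):
    """One plausible form of a counterfactual objective: music tokens
    should be more likely under their true control signal than under a
    mismatched ("counterfactual") one, by at least a margin."""
    return F.relu(margin - (log_probs_true - log_probs_cf)).mean()

# Toy usage: per-sequence log-likelihoods under true vs. shuffled controls.
lp_true = torch.tensor([-10.0, -12.0, -9.5])
lp_cf = torch.tensor([-10.5, -11.0, -13.0])
loss = counterfactual_loss(lp_true, lp_cf)
```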
- SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition [82.38021790213752]
SongComposer is a music-specialized large language model (LLM).
It integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations.
It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation.
We will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.
arXiv Detail & Related papers (2024-02-27T16:15:28Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
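The fine- and coarse-grained attention pattern can be expressed as an attention mask: full attention into structure-relevant bars, plus attention to per-bar summary tokens everywhere else. The toy mask builder below uses illustrative indices and an illustrative relevance map, not Museformer's actual bar-selection scheme.

```python
import numpy as np

def museformer_style_mask(bar_of_token, summary_pos, related):
    """Boolean attention mask in the spirit of fine-/coarse-grained
    attention: token i may attend to token j if j lies in a bar deemed
    structure-relevant for i's bar (fine-grained), or if j is a
    bar-summary token (coarse-grained)."""
    n = len(bar_of_token)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            fine = bar_of_token[j] in related[bar_of_token[i]]
            coarse = j in summary_pos
            mask[i, j] = fine or coarse
    return mask

# Toy usage: 2 bars of 3 tokens each, the last token of each bar serving
# as its summary; each bar's "related" bars are itself and the bar before.
bars = [0, 0, 0, 1, 1, 1]
mask = museformer_style_mask(bars, summary_pos={2, 5},
                             related={0: {0}, 1: {0, 1}})
```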
- SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance [88.0355290619761]
This work focuses on the separation of unknown musical instruments.
We propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories.
Our framework exhibits strong adaptation ability on novel musical categories and outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-03-25T09:42:11Z)
- A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody [91.22679787578438]
We present a method for the generation of MIDI files of piano music.
The method models the right and left hands using two networks, where the left hand is conditioned on the right hand.
The MIDI is represented in a way that is invariant to the musical scale, and the melody representation is used to condition the harmony.
arXiv Detail & Related papers (2021-11-25T09:45:53Z)
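The hand-conditioning scheme, one network's output feeding the other, can be sketched in a few lines. The GRU modules and feature sizes here are stand-ins, not the A-Muze-Net architecture.

```python
import torch
import torch.nn as nn

# Illustrative only: two small recurrent models, where the left-hand
# (harmony) network consumes the right-hand (melody) network's output.
right_hand = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
left_hand = nn.GRU(input_size=16 + 32, hidden_size=32, batch_first=True)

melody_feats = torch.randn(1, 64, 16)      # (batch, time, features)
rh_out, _ = right_hand(melody_feats)       # melody representation
lh_in = torch.cat([torch.randn(1, 64, 16), rh_out], dim=-1)
lh_out, _ = left_hand(lh_in)               # harmony conditioned on melody
```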
- Structure-Enhanced Pop Music Generation via Harmony-Aware Learning [20.06867705303102]
We propose to leverage harmony-aware learning for structure-enhanced pop music generation.
Results of subjective and objective evaluations demonstrate that Harmony-Aware Hierarchical Music Transformer (HAT) significantly improves the quality of generated music.
arXiv Detail & Related papers (2021-09-14T05:04:13Z)
- Controllable deep melody generation via hierarchical music structure representation [14.891975420982511]
MusicFrameworks is a hierarchical music structure representation and a multi-step generative process to create a full-length melody.
To generate melody in each phrase, we generate rhythm and basic melody using two separate transformer-based networks.
To customize or add variety, one can alter chords, basic melody, and rhythm structure in the music frameworks, letting our networks generate the melody accordingly.
arXiv Detail & Related papers (2021-09-02T01:31:14Z)
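The multi-step phrase generation, rhythm first and then pitches, can be sketched with two stand-in networks. Function names and the toy rhythm and scale below are assumptions for illustration.

```python
import random

def generate_phrase(rhythm_net, melody_net, phrase_spec, scale):
    """Illustrative two-network phrase generation: a rhythm model proposes
    note durations, then a melody model picks a pitch for each onset."""
    rhythm = rhythm_net(phrase_spec)           # e.g. a list of durations
    return [(melody_net(phrase_spec, i, scale), dur)
            for i, dur in enumerate(rhythm)]

# Toy stand-ins: a fixed rhythm pattern and random in-scale pitches.
rhythm_net = lambda spec: [1, 1, 2] * spec["bars"]
melody_net = lambda spec, i, scale: random.choice(scale)
phrase = generate_phrase(rhythm_net, melody_net,
                         {"bars": 2}, scale=[60, 62, 64, 65, 67, 69, 71])
```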
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user specified symbolic scenario combined with a previous music context.
Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and shares consistent rhythm pattern structure with another specific song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
- Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions [37.66340344198797]
We build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.
In particular, we seek to impose a metrical structure in the input data, so that Transformers can be more easily aware of the beat-bar-phrase hierarchical structure in music.
arXiv Detail & Related papers (2020-02-01T14:12:35Z)
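The beat-based representation from this last paper (REMI) interleaves explicit Bar and Position events with note events, so the Transformer sees the metrical grid directly. The sketch below is a simplified tokenizer; the actual representation also carries tempo and chord events.

```python
def remi_style_tokens(notes, positions_per_bar=16):
    """Convert (bar, position, pitch, duration) notes into a REMI-like
    event sequence with explicit Bar and Position tokens."""
    tokens, current_bar = [], None
    for bar, pos, pitch, dur in sorted(notes):
        if bar != current_bar:
            tokens.append("Bar")          # marks the start of a new bar
            current_bar = bar
        tokens += [f"Position_{pos}/{positions_per_bar}",
                   f"Note-On_{pitch}", f"Note-Duration_{dur}"]
    return tokens

# Toy usage: two notes in bar 0, one in bar 1.
print(remi_style_tokens([(0, 0, 60, 4), (0, 8, 64, 4), (1, 0, 67, 8)]))
```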
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.