Structure-informed Positional Encoding for Music Generation
- URL: http://arxiv.org/abs/2402.13301v2
- Date: Wed, 28 Feb 2024 12:37:34 GMT
- Title: Structure-informed Positional Encoding for Music Generation
- Authors: Manvi Agarwal (S2A, IDS), Changhong Wang (S2A, IDS), Gaël Richard (S2A, IDS)
- Abstract summary: We propose a structure-informed positional encoding framework for music generation with Transformers.
We test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation.
Our methods improve the melodic and structural consistency of the generated pieces.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music generated by deep learning methods often suffers from a lack of
coherence and long-term organization. Yet, multi-scale hierarchical structure
is a distinctive feature of music signals. To leverage this information, we
propose a structure-informed positional encoding framework for music generation
with Transformers. We design three variants in terms of absolute, relative and
non-stationary positional information. We comprehensively test them on two
symbolic music generation tasks: next-timestep prediction and accompaniment
generation. As a comparison, we choose multiple baselines from the literature
and demonstrate the merits of our methods using several musically-motivated
evaluation metrics. In particular, our methods improve the melodic and
structural consistency of the generated pieces.
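As a rough illustration of the absolute variant, the sketch below sums standard sinusoidal codes over the raw timestep and over structural indices (bar and section), so tokens at analogous structural positions receive similar encodings. The choice of indices and the summation scheme are assumptions for illustration, not the paper's stated formulation.

```python
# Minimal sketch of a structure-informed absolute positional encoding.
# Assumes each token carries a timestep index plus structural indices
# (bar and section); these names are illustrative, not the paper's API.
import math
import torch

def sinusoidal(pos: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of integer positions -> (..., dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = pos.float().unsqueeze(-1) * freqs  # (..., half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

def structure_informed_pe(t, bar, section, dim=256):
    """Sum sinusoidal codes for raw time, bar index, and section index."""
    return sinusoidal(t, dim) + sinusoidal(bar, dim) + sinusoidal(section, dim)

# Toy usage: 8 timesteps, 2 bars of 4 steps each, all in section 0.
t = torch.arange(8)
bar = t // 4
section = torch.zeros(8, dtype=torch.long)
print(structure_informed_pe(t, bar, section).shape)  # torch.Size([8, 256])
```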
Related papers
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
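The interleaving idea can be sketched with the "delay" pattern described in the MusicGen paper: the k-th codebook stream is offset by k steps so that one autoregressive pass covers all streams. The padding value and shapes below are illustrative.

```python
# Sketch of delay-pattern token interleaving for parallel codebooks.
import torch

PAD = -1  # illustrative padding token

def delay_interleave(codes: torch.Tensor) -> torch.Tensor:
    """codes: (K, T) grid of discrete tokens from K codebooks.
    Returns (K, T + K - 1) with codebook k delayed by k steps, so a
    single autoregressive stream can predict all codebooks."""
    K, T = codes.shape
    out = torch.full((K, T + K - 1), PAD, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = torch.arange(12).reshape(4, 3)  # 4 codebooks, 3 timesteps
print(delay_interleave(codes))
```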
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
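One plausible way such constraints could be compiled, shown below, is a per-phrase syllable budget derived from the melody's segmentation; the syllable counter is a crude stand-in for illustration, not the paper's mechanism.

```python
# Sketch: compile melody segmentation into decoding constraints by giving
# each melodic phrase a syllable budget for the corresponding lyric line.
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; a real system would use a lexicon."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fits_constraints(lyric_lines, notes_per_phrase):
    """Accept a candidate lyric only if each line's syllable count
    matches the number of notes in the corresponding melodic phrase."""
    if len(lyric_lines) != len(notes_per_phrase):
        return False
    return all(
        sum(count_syllables(w) for w in line.split()) == budget
        for line, budget in zip(lyric_lines, notes_per_phrase)
    )

print(fits_constraints(["twinkle twinkle little star"], [7]))  # True
```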
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
- WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning [26.515527387450636]
WuYun is a knowledge-enhanced deep learning architecture for improving the structure of generated melodies.
We use music domain knowledge to extract melodic skeletons and employ sequence learning to reconstruct them.
We demonstrate that WuYun can generate melodies with better long-term structure and musicality, outperforming other state-of-the-art methods by 0.51 on average.
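A minimal sketch of the skeleton idea, assuming a simple strong-beat rule; the paper's skeleton extraction draws on richer music-domain knowledge.

```python
# Sketch: keep notes on metrically strong positions as a melodic skeleton;
# a sequence model would then reconstruct the full melody around it.
# The tick grid and strong-beat rule are illustrative assumptions.

def extract_skeleton(notes, beats_per_bar=4, ticks_per_beat=480):
    """notes: list of (onset_tick, pitch). Keep notes on beats 1 and 3."""
    bar = beats_per_bar * ticks_per_beat
    skeleton = []
    for onset, pitch in notes:
        pos_in_bar = onset % bar
        on_beat = pos_in_bar % ticks_per_beat == 0
        if on_beat and (pos_in_bar // ticks_per_beat) in (0, 2):
            skeleton.append((onset, pitch))
    return skeleton

melody = [(0, 60), (240, 62), (480, 64), (960, 65), (1440, 67)]
print(extract_skeleton(melody))  # [(0, 60), (960, 65)]
```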
arXiv Detail & Related papers (2023-01-11T14:33:42Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token attends only to summarizations of the other bars rather than to each of their tokens, reducing the computational cost.
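A toy version of such a fine-/coarse-grained attention mask, with an assumed "related bars" rule and one summary token per bar (Museformer's actual scheme uses learned summary tokens and structure-derived bar relations).

```python
# Sketch: a token attends to every token of structurally related bars
# (fine) and only to a summary token of the other bars (coarse).
import torch

def fine_coarse_mask(bar_of_token, related, summary_idx):
    """bar_of_token: (N,) bar index per token; related: dict bar -> set of
    bars attended finely; summary_idx: dict bar -> its summary token."""
    N = bar_of_token.numel()
    mask = torch.zeros(N, N, dtype=torch.bool)
    for i in range(N):
        bi = int(bar_of_token[i])
        for j in range(N):
            bj = int(bar_of_token[j])
            if bj in related[bi] or j == summary_idx[bj]:
                mask[i, j] = True  # True = attention allowed
    return mask

bars = torch.tensor([0, 0, 1, 1, 2, 2])      # 3 bars, 2 tokens each
related = {0: {0}, 1: {1, 0}, 2: {2, 1}}     # e.g. attend to previous bar
summary = {0: 1, 1: 3, 2: 5}                 # last token of each bar
print(fine_coarse_mask(bars, related, summary).int())
```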
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation [6.0949335132843965]
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer that uses a coarse decoder and fine decoders to model context at the global and section levels.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
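A toy sketch of the coarse/fine split, using recurrent stand-ins for the paper's Transformer decoders: a coarse model emits one context vector per section, and a fine model generates tokens conditioned on it. All dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MultiScaleToy(nn.Module):
    def __init__(self, vocab=128, d=64, sections=4):
        super().__init__()
        self.coarse = nn.GRU(d, d, batch_first=True)  # global, section-level
        self.fine = nn.GRU(d, d, batch_first=True)    # local, within-section
        self.section_emb = nn.Embedding(sections, d)
        self.tok_emb = nn.Embedding(vocab, d)
        self.head = nn.Linear(d, vocab)

    def forward(self, section_ids, tokens):
        # section_ids: (B, S); tokens: (B, S, L) token ids per section
        B, S, L = tokens.shape
        ctx, _ = self.coarse(self.section_emb(section_ids))  # (B, S, d)
        x = self.tok_emb(tokens) + ctx.unsqueeze(2)  # broadcast section context
        out, _ = self.fine(x.reshape(B * S, L, -1))
        return self.head(out).reshape(B, S, L, -1)

m = MultiScaleToy()
section_ids = torch.arange(4).expand(2, 4)
tokens = torch.zeros(2, 4, 16, dtype=torch.long)
print(m(section_ids, tokens).shape)  # torch.Size([2, 4, 16, 128])
```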
arXiv Detail & Related papers (2022-05-17T18:48:14Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that the proposed approach can generate symphonies that are coherent, novel, complex, and harmonious compared with human compositions.
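The permutation-invariance property can be illustrated by encoding simultaneous notes as a set, so that track order does not matter; SymphonyNet's actual token representation is more elaborate, and this only demonstrates the invariance itself.

```python
# Sketch: order-free encoding of notes that share an onset, via a sum of
# embeddings; shuffling the tracks leaves the representation unchanged.
import torch
import torch.nn as nn

emb = nn.Embedding(128, 32)

def encode_simultaneous(pitches: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant encoding of co-sounding notes."""
    return emb(pitches).sum(dim=0)

chord = torch.tensor([60, 64, 67])
shuffled = torch.tensor([67, 60, 64])
print(torch.allclose(encode_simultaneous(chord),
                     encode_simultaneous(shuffled)))  # True
```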
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Controllable deep melody generation via hierarchical music structure representation [14.891975420982511]
MusicFrameworks combines a hierarchical music structure representation with a multi-step generative process to create a full-length melody.
To generate melody in each phrase, we generate rhythm and basic melody using two separate transformer-based networks.
To customize or add variety, one can alter chords, basic melody, and rhythm structure in the music frameworks, letting our networks generate the melody accordingly.
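A toy sketch of the multi-step pipeline, with random stand-ins for the two transformer-based networks: rhythm is generated first, then a basic melody of matching length, and the two are combined into the phrase.

```python
# Sketch of rhythm-then-melody generation; `rhythm_net` and
# `basic_melody_net` are illustrative stand-ins, not the paper's models.
import random

def rhythm_net(phrase_len):
    """Stand-in: pick durations (in 16th notes) summing to the phrase."""
    durs, left = [], phrase_len
    while left > 0:
        d = random.choice([d for d in (1, 2, 4) if d <= left])
        durs.append(d)
        left -= d
    return durs

def basic_melody_net(n_notes, start=60):
    """Stand-in: random walk over MIDI pitches."""
    pitches = [start]
    for _ in range(n_notes - 1):
        pitches.append(pitches[-1] + random.choice([-2, -1, 0, 1, 2]))
    return pitches

def generate_phrase(phrase_len=16):
    durs = rhythm_net(phrase_len)
    return list(zip(basic_melody_net(len(durs)), durs))

random.seed(0)
print(generate_phrase())  # [(pitch, duration), ...]
```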
arXiv Detail & Related papers (2021-09-02T01:31:14Z)
- Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework [3.029434408969759]
We present a novel approach for calculating the positivity or negativity of a chord progression within a lead sheet.
We frame the task as a Neural Machine Translation (NMT) problem, including high-level conditions in the encoder of a sequence-to-sequence architecture.
The proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset.
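One common way to realize such conditioning, assumed here for illustration, is to prepend the affect label as a token to the encoder input; the token ids are hypothetical.

```python
# Sketch: prefix a valence condition token to the encoder token sequence
# so the seq2seq model can be steered toward positive or negative affect.
import torch

VALENCE_TOKENS = {"positive": 1, "negative": 2}  # illustrative ids

def build_encoder_input(chord_ids: torch.Tensor, valence: str) -> torch.Tensor:
    """Prepend the condition token to the chord-progression tokens."""
    cond = torch.tensor([VALENCE_TOKENS[valence]])
    return torch.cat([cond, chord_ids])

chords = torch.tensor([10, 14, 17, 10])  # ids for a chord progression
print(build_encoder_input(chords, "positive"))  # tensor([ 1, 10, 14, 17, 10])
```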
arXiv Detail & Related papers (2021-04-27T09:04:21Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
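A minimal sketch of a progressively dilated 1-D convolution stack, where dilation doubles with depth so the receptive field widens over long spans; channel sizes and depth are illustrative.

```python
import torch
import torch.nn as nn

class ProgressivelyDilatedCNN(nn.Module):
    def __init__(self, channels=16, layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=2 ** i, padding=2 ** i)  # dilations 1,2,4,8
            for i in range(layers)
        )

    def forward(self, x):
        # padding == dilation keeps the sequence length constant
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x

net = ProgressivelyDilatedCNN()
print(net(torch.randn(1, 16, 128)).shape)  # torch.Size([1, 16, 128])
```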
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
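A minimal sketch of such a comparison: swap the recurrent cell while keeping the embedding and output head fixed, as the study's design implies; dimensions are illustrative.

```python
import torch
import torch.nn as nn

def make_model(cell: str, vocab=64, d=32):
    rnn_cls = {"lstm": nn.LSTM, "gru": nn.GRU, "rnn": nn.RNN}[cell]
    return nn.ModuleDict({
        "emb": nn.Embedding(vocab, d),
        "rnn": rnn_cls(d, d, batch_first=True),
        "head": nn.Linear(d, vocab),
    })

def forward(model, tokens):
    out, _ = model["rnn"](model["emb"](tokens))
    return model["head"](out)

for cell in ("rnn", "gru", "lstm"):
    logits = forward(make_model(cell), torch.zeros(1, 8, dtype=torch.long))
    print(cell, logits.shape)  # each: torch.Size([1, 8, 64])
```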
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
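A heavily simplified sketch of the fusion step, with mean pooling standing in for the paper's context-aware graph network and a linear layer for its audio-visual fusion model; all shapes are illustrative.

```python
import torch
import torch.nn as nn

class GestureAudioFusion(nn.Module):
    def __init__(self, kp_dim=2, audio_dim=64):
        super().__init__()
        self.kp_proj = nn.Linear(kp_dim, audio_dim)
        self.fuse = nn.Linear(2 * audio_dim, audio_dim)

    def forward(self, keypoints, audio_feat):
        # keypoints: (B, n_keypoints, 2) body/finger joint coordinates
        # audio_feat: (B, audio_dim) features of the mixed audio
        gesture = self.kp_proj(keypoints).mean(dim=1)  # pool over joints
        return self.fuse(torch.cat([gesture, audio_feat], dim=-1))

m = GestureAudioFusion()
print(m(torch.randn(2, 21, 2), torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```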
arXiv Detail & Related papers (2020-04-20T17:53:46Z)