Melody Infilling with User-Provided Structural Context
- URL: http://arxiv.org/abs/2210.02829v1
- Date: Thu, 6 Oct 2022 11:37:04 GMT
- Title: Melody Infilling with User-Provided Structural Context
- Authors: Chih-Pin Tan, Alvin W.Y. Su and Yi-Hsuan Yang
- Abstract summary: This paper proposes a novel Transformer-based model for music score infilling.
We show that the proposed model can harness the structural information effectively and generate pop-style melodies of higher quality.
- Score: 37.55332319528369
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a novel Transformer-based model for music score
infilling, to generate a music passage that fills in the gap between given past
and future contexts. While existing infilling approaches can generate a passage
that connects smoothly with the given contexts at the local level, they do not
take the musical form or structure of the music into account and may therefore
generate overly smooth results. To address this issue, we propose a structure-aware
conditioning approach that employs a novel attention-selecting module to supply
user-provided structure-related information to the Transformer for infilling.
With both objective and subjective evaluations, we show that the proposed model
can harness the structural information effectively and generate pop-style
melodies of higher quality than two existing structure-agnostic infilling
models.
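The attention-selecting module is described above only at a high level. As one plausible reading, the minimal sketch below lets a learned softmax gate mix cross-attention over separate past-context, future-context, and user-provided structure memories; all class names, parameters, and shapes here are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class AttentionSelect(nn.Module):
    """Hypothetical sketch: route each query among several context memories.

    Assumes three encoded memories (past, future, user-provided structure);
    a softmax gate decides, per query token, how much weight each memory's
    attention output receives.
    """

    def __init__(self, d_model: int, n_heads: int, n_memories: int = 3):
        super().__init__()
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_memories)
        )
        self.gate = nn.Linear(d_model, n_memories)  # one logit per memory

    def forward(self, query, memories):
        # memories: list of (batch, mem_len, d_model) tensors
        outs = [attn(query, m, m)[0] for attn, m in zip(self.attns, memories)]
        weights = torch.softmax(self.gate(query), dim=-1)  # (B, T, n_memories)
        stacked = torch.stack(outs, dim=-1)                # (B, T, D, n_memories)
        return (stacked * weights.unsqueeze(2)).sum(dim=-1)

# Toy usage: infill queries attend to past, future, and structure memories.
x = torch.randn(2, 16, 256)
mems = [torch.randn(2, 32, 256) for _ in range(3)]
y = AttentionSelect(256, 8)(x, mems)   # -> (2, 16, 256)
```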
Related papers
- Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces [0.0]
This paper proposes integrating a text-to-music model with a large language model to generate music with form.
The experimental results show that the proposed method can generate 2.5-minute-long music that is highly structured, strongly organized, and cohesive.
arXiv Detail & Related papers (2024-10-01T02:43:14Z)
- StemGen: A music generation model that listens [9.489938613869864]
We present an alternative paradigm for producing music generation models that can listen and respond to musical context.
We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture.
The resulting model matches the audio quality of state-of-the-art text-conditioned models while also exhibiting strong musical coherence with its context.
arXiv Detail & Related papers (2023-12-14T08:09:20Z)
- ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models [67.66825818489406]
This paper introduces a text-to-waveform music generation model based on diffusion models.
Our method incorporates free-form textual prompts as conditioning signals to guide the waveform generation process.
We demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.
arXiv Detail & Related papers (2023-02-09T06:27:09Z)
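ERNIE-Music's exact architecture is not detailed in the summary above; as a generic illustration of text-conditioned diffusion, here is a minimal sketch of one DDPM-style training step, where the `eps_model` noise predictor and the `text_emb` interface are hypothetical assumptions.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(waveform, text_emb, eps_model, alphas_cumprod):
    """One DDPM-style step: corrupt a clean waveform with noise at a random
    timestep, then train the model to predict that noise, conditioned on a
    free-form text embedding. `eps_model` is a hypothetical module."""
    b = waveform.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))  # random timestep per item
    a_bar = alphas_cumprod[t].view(b, 1)             # cumulative noise schedule
    noise = torch.randn_like(waveform)
    noisy = a_bar.sqrt() * waveform + (1 - a_bar).sqrt() * noise
    pred_noise = eps_model(noisy, t, text_emb)       # text embedding as condition
    return F.mse_loss(pred_noise, noise)
```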
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the Transformer and enhance discriminative representation learning.
The two proposed modules are lightweight, can be plugged into any Transformer network, and are easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
The first method, TP-Transformer, augments the traditional Transformer architecture with an additional component to represent structure.
The second imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
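As a toy illustration of the morphological tokenization mentioned above, the sketch below strips suffixes with a hand-written rule list; the suffix set and the `@@` bound-morpheme marker are illustrative assumptions, and a real system would use a learned or linguistically informed segmenter.

```python
# Toy morphological segmentation: split known suffixes off each word so a
# translation model sees stems and affixes as separate tokens.
SUFFIXES = ["lerin", "ler", "in", "de"]  # illustrative Turkish-like suffixes

def segment(word: str) -> list[str]:
    parts = []
    stripped = True
    while stripped:
        stripped = False
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) > len(suf) + 2:
                parts.insert(0, "@@" + suf)   # mark as a bound morpheme
                word = word[: -len(suf)]
                stripped = True
                break
    return [word] + parts

print(segment("evlerinde"))  # -> ['evler', '@@in', '@@de']
```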
- Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework [68.1678127433077]
We extend the Transformer model to dynamically perform entity state updates and sentence realization for narrative generation.
Experiments on two narrative datasets show that our model can generate more coherent and diverse narratives than strong baselines.
arXiv Detail & Related papers (2022-08-08T09:02:19Z)
- A framework to compare music generative models using automatic evaluation metrics extended to rhythm [69.2737664640826]
This paper builds on a framework from previous research that did not consider rhythm; after a series of design decisions, rhythm support is added, and the framework is used to evaluate the performance of two RNN memory cells in the creation of monophonic music.
The model handles music transposition, and the framework evaluates the quality of the generated pieces using automatic quantitative metrics based on geometry, likewise extended with rhythm support.
arXiv Detail & Related papers (2021-01-19T15:04:46Z)
- Music Generation with Temporal Structure Augmentation [0.0]
The proposed method augments a connectionist generation model with a count-down to the song's conclusion and meter markers as extra input features.
An RNN architecture with LSTM cells is trained on the Nottingham folk music dataset in a supervised sequence learning setup.
Experiments show an improved prediction performance for both types of annotation.
arXiv Detail & Related papers (2020-04-21T19:19:58Z)
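The count-down and meter annotations described in this entry are straightforward to derive from a quantized step grid; below is a minimal sketch assuming one timestep per beat and a one-hot meter encoding (the paper's exact feature layout may differ).

```python
import numpy as np

def structure_features(n_steps: int, beats_per_bar: int = 4) -> np.ndarray:
    """Extra input features per timestep: a normalized count-down to the
    song's conclusion plus a one-hot marker of the position within the bar."""
    countdown = np.arange(n_steps - 1, -1, -1, dtype=float) / max(n_steps - 1, 1)
    meter = np.eye(beats_per_bar)[np.arange(n_steps) % beats_per_bar]
    return np.concatenate([countdown[:, None], meter], axis=1)

feats = structure_features(16)  # shape (16, 5): count-down + 4 meter bits
# These columns would be concatenated with the note encoding fed to the RNN.
```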
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user-specified symbolic scenario combined with a previous music context.
Our model can generate long melodies by treating 8-beat note sequences as basic units, and it shares a consistent rhythm-pattern structure with another specified song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
- Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions [37.66340344198797]
We build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.
In particular, we seek to impose a metrical structure on the input data, so that Transformers can more easily become aware of the beat-bar-phrase hierarchical structure in music.
arXiv Detail & Related papers (2020-02-01T14:12:35Z)
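This is the paper that introduced the REMI event representation; the sketch below shows a simplified version of its beat-based encoding, with Bar and Position tokens making the metrical grid explicit (velocity, tempo, and chord events are omitted for brevity).

```python
# Simplified REMI-style encoding: emit a Bar token at each downbeat and a
# Position token (16 subdivisions per bar) before every note event.
def encode_remi(notes, positions_per_bar=16):
    """notes: list of (start_step, pitch, duration_steps) tuples, with starts
    quantized to 1/16-bar steps."""
    tokens, current_bar = [], -1
    for start, pitch, dur in sorted(notes):
        bar = start // positions_per_bar
        while current_bar < bar:          # emit Bar tokens up to this note
            tokens.append("Bar")
            current_bar += 1
        tokens.append(f"Position_{start % positions_per_bar}")
        tokens.append(f"NoteOn_{pitch}")
        tokens.append(f"Duration_{dur}")
    return tokens

print(encode_remi([(0, 60, 4), (4, 64, 4), (16, 67, 8)]))
# ['Bar', 'Position_0', 'NoteOn_60', 'Duration_4',
#  'Position_4', 'NoteOn_64', 'Duration_4',
#  'Bar', 'Position_0', 'NoteOn_67', 'Duration_8']
```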