Learning Interpretable Representation for Controllable Polyphonic Music Generation
- URL: http://arxiv.org/abs/2008.07122v1
- Date: Mon, 17 Aug 2020 07:11:16 GMT
- Title: Learning Interpretable Representation for Controllable Polyphonic Music Generation
- Authors: Ziyu Wang, Dingsu Wang, Yixiao Zhang, Gus Xia
- Abstract summary: We design a novel architecture that effectively learns two interpretable latent factors of polyphonic music: chord and texture.
We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications.
- Score: 5.01266258109807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep generative models have become the leading methods for algorithmic
composition, it remains a challenging problem to control the generation process
because the latent variables of most deep-learning models lack good
interpretability. Inspired by the content-style disentanglement idea, we design
a novel architecture, under the VAE framework, that effectively learns two
interpretable latent factors of polyphonic music: chord and texture. The
current model focuses on learning 8-beat long piano composition segments. We
show that such chord-texture disentanglement provides a controllable generation
pathway leading to a wide spectrum of applications, including compositional
style transfer, texture variation, and accompaniment arrangement. Both
objective and subjective evaluations show that our method achieves successful
disentanglement and high-quality controlled music generation.
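
To make the chord-texture disentanglement concrete, here is a minimal sketch of how such a model could be organized under the VAE framework: two encoders produce separate chord and texture latent codes, a shared decoder reconstructs the 8-beat piano-roll segment, and swapping latents between two pieces yields the compositional style transfer described in the abstract. The class names, layer sizes, and the 32-step-by-128-pitch piano-roll shape are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a chord-texture disentangled VAE (assumed names and sizes,
# not the authors' released implementation).
import torch
import torch.nn as nn


class ChordTextureVAE(nn.Module):
    """Two encoders (chord, texture) and one shared decoder over an
    8-beat piano-roll segment, assumed here as 32 steps x 128 pitches."""

    def __init__(self, n_steps=32, n_pitches=128, z_dim=128, hidden=512):
        super().__init__()
        self.n_steps, self.n_pitches = n_steps, n_pitches
        in_dim = n_steps * n_pitches
        self.chord_enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.texture_enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.chord_mu = nn.Linear(hidden, z_dim)
        self.chord_logvar = nn.Linear(hidden, z_dim)
        self.texture_mu = nn.Linear(hidden, z_dim)
        self.texture_logvar = nn.Linear(hidden, z_dim)
        # One decoder reconstructs the segment from both latent factors.
        self.decoder = nn.Sequential(
            nn.Linear(2 * z_dim, hidden), nn.ReLU(), nn.Linear(hidden, in_dim)
        )

    @staticmethod
    def sample(mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def encode(self, roll):
        x = roll.flatten(1)
        hc, ht = self.chord_enc(x), self.texture_enc(x)
        z_chord = self.sample(self.chord_mu(hc), self.chord_logvar(hc))
        z_texture = self.sample(self.texture_mu(ht), self.texture_logvar(ht))
        return z_chord, z_texture

    def decode(self, z_chord, z_texture):
        logits = self.decoder(torch.cat([z_chord, z_texture], dim=-1))
        return logits.view(-1, self.n_steps, self.n_pitches)


# Compositional style transfer by latent swapping: keep piece A's chord
# progression, borrow piece B's texture (toy random piano rolls, shapes only).
model = ChordTextureVAE()
roll_a = (torch.rand(1, 32, 128) > 0.95).float()
roll_b = (torch.rand(1, 32, 128) > 0.95).float()
z_chord_a, _ = model.encode(roll_a)
_, z_texture_b = model.encode(roll_b)
transferred = model.decode(z_chord_a, z_texture_b)  # (1, 32, 128) logits
```

In the actual paper, training would also require the standard VAE reconstruction and KL terms plus supervision tied to the chord factor so that the two latents genuinely disentangle; the sketch above only illustrates the latent-swap generation pathway.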
Related papers
- An End-to-End Approach for Chord-Conditioned Song Generation [14.951089833579063]
The song generation task aims to synthesize music composed of vocals and accompaniment from given lyrics.
To mitigate the issue, we introduce an important concept from music composition, namely chords, to song generation networks.
Based on this, we propose a novel model termed the Chord-Conditioned Song Generator (CSG).
arXiv Detail & Related papers (2024-09-10T08:07:43Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- Motifs, Phrases, and Beyond: The Modelling of Structure in Symbolic Music Generation [2.8062498505437055]
Modelling musical structure is vital yet challenging for artificial intelligence systems that generate symbolic music compositions.
This literature review dissects the evolution of techniques for incorporating coherent structure.
We outline several key future directions to realize the synergistic benefits of combining approaches from all eras examined.
arXiv Detail & Related papers (2024-03-12T18:03:08Z)
- Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling [9.489311894706765]
In this paper, we introduce a novel system that leverages prior modelling over disentangled style factors to address these challenges.
Our key design is the use of vector quantization and a unique multi-stream Transformer to model the long-term flow of the orchestration style.
We show that our system achieves superior coherence, structure, and overall arrangement quality compared to existing baselines.
arXiv Detail & Related papers (2023-10-25T03:30:37Z)
- CoCoFormer: A controllable feature-rich polyphonic music generation method [2.501600004190393]
This paper proposes the Condition Choir Transformer (CoCoFormer), which controls the model's output by conditioning on chord and rhythm inputs at a fine-grained level.
Experiments in this paper show that CoCoFormer outperforms current models.
arXiv Detail & Related papers (2023-10-15T14:04:48Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Composer: Creative and Controllable Image Synthesis with Composable Conditions [57.78533372393828]
Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.
This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.
arXiv Detail & Related papers (2023-02-20T05:48:41Z)
- Is Disentanglement enough? On Latent Representations for Controllable Music Generation [78.8942067357231]
In the absence of a strong generative decoder, disentanglement does not necessarily imply controllability.
The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes.
arXiv Detail & Related papers (2021-08-01T18:37:43Z)
- Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework [3.029434408969759]
We present a novel approach for calculating the positivity or negativity of a chord progression within a lead sheet.
We approach the task similarly to a Neural Machine Translation (NMT) problem, including high-level conditions in the encoder part of the sequence-to-sequence architecture.
The proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset.
arXiv Detail & Related papers (2021-04-27T09:04:21Z)
- A framework to compare music generative models using automatic evaluation metrics extended to rhythm [69.2737664640826]
This paper takes the framework proposed in previous research, which did not consider rhythm, makes a series of design decisions, and then adds rhythm support to evaluate the performance of two RNN memory cells in the creation of monophonic music.
The model handles music transposition, and the framework evaluates the quality of the generated pieces using automatic, geometry-based quantitative metrics that have also been extended with rhythm support.
arXiv Detail & Related papers (2021-01-19T15:04:46Z)
- RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.