FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control
- URL: http://arxiv.org/abs/2201.10936v4
- Date: Thu, 22 Feb 2024 10:34:18 GMT
- Title: FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control
- Authors: Dimitri von Rütte, Luca Biggio, Yannic Kilcher, Thomas Hofmann
- Abstract summary: We propose the self-supervised description-to-sequence task, which allows for fine-grained controllable generation on a global level.
We do so by extracting high-level features about the target sequence and learning the conditional distribution of sequences given the corresponding high-level description in a sequence-to-sequence modelling setup.
By combining learned high-level features with domain knowledge, which acts as a strong inductive bias, the model achieves state-of-the-art results in controllable symbolic music generation and generalizes well beyond the training distribution.
- Score: 25.95359681751144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating music with deep neural networks has been an area of active
research in recent years. While the quality of generated samples has been
steadily increasing, most methods are only able to exert minimal control over
the generated sequence, if any. We propose the self-supervised
description-to-sequence task, which allows for fine-grained controllable
generation on a global level. We do so by extracting high-level features about
the target sequence and learning the conditional distribution of sequences
given the corresponding high-level description in a sequence-to-sequence
modelling setup. We train FIGARO (FIne-grained music Generation via
Attention-based, RObust control) by applying description-to-sequence modelling
to symbolic music. By combining learned high-level features with domain
knowledge, which acts as a strong inductive bias, the model achieves
state-of-the-art results in controllable symbolic music generation and
generalizes well beyond the training distribution.
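
As a rough illustration of the description-to-sequence idea in the abstract, the sketch below builds one self-supervised training pair: a coarse description is computed from the target sequence itself and would serve as the model input, with the sequence as the output. The note representation, the token names, and the two features (note density and mean pitch per bar) are assumptions made for this example; the abstract does not specify which features FIGARO uses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy representation: a note is (bar_index, midi_pitch, duration_in_beats).

def describe(notes):
    """Compute a coarse per-bar description from the target sequence itself."""
    bars = defaultdict(list)
    for bar, pitch, _duration in notes:
        bars[bar].append(pitch)

    tokens = []
    for bar in sorted(bars):
        pitches = bars[bar]
        tokens.append(f"Bar_{bar}")
        tokens.append(f"NoteDensity_{len(pitches)}")        # illustrative feature
        tokens.append(f"MeanPitch_{round(mean(pitches))}")  # illustrative feature
    return tokens

def make_training_pair(notes):
    """Return (description, target): no human labels are needed, only the sequence."""
    target = [f"Bar_{b}|Pitch_{p}|Dur_{d}" for b, p, d in notes]
    return describe(notes), target

notes = [(0, 60, 1.0), (0, 64, 1.0), (1, 67, 2.0)]
description, target = make_training_pair(notes)
print(description)
print(target)
```

A sequence-to-sequence model trained on many such pairs would learn the conditional distribution of sequences given a description. At generation time a user could edit the description tokens to steer the output, which is the kind of global control the abstract refers to.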
Related papers
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- Anticipatory Music Transformer [60.15347393822849]
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process.
We focus on infilling control tasks, whereby the controls are a subset of the events themselves.
We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset.
arXiv Detail & Related papers (2023-06-14T16:27:53Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Seq-HyGAN: Sequence Classification via Hypergraph Attention Network [0.0]
Sequence classification has a wide range of real-world applications in different domains, such as genome classification in health and anomaly detection in business.
The lack of explicit features in sequence data makes it difficult for machine learning models.
We propose a novel Hypergraph Attention Network model, namely Seq-HyGAN.
arXiv Detail & Related papers (2023-03-04T11:53:33Z)
- Conditional Drums Generation using Compound Word Representations [4.435094091999926]
We tackle the task of conditional drums generation using a novel data encoding scheme inspired by Compound Word representation.
We present a sequence-to-sequence architecture where a Bidirectional Long short-term memory (BiLSTM) receives information about the conditioning parameters.
A Transformer-based Decoder with relative global attention produces the generated drum sequences.
arXiv Detail & Related papers (2022-02-09T13:49:27Z)
- Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework [3.029434408969759]
We present a novel approach for calculating the positivity or negativity of a chord progression within a lead sheet.
Our approach is similar to a Neural Machine Translation (NMT) problem, as we include high-level conditions in the encoder part of the sequence-to-sequence architectures.
The proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset.
arXiv Detail & Related papers (2021-04-27T09:04:21Z)
- Conditional Hybrid GAN for Sequence Generation [56.67961004064029]
We propose a novel conditional hybrid GAN (C-Hybrid-GAN) to solve this issue.
We exploit the Gumbel-Softmax technique to approximate the distribution of discrete-valued sequences.
We demonstrate that the proposed C-Hybrid-GAN outperforms the existing methods in context-conditioned discrete-valued sequence generation.
arXiv Detail & Related papers (2020-09-18T03:52:55Z)
- Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling [5.88864611435337]
We present a framework that can learn high-level feature representations with a limited amount of data.
We refer to our proposed framework as Music FaderNets, which is inspired by the fact that low-level attributes can be continuously manipulated.
We demonstrate that the model successfully learns the intrinsic relationship between arousal and its corresponding low-level attributes.
arXiv Detail & Related papers (2020-07-29T16:01:45Z)
- Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications.
The visual feature produced by our encoder, termed as Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.