Music FaderNets: Controllable Music Generation Based On High-Level
Features via Low-Level Feature Modelling
- URL: http://arxiv.org/abs/2007.15474v1
- Date: Wed, 29 Jul 2020 16:01:45 GMT
- Title: Music FaderNets: Controllable Music Generation Based On High-Level
Features via Low-Level Feature Modelling
- Authors: Hao Hao Tan, Dorien Herremans
- Abstract summary: We present a framework that can learn high-level feature representations with a limited amount of data.
We refer to our proposed framework as Music FaderNets, which is inspired by the fact that low-level attributes can be continuously manipulated.
We demonstrate that the model successfully learns the intrinsic relationship between arousal and its corresponding low-level attributes.
- Score: 5.88864611435337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-level musical qualities (such as emotion) are often abstract,
subjective, and hard to quantify. Given these difficulties, it is not easy to
learn good feature representations with supervised learning techniques, either
because of the insufficiency of labels, or the subjectiveness (and hence large
variance) in human-annotated labels. In this paper, we present a framework that
can learn high-level feature representations with a limited amount of data, by
first modelling their corresponding quantifiable low-level attributes. We refer
to our proposed framework as Music FaderNets, which is inspired by the fact
that low-level attributes can be continuously manipulated by separate "sliding
faders" through feature disentanglement and latent regularization techniques.
High-level features are then inferred from the low-level representations
through semi-supervised clustering using Gaussian Mixture Variational
Autoencoders (GM-VAEs). Using arousal as an example of a high-level feature, we
show that the "faders" of our model are disentangled and change linearly w.r.t.
the modelled low-level attributes of the generated output music. Furthermore,
we demonstrate that the model successfully learns the intrinsic relationship
between arousal and its corresponding low-level attributes (rhythm and note
density), with only 1% of the training set being labelled. Finally, using the
learnt high-level feature representations, we explore the application of our
framework in style transfer tasks across different arousal states. The
effectiveness of this approach is verified through a subjective listening test.
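The implementation details are in the full paper; the following is a minimal sketch (not the authors' code) of the fader idea only: one latent branch per low-level attribute plus a latent regularizer that ties one latent dimension to that attribute's value. The GM-VAE prior over arousal and the semi-supervised clustering are omitted, and the layer sizes, the simple signed-difference regularizer, and the names (FaderBranch, MusicFaderSketch) are illustrative assumptions.
```python
# Illustrative sketch of the "sliding fader" idea (not the authors' code).
# Assumptions: flattened bar-level inputs, one latent fader per low-level
# attribute (rhythm, note density), and a toy regularizer that orders the
# first latent dimension like the attribute value.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaderBranch(nn.Module):
    """Encoder producing one latent 'fader' for a single low-level attribute."""
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return z, mu, logvar

def latent_regularization(z, attr):
    """Encourage the first latent dim to be ordered like the attribute
    (a simplified stand-in for a monotonic latent regularizer)."""
    d_z = z[:, :1] - z[:, :1].T          # pairwise differences in latent dim 0
    d_a = attr[:, None] - attr[None, :]  # pairwise differences in the attribute
    return F.mse_loss(torch.tanh(d_z), torch.sign(d_a))

class MusicFaderSketch(nn.Module):
    def __init__(self, x_dim=128, z_dim=16):
        super().__init__()
        self.rhythm = FaderBranch(x_dim, z_dim)
        self.note = FaderBranch(x_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(2 * z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim))

    def forward(self, x, rhythm_attr, note_attr):
        z_r, mu_r, lv_r = self.rhythm(x)
        z_n, mu_n, lv_n = self.note(x)
        recon = self.dec(torch.cat([z_r, z_n], dim=-1))
        kl = (-0.5 * torch.mean(1 + lv_r - mu_r ** 2 - lv_r.exp())
              - 0.5 * torch.mean(1 + lv_n - mu_n ** 2 - lv_n.exp()))
        reg = latent_regularization(z_r, rhythm_attr) + latent_regularization(z_n, note_attr)
        return F.mse_loss(recon, x) + kl + reg

model = MusicFaderSketch()
loss = model(torch.rand(8, 128), torch.rand(8), torch.rand(8))
print(loss.item())
```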
Related papers
- Impact of Label Noise on Learning Complex Features [0.5249805590164901]
We show that pretraining promotes learning complex functions and diverse features in the presence of noise.
Our experiments demonstrate that pre-training with noisy labels encourages gradient descent to find alternate minima.
arXiv Detail & Related papers (2024-11-07T09:47:18Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns (illustrated in the sketch below).
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
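As a rough illustration of what token interleaving over parallel codebook streams can look like, here is a minimal delay-style interleaving sketch in NumPy; the padding token, array layout, and function names are assumptions, not the released MusicGen implementation.
```python
# Illustrative "delay" interleaving of parallel codebook streams
# (a sketch, not the MusicGen code).
import numpy as np

PAD = -1  # placeholder token for positions created by the delay

def delay_interleave(tokens: np.ndarray) -> np.ndarray:
    """tokens: (K, T) codebook streams -> (K, T + K - 1) delayed pattern.
    Stream k is shifted right by k steps so the model predicts one new token
    per stream at each step, conditioned on the already-generated ones."""
    K, T = tokens.shape
    out = np.full((K, T + K - 1), PAD, dtype=tokens.dtype)
    for k in range(K):
        out[k, k:k + T] = tokens[k]
    return out

def delay_deinterleave(delayed: np.ndarray) -> np.ndarray:
    """Inverse of delay_interleave."""
    K, total = delayed.shape
    T = total - (K - 1)
    return np.stack([delayed[k, k:k + T] for k in range(K)])

streams = np.arange(12).reshape(4, 3)  # 4 codebooks, 3 time steps
delayed = delay_interleave(streams)
assert np.array_equal(delay_deinterleave(delayed), streams)
```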
- Exploiting Semantic Attributes for Transductive Zero-Shot Learning [97.61371730534258]
Zero-shot learning aims to recognize unseen classes by generalizing the relation between visual features and semantic attributes learned from the seen classes.
We present a novel transductive ZSL method that produces semantic attributes of the unseen data and imposes them on the generative process.
Experiments on five standard benchmarks show that our method yields state-of-the-art results for zero-shot learning.
arXiv Detail & Related papers (2023-03-17T09:09:48Z)
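For context, a toy sketch of the attribute-based zero-shot idea this entry builds on: regress semantic attributes from visual features on seen classes, then classify unseen samples by the nearest class-attribute vector. This is a simplified, non-transductive baseline, not the paper's generative method; the ridge solver and variable names are assumptions.
```python
# Toy attribute-based zero-shot classification (simplified baseline,
# not the transductive generative method of the entry above).
import numpy as np

def fit_attribute_regressor(X_seen, A_seen, lam=1.0):
    """Ridge-regression map W from visual features to semantic attributes."""
    d = X_seen.shape[1]
    return np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(d), X_seen.T @ A_seen)

def zero_shot_predict(X_unseen, W, class_attributes):
    """Predict attributes for unseen samples, then assign the unseen class
    whose attribute vector is most similar (cosine similarity)."""
    A_pred = X_unseen @ W
    A_pred /= np.linalg.norm(A_pred, axis=1, keepdims=True) + 1e-9
    C = class_attributes / (np.linalg.norm(class_attributes, axis=1, keepdims=True) + 1e-9)
    return np.argmax(A_pred @ C.T, axis=1)
```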
- High-level Feature Guided Decoding for Semantic Segmentation [54.424062794490254]
We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results.
Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification.
To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
arXiv Detail & Related papers (2023-03-15T14:23:07Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
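A small, self-contained illustration of how token-level BLEU and edit-distance similarity can be computed between a generated event sequence and a reference; the smoothing, tokenization, and event names are assumptions rather than the paper's exact evaluation setup.
```python
# Token-level evaluation of generated symbolic music against a reference
# (illustrative metrics, not the exact setup of the entry above).
from collections import Counter
from difflib import SequenceMatcher
import math

def edit_distance_similarity(hyp, ref):
    """Similarity in [0, 1] based on longest matching blocks (difflib)."""
    return SequenceMatcher(None, hyp, ref).ratio()

def bleu(hyp, ref, max_n=4):
    """Smoothed n-gram BLEU for a single hypothesis/reference pair."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1))  # add-one smoothing
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)

hyp = ["BAR", "NOTE_60", "DUR_4", "NOTE_64", "DUR_4"]
ref = ["BAR", "NOTE_60", "DUR_4", "NOTE_67", "DUR_2"]
print(edit_distance_similarity(hyp, ref), round(bleu(hyp, ref), 3))
```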
- FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control [25.95359681751144]
We propose the self-supervised description-to-sequence task, which allows for fine-grained controllable generation on a global level.
We do so by extracting high-level features about the target sequence and learning the conditional distribution of sequences given the corresponding high-level description in a sequence-to-sequence modelling setup.
By combining learned high level features with domain knowledge, which acts as a strong inductive bias, the model achieves state-of-the-art results in controllable symbolic music generation and generalizes well beyond the training distribution.
arXiv Detail & Related papers (2022-01-26T13:51:19Z)
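A minimal sketch of the self-supervised description-to-sequence setup: derive a coarse description from each training bar and pair it with the bar's event tokens as (source, target) for a seq2seq model. The chosen description features (note density, mean pitch, mean velocity) and all names are illustrative assumptions, not FIGARO's actual feature set.
```python
# Self-supervised (description, sequence) training pairs for a seq2seq model
# (illustrative features and names, not FIGARO's implementation).
from dataclasses import dataclass
from statistics import mean

@dataclass
class Note:
    pitch: int      # MIDI pitch
    velocity: int   # MIDI velocity
    start: float    # onset in beats

def describe_bar(notes, beats_per_bar=4):
    """Map a bar of notes to a coarse, discrete description token sequence."""
    if not notes:
        return ["DENSITY_0", "PITCH_NA", "VEL_NA"]
    density = len(notes) / beats_per_bar
    return [
        f"DENSITY_{min(int(density), 8)}",                    # notes per beat, capped
        f"PITCH_{int(mean(n.pitch for n in notes)) // 12}",   # octave bucket
        f"VEL_{int(mean(n.velocity for n in notes)) // 32}",  # velocity bucket
    ]

def make_training_pair(notes):
    """Self-supervised pair: (description tokens, target event tokens)."""
    target = [f"NOTE_{n.pitch}_{n.velocity}" for n in sorted(notes, key=lambda n: n.start)]
    return describe_bar(notes), target

bar = [Note(60, 80, 0.0), Note(64, 70, 1.0), Note(67, 90, 2.0)]
print(make_training_pair(bar))
```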
- Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework [3.029434408969759]
We present a novel approach for calculating the positivity or negativity of a chord progression within a lead sheet.
Our approach is similar to a Neural Machine Translation (NMT) problem, as we include high-level conditions in the encoder part of the sequence-to-sequence architectures.
The proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset.
arXiv Detail & Related papers (2021-04-27T09:04:21Z)
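As a toy illustration only, one way to score the positivity of a chord progression is to map chord qualities to fixed valence values and average them; the table below is an assumption for illustration and not the paper's actual affect model.
```python
# Toy "positivity" score for a chord progression (illustrative values only,
# not the valence model of the entry above).
CHORD_VALENCE = {
    "maj": 1.0, "maj7": 0.8, "7": 0.3,
    "min": -0.6, "min7": -0.4, "dim": -1.0, "aug": -0.2,
}

def progression_valence(progression):
    """progression: list of (root, quality) pairs, e.g. [('C', 'maj'), ('A', 'min')]."""
    scores = [CHORD_VALENCE.get(quality, 0.0) for _, quality in progression]
    return sum(scores) / len(scores) if scores else 0.0

print(progression_valence([("C", "maj"), ("A", "min"), ("F", "maj"), ("G", "7")]))
```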
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
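A minimal sketch of extracting multi-scale high-frequency residuals (image minus a blurred copy at several scales), the kind of signal the entry above relies on; the box-blur kernels and sizes are assumptions, not the paper's extraction module.
```python
# Multi-scale high-frequency residual extraction (illustrative sketch).
import torch
import torch.nn.functional as F

def high_frequency_residuals(img: torch.Tensor, kernel_sizes=(3, 5, 7)):
    """img: (B, C, H, W). Returns one high-frequency residual map per scale
    (the image minus a depthwise box-blurred copy)."""
    residuals = []
    _, c, _, _ = img.shape
    for k in kernel_sizes:
        box = torch.full((c, 1, k, k), 1.0 / (k * k), device=img.device)
        blurred = F.conv2d(img, box, padding=k // 2, groups=c)  # depthwise box blur
        residuals.append(img - blurred)
    return residuals

x = torch.rand(1, 3, 64, 64)
print([r.shape for r in high_frequency_residuals(x)])
```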
- COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations [32.456824945999465]
We propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags.
We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks.
arXiv Detail & Related papers (2020-06-15T13:17:18Z)
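A minimal sketch of aligning audio and tag latents with a contrastive objective over matching pairs in a batch; the encoder sizes, the InfoNCE-style loss, and all names are assumptions rather than the COALA configuration.
```python
# Aligning audio and tag embeddings with a contrastive loss (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairEncoder(nn.Module):
    def __init__(self, in_dim, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-length embeddings

def alignment_loss(z_audio, z_tags, temperature=0.1):
    """InfoNCE-style loss over a batch of matching (audio, tag) pairs."""
    logits = z_audio @ z_tags.T / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_audio.size(0), device=z_audio.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

audio_enc, tag_enc = PairEncoder(in_dim=512), PairEncoder(in_dim=1000)
za = audio_enc(torch.rand(8, 512))   # e.g. pooled spectrogram features
zt = tag_enc(torch.rand(8, 1000))    # e.g. multi-hot tag vectors
print(alignment_loss(za, zt).item())
```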
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
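A small sketch of a correlation-based representation: for one clip, keep only the pairwise Pearson correlations between the frame-level trajectories of hand-crafted features, yielding a compact fixed-size vector; the feature count and framing are assumptions.
```python
# Correlation-of-feature-pairs representation for one audio clip (sketch).
import numpy as np

def correlation_representation(frames: np.ndarray) -> np.ndarray:
    """frames: (num_frames, num_features) -> vector of pairwise correlations."""
    corr = np.corrcoef(frames, rowvar=False)   # (num_features, num_features)
    iu = np.triu_indices_from(corr, k=1)       # each pair once, no diagonal
    return corr[iu]

clip = np.random.rand(200, 12)  # e.g. 200 frames of 12 hand-crafted descriptors
print(correlation_representation(clip).shape)  # 12 * 11 / 2 = 66 values
```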
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
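A minimal sketch of adversarial latent regularization in an adversarial autoencoder: a discriminator separates prior samples from encoder outputs, and the encoder is trained to fool it; the Gaussian prior, sizes, and loss weighting are assumptions, not MusAE's exact setup.
```python
# Adversarial autoencoder losses (illustrative sketch, not MusAE's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim, z_dim = 128, 16
encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
discriminator = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.rand(32, x_dim)            # a batch of flattened music features
z = encoder(x)

# Reconstruction term for the autoencoder.
recon_loss = F.mse_loss(decoder(z), x)

# Discriminator: real = samples from the Gaussian prior, fake = encoder outputs.
z_prior = torch.randn_like(z)
d_real, d_fake = discriminator(z_prior), discriminator(z.detach())
d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

# Adversarial term for the encoder: make encoded latents look like prior samples.
g_loss = F.binary_cross_entropy_with_logits(discriminator(z), torch.ones_like(d_real))
enc_dec_loss = recon_loss + 0.1 * g_loss
print(d_loss.item(), enc_dec_loss.item())
```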