Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders
- URL: http://arxiv.org/abs/2001.05494v2
- Date: Thu, 20 Feb 2020 14:44:50 GMT
- Title: Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders
- Authors: Andrea Valenti, Antonio Carta, Davide Bacciu
- Abstract summary: We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model achieves higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
- Score: 9.923470453197657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenging open problem of learning an effective latent space
for symbolic music data in generative music modeling. We focus on leveraging
adversarial regularization as a flexible and natural means to imbue variational
autoencoders with context information concerning music genre and style. Throughout
the paper, we show how Gaussian mixtures taking into account music metadata
information can be used as an effective prior for the autoencoder latent space,
introducing the first Music Adversarial Autoencoder (MusAE). The empirical
analysis on a large scale benchmark shows that our model has a higher
reconstruction accuracy than state-of-the-art models based on standard
variational autoencoders. It is also able to create realistic interpolations
between two musical sequences, smoothly changing the dynamics of the different
tracks. Experiments show that the model can organise its latent space
according to low-level properties of the musical pieces, as well as embed
into the latent variables the high-level genre information injected from the
prior distribution to increase its overall performance. This allows us to
modify the generated pieces in a principled way.
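To make the adversarial-regularization idea concrete, here is a minimal PyTorch sketch of an adversarial autoencoder whose prior is a Gaussian mixture with one component per genre. All module sizes, the toy data, and the single training step are illustrative assumptions, not the authors' configuration.

```python
# Minimal adversarial-autoencoder sketch with a genre-conditioned Gaussian
# mixture prior, loosely in the spirit of MusAE. Dimensions, modules, and
# the training step are illustrative assumptions.
import torch
import torch.nn as nn

LATENT, INPUT, N_GENRES = 32, 512, 4

encoder = nn.Sequential(nn.Linear(INPUT, 256), nn.ReLU(), nn.Linear(256, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, INPUT))
critic  = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, 1))

# One mixture component per genre: this is how metadata enters the prior.
means = torch.randn(N_GENRES, LATENT)

def sample_prior(genres):
    """Sample z from the mixture component indexed by each genre label."""
    return means[genres] + torch.randn(len(genres), LATENT)

bce = nn.BCEWithLogitsLoss()
x = torch.rand(16, INPUT)                      # toy batch of flattened piano rolls
g = torch.randint(0, N_GENRES, (16,))          # toy genre labels

z_fake = encoder(x)                            # posterior samples
z_real = sample_prior(g)                       # prior samples

recon_loss = nn.functional.mse_loss(decoder(z_fake), x)
d_loss = bce(critic(z_real), torch.ones(16, 1)) + \
         bce(critic(z_fake.detach()), torch.zeros(16, 1))
g_loss = bce(critic(z_fake), torch.ones(16, 1))  # push posterior toward prior
print(recon_loss.item(), d_loss.item(), g_loss.item())
```

The discriminator plays the role of the KL term in a standard VAE: it pushes encoder outputs toward the genre-indexed mixture component, which is how high-level genre information is injected into the latent space.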
Related papers
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
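A minimal sketch of the fixed, localized-Gaussian attention idea; the bandwidth sigma and all shapes are illustrative assumptions.

```python
# Attention weights computed as a Gaussian of pairwise point distances
# instead of learned query-key products. Sigma and shapes are assumptions.
import torch

def gaussian_attention(points, values, sigma=0.1):
    """points: (N, 3) coordinates; values: (N, D) features."""
    d2 = torch.cdist(points, points) ** 2                  # squared distances
    weights = torch.softmax(-d2 / (2 * sigma**2), dim=-1)  # rows sum to 1
    return weights @ values                                # weighted mix

pts = torch.rand(128, 3)
feats = torch.rand(128, 64)
print(gaussian_attention(pts, feats).shape)  # torch.Size([128, 64])
```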
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
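A toy sketch of the measure-synchronisation idea: bars from all tracks are interleaved so that bar k of every track is emitted before bar k+1 of any track. The ABC fragments and separator tokens below are assumptions, not the paper's exact format.

```python
# Interleave bars from multiple tracks so corresponding measures stay
# aligned in the token stream. Fragments and tokens are illustrative.
track_a = ["C2 E2", "G2 E2", "C4"]      # bars of track A
track_b = ["E2 G2", "B2 G2", "E4"]      # bars of track B

def synchronize(*tracks):
    tokens = []
    for bars in zip(*tracks):           # walk all tracks bar by bar
        for i, bar in enumerate(bars):
            tokens.append(f"<track{i}> {bar} |")
    return " ".join(tokens)

print(synchronize(track_a, track_b))
```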
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
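A small sketch of one possible token-interleaving pattern (a "delay" layout) over parallel codebook streams; the number of codebooks, pad id, and layout are illustrative assumptions rather than MusicGen's exact scheme.

```python
# "Delay" interleaving: codebook stream k is shifted right by k steps so
# a single-stage LM can predict all K codebooks at each step.
import numpy as np

K, T, PAD = 4, 6, -1                      # codebooks, timesteps, pad id
codes = np.arange(K * T).reshape(K, T)    # toy codebook token grid

delayed = np.full((K, T + K - 1), PAD)
for k in range(K):
    delayed[k, k:k + T] = codes[k]        # shift stream k right by k steps

# Each column of `delayed` is one LM step whose K tokens refer to
# different original timesteps.
print(delayed)
```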
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
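A minimal sketch of MLM-style acoustic pre-training with teacher pseudo labels; the linear "student", random "teacher" targets, and all shapes are stand-in assumptions, not MERT's actual components.

```python
# Mask a fraction of acoustic frames and train the student to predict the
# teacher's discrete pseudo labels at the masked positions.
import torch
import torch.nn as nn

T, D, V = 100, 64, 500                    # frames, feature dim, codebook size
student = nn.Linear(D, V)                 # stand-in for a Transformer encoder

frames = torch.rand(T, D)                 # acoustic features for one clip
with torch.no_grad():
    pseudo = torch.randint(0, V, (T,))    # teacher's discrete pseudo labels

mask = torch.rand(T) < 0.3                # mask ~30% of frames
inp = frames.clone()
inp[mask] = 0.0                           # zero out masked frames

logits = student(inp)
loss = nn.functional.cross_entropy(logits[mask], pseudo[mask])
print(loss.item())
```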
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
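For concreteness, a sketch of edit-distance similarity as one way to score generated token sequences against references; the token sequences below are toy examples, not the paper's data or exact metric.

```python
# Levenshtein distance via the classic one-row dynamic program, then
# normalised into a similarity score in [0, 1].
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

ref = "C4 E4 G4 C5".split()
gen = "C4 E4 A4 C5".split()
d = edit_distance(ref, gen)
print(1 - d / max(len(ref), len(gen)))    # 0.75 for one substituted token
```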
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation [6.0949335132843965]
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer, which uses a coarse decoder and fine decoders to model contexts at the global and section levels.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
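A rough sketch of the coarse/fine decoding idea, with a coarse model providing section-level context to a fine model; the modules, dimensions, and conditioning-by-prefix scheme are illustrative assumptions, not the paper's architecture.

```python
# A coarse model contextualises section embeddings; a fine model generates
# within each section, conditioned on its coarse context token.
import torch
import torch.nn as nn

D, SECTIONS, NOTES = 64, 4, 16

coarse = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), 2)
fine = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True), 2)

song_plan = torch.rand(1, SECTIONS, D)        # one embedding per section
section_ctx = coarse(song_plan)               # global, section-level context

outputs = []
for s in range(SECTIONS):
    notes = torch.rand(1, NOTES, D)           # note embeddings of section s
    ctx = section_ctx[:, s:s + 1, :]          # prepend section context token
    outputs.append(fine(torch.cat([ctx, notes], dim=1))[:, 1:, :])

print(torch.cat(outputs, dim=1).shape)        # (1, SECTIONS*NOTES, D)
```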
arXiv Detail & Related papers (2022-05-17T18:48:14Z)
- Flat latent manifolds for music improvisation between human and machine [9.571383193449648]
We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences.
In the learned model, we generate novel musical sequences by quantification in latent space.
We provide empirical evidence for our method via a set of experiments on music and we deploy our model for an interactive jam session with a professional drummer.
arXiv Detail & Related papers (2022-02-23T09:00:17Z)
- Deep Music Information Dynamics [1.6143012623830792]
We introduce a novel framework that combines two parallel streams - a low-rate latent representation stream and higher-rate information dynamics derived from the musical data itself.
Motivated by rate-distortion theories of human cognition we propose a framework for exploring possible relations between imaginary anticipations existing in the listener's mind and information dynamics of the musical surface itself.
arXiv Detail & Related papers (2021-02-01T19:59:59Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
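A minimal sketch of the kind of memory-cell comparison the paper performs, run on toy data with the same next-token objective for each cell type; the task setup and dimensions are assumptions.

```python
# Compare memory mechanisms (vanilla RNN, GRU, LSTM) on the same
# next-token prediction task over a toy note-token sequence.
import torch
import torch.nn as nn

V, D, T = 100, 32, 50                         # vocab, hidden size, length
seq = torch.randint(0, V, (1, T))             # toy note-token sequence

for Cell in (nn.RNN, nn.GRU, nn.LSTM):
    emb = nn.Embedding(V, D)
    rnn = Cell(D, D, batch_first=True)
    head = nn.Linear(D, V)
    out, _ = rnn(emb(seq[:, :-1]))            # predict token t+1 from prefix
    loss = nn.functional.cross_entropy(
        head(out).reshape(-1, V), seq[:, 1:].reshape(-1))
    print(Cell.__name__, loss.item())
```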
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling [5.88864611435337]
We present a framework that can learn high-level feature representations with a limited amount of data.
We refer to our proposed framework as Music FaderNets, which is inspired by the fact that low-level attributes can be continuously manipulated.
We demonstrate that the model successfully learns the intrinsic relationship between arousal and its corresponding low-level attributes.
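A small sketch of fader-style control, where a decoder is conditioned on continuous low-level attribute "sliders"; the attribute names, dimensions, and decoder are illustrative assumptions, not the Music FaderNets architecture.

```python
# Hold the latent content code fixed and sweep one continuous low-level
# attribute slider (here, a hypothetical note-density value).
import torch
import torch.nn as nn

LATENT, ATTRS, OUT = 32, 2, 128               # e.g. note density, velocity

decoder = nn.Sequential(nn.Linear(LATENT + ATTRS, 256), nn.ReLU(),
                        nn.Linear(256, OUT))

z = torch.randn(1, LATENT)                    # fixed latent content code
for density in (0.2, 0.5, 0.9):               # slide one attribute
    sliders = torch.tensor([[density, 0.5]])
    out = decoder(torch.cat([z, sliders], dim=1))
    print(density, out.shape)
```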
arXiv Detail & Related papers (2020-07-29T16:01:45Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user-specified symbolic scenario combined with a previous music context.
Our model is capable of generating long melodies by treating 8-beat note sequences as basic units, and it shares a consistent rhythm pattern structure with another specific song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)