Flat latent manifolds for music improvisation between human and machine
- URL: http://arxiv.org/abs/2202.12243v1
- Date: Wed, 23 Feb 2022 09:00:17 GMT
- Title: Flat latent manifolds for music improvisation between human and machine
- Authors: Nutan Chen, Djalel Benbouzid, Francesco Ferroni, Mathis Nitschke,
Luciano Pinna, Patrick van der Smagt
- Abstract summary: We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences.
In the learned model, we generate novel musical sequences by interpolation in latent space.
We provide empirical evidence for our method via a set of experiments on music datasets, and we deploy our model for an interactive jam session with a professional drummer.
- Score: 9.571383193449648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of machine learning in artistic music generation leads to
controversial discussions of the quality of art, for which objective
quantification is nonsensical. We therefore consider a music-generating
algorithm as a counterpart to a human musician, in a setting where reciprocal
improvisation is to lead to new experiences, both for the musician and the
audience. To obtain this behaviour, we resort to the framework of recurrent
Variational Auto-Encoders (VAE) and learn to generate music, seeded by a human
musician. In the learned model, we generate novel musical sequences by
interpolation in latent space. Standard VAEs however do not guarantee any form
of smoothness in their latent representation. This translates into abrupt
changes in the generated music sequences. To overcome these limitations, we
regularise the decoder and endow the latent space with a flat Riemannian
manifold, i.e., a manifold that is isometric to the Euclidean space. As a
result, linearly interpolating in the latent space yields realistic and smooth
musical changes that fit the type of machine–musician interactions we aim for.
We provide empirical evidence for our method via a set of experiments on music
datasets and we deploy our model for an interactive jam session with a
professional drummer. The live performance provides qualitative evidence that
the latent representation can be intuitively interpreted and exploited by the
drummer to drive the interplay. Beyond the musical application, our approach
showcases an instance of human-centred design of machine-learning models,
driven by interpretability and the interaction with the end user.
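For illustration, below is a minimal sketch (in PyTorch, with all function and parameter names assumed rather than taken from the paper) of the two ingredients the abstract describes: linear interpolation between two latent codes, and a flatness regulariser that encourages decoded distances to grow in proportion to latent distances, one simple way to approximate an isometry to Euclidean space. This is not the authors' implementation.

```python
import torch

def interpolate_latent(decoder, z_start, z_end, steps=8):
    # Walk a straight line between two latent codes and decode each point.
    # On a flat (isometric) latent manifold, equal latent steps should yield
    # perceptually even changes in the generated music.
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)   # (steps, 1)
    zs = (1.0 - alphas) * z_start + alphas * z_end          # (steps, latent_dim)
    with torch.no_grad():
        return decoder(zs)

def isometry_penalty(decoder, z, c=1.0, eps=1e-2):
    # Hypothetical flatness regulariser (illustrative, not the paper's exact
    # loss): sample nearby latent pairs and penalise deviation of output
    # distances from c times latent distances, pushing
    # ||g(z1) - g(z2)|| towards c * ||z1 - z2||.
    z_near = z + eps * torch.randn_like(z)
    d_out = (decoder(z) - decoder(z_near)).flatten(1).norm(dim=1)
    d_lat = (z - z_near).flatten(1).norm(dim=1)
    return ((d_out - c * d_lat) ** 2).mean()
```

A penalty of this kind would be added to the usual VAE reconstruction and KL terms during training; at performance time only the interpolation path is needed to morph smoothly between two seeded musical states.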
Related papers
- MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation [18.181382408551574]
We propose a novel task of Colloquial Description-to-Song Generation.
It focuses on aligning the generated content with colloquial human expressions.
This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model.
arXiv Detail & Related papers (2024-07-03T15:12:36Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces.
However, most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z)
- Generating music with sentiment using Transformer-GANs [0.0]
We propose a generative model of symbolic music conditioned by data retrieved from human sentiment.
We tackle both of these problems by employing an efficient linear version of attention and using a discriminator.
arXiv Detail & Related papers (2022-12-21T15:59:35Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music (a generic sketch of such an objective follows this list).
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographs from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music.
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
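As referenced in the PEMR entry above, contrastive representation learning builds positives and negatives from the same clip via frame masking. The following is a generic InfoNCE-style sketch under assumed names (`encoder`, `frame_mask`, `temperature`); it is not PEMR's actual objective, which constructs positives and negatives from masked versions of the same music in a more targeted way.

```python
import torch
import torch.nn.functional as F

def frame_mask(x, keep_prob=0.8):
    # Hypothetical augmentation: randomly zero out time frames of a clip.
    # x: (batch, frames, features)
    mask = (torch.rand(x.shape[:2], device=x.device) < keep_prob).unsqueeze(-1)
    return x * mask

def info_nce(encoder, x, temperature=0.1):
    # Standard InfoNCE: two masked views of each clip are positives,
    # all other clips in the batch serve as negatives.
    z1 = F.normalize(encoder(frame_mask(x)), dim=1)   # (batch, dim)
    z2 = F.normalize(encoder(frame_mask(x)), dim=1)
    logits = z1 @ z2.t() / temperature                # (batch, batch)
    labels = torch.arange(x.shape[0], device=logits.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```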