Timbre latent space: exploration and creative aspects
- URL: http://arxiv.org/abs/2008.01370v2
- Date: Mon, 17 Aug 2020 13:20:40 GMT
- Title: Timbre latent space: exploration and creative aspects
- Authors: Antoine Caillon, Adrien Bitton, Brice Gatinet, Philippe Esling
- Abstract summary: Recent studies show the ability of unsupervised models to learn invertible audio representations using Auto-Encoders.
New possibilities for timbre manipulations are enabled with generative neural networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies show the ability of unsupervised models to learn invertible audio representations using Auto-Encoders. These models enable high-quality sound synthesis but offer limited control, since their latent spaces do not disentangle timbre properties. The emergence of disentangled representations has been studied in Variational Auto-Encoders (VAEs) and applied to audio. Adding a perceptual regularization can align such latent representations with previously established multi-dimensional timbre spaces, while still allowing continuous inference and synthesis. Alternatively, specific sound attributes can be learned as control variables while unsupervised dimensions account for the remaining features. Generative neural networks thus enable new possibilities for timbre manipulation, although the exploration and creative use of their representations remain limited. The following experiments were conducted in cooperation with two composers and propose new creative directions for exploring latent sound synthesis of musical timbres, using specifically designed interfaces (Max/MSP, Pure Data) or mappings for descriptor-based synthesis.
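The perceptual regularization mentioned in the abstract can be pictured as an extra loss term that pulls pairwise distances in the VAE latent space toward distances in a reference timbre space, e.g. one derived from listening studies. Below is a minimal, hypothetical PyTorch sketch of that idea; the layer sizes, the `timbre_coords` targets, and the loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimbreVAE(nn.Module):
    """Minimal VAE over spectral frames (sizes are illustrative)."""
    def __init__(self, n_bins=512, n_latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.mu = nn.Linear(256, n_latent)
        self.logvar = nn.Linear(256, n_latent)
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_bins))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def perceptual_regularizer(mu, timbre_coords):
    """Match pairwise latent distances to distances in a reference
    multi-dimensional timbre space (coordinates assumed given)."""
    d_latent = torch.cdist(mu, mu)
    d_timbre = torch.cdist(timbre_coords, timbre_coords)
    return F.mse_loss(d_latent, d_timbre)

def loss_fn(x, model, timbre_coords, beta=1.0, gamma=0.1):
    x_hat, mu, logvar = model(x)
    rec = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld + gamma * perceptual_regularizer(mu, timbre_coords)
```

The regularizer only constrains relative geometry, so the latent space stays continuous and can still be explored with interfaces such as the Max/MSP and Pure Data patches described above.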
Related papers
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
We propose Prompt-Singer, the first SVS method that enables attribute control over singer gender, vocal range, and volume with natural language.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controlling ability and audio quality.
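One plausible reading of the range-melody decoupled pitch representation is a split between a singer-level register value and a register-invariant melodic contour. The sketch below illustrates that reading under our own assumptions; the paper's actual factorization may differ.

```python
import numpy as np

def decouple_pitch(midi_pitches):
    """Split a pitch sequence into a range component (mean register)
    and a melody component (contour relative to that register)."""
    pitches = np.asarray(midi_pitches, dtype=float)
    vocal_range = pitches.mean()          # coarse register of the singer
    melody = pitches - vocal_range        # register-invariant contour
    return vocal_range, melody

def recouple_pitch(vocal_range, melody):
    """Reconstruct absolute pitch, possibly in a new vocal range."""
    return melody + vocal_range

# Shifting the range component re-renders the same melody for another voice.
rng, mel = decouple_pitch([60, 62, 64, 62, 60])
print(recouple_pitch(rng + 12, mel))  # same contour, one octave higher
```

Shifting only the range component changes the register without altering the melody, which is what makes the attribute independently controllable.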
arXiv Detail & Related papers (2024-03-18T13:39:05Z)
- Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio
We present Synthia's melody, a novel audio data generation framework capable of simulating an infinite variety of 4-second melodies.
Unlike existing datasets collected under observational settings, Synthia's melody is free of unobserved biases.
Our evaluations reveal that Synthia's melody provides a robust testbed for examining the susceptibility of audio models to varying levels of distribution shift.
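To make the idea of a melody simulator concrete, here is a self-contained sketch that renders a random 4-second melody as a waveform; the scale, note count, and sample rate are our own illustrative choices, not the framework's actual specification.

```python
import numpy as np

def synth_melody(seed=0, sr=16000, seconds=4.0, notes=8):
    """Render a random melody of equal-length sine tones (4 s total)."""
    rng = np.random.default_rng(seed)
    scale = 440.0 * 2 ** (np.array([0, 2, 4, 5, 7, 9, 11]) / 12)  # A major
    dur = seconds / notes
    t = np.arange(int(sr * dur)) / sr
    tones = [np.sin(2 * np.pi * rng.choice(scale) * t) for _ in range(notes)]
    return np.concatenate(tones).astype(np.float32)

wave = synth_melody(seed=42)
print(wave.shape)  # (64000,) samples at 16 kHz
```

Because every factor of variation is sampled by the simulator, distribution shift can be injected deliberately rather than inherited from observational data.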
arXiv Detail & Related papers (2023-09-26T15:46:06Z)
- Novel-View Acoustic Synthesis
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: a Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z)
- An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms
In audio processing applications, there is high demand for the generation of expressive sounds from high-level representations.
Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on the compression of musical instrument sounds.
This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations, for a variety of instruments at a single pitch.
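A single stage of such a stacked convolutional autoencoder might look like the following PyTorch sketch; the channel counts, kernel sizes, and the stacking procedure in the comments are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """One conv autoencoder stage over log-mel spectrogram patches
    (channel sizes and strides are illustrative)."""
    def __init__(self, in_ch=1, hidden=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(hidden, in_ch, 3, stride=2,
                               padding=1, output_padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))

# "Stacking": train stage 1 on spectrograms, then train stage 2 on the
# codes of stage 1, and so on, compressing further at each level.
x = torch.randn(8, 1, 128, 128)       # batch of log-mel patches
stage1 = ConvAE(in_ch=1)
codes = stage1.enc(x)                 # (8, 16, 64, 64) compressed codes
print(stage1(x).shape)                # reconstruction: (8, 1, 128, 128)
```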
arXiv Detail & Related papers (2023-01-18T17:19:04Z)
- Synthesizer Preset Interpolation using Transformer Auto-Encoders
We introduce a bimodal auto-encoder neural network that simultaneously processes presets using multi-head attention blocks and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live performance or sound-design tasks.
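The creative operation this enables is interpolation in the latent space rather than directly between raw parameter values. A minimal sketch, with hypothetical stand-in encoder and decoder (the real model uses multi-head attention over presets and convolutions over audio):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the preset encoder/decoder.
N_PARAMS, LATENT = 128, 32
encode = nn.Linear(N_PARAMS, LATENT)     # preset -> latent code
decode = nn.Linear(LATENT, N_PARAMS)     # latent code -> preset

def interpolate_presets(p_a, p_b, steps=5):
    """Walk the latent line between two synthesizer presets and decode
    each point back to a full parameter vector."""
    z_a, z_b = encode(p_a), encode(p_b)
    alphas = torch.linspace(0.0, 1.0, steps)
    return [decode((1 - a) * z_a + a * z_b) for a in alphas]

p_a, p_b = torch.rand(N_PARAMS), torch.rand(N_PARAMS)
for preset in interpolate_presets(p_a, p_b):
    print(preset[:4])  # first few interpolated parameter values
```

Interpolating latent codes rather than raw parameters tends to produce smoother, more musically coherent transitions, since the auto-encoder has learned correlations between parameters.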
arXiv Detail & Related papers (2022-10-27T15:20:18Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent space during training, we are able to shape the properties of the resulting embedding.
Our experiments show that the voice cloning system built with vector quantization suffers only a small degradation in perceptual evaluations.
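The core mechanism of a vector-quantized latent space is a nearest-neighbour codebook lookup with a straight-through gradient. A minimal PyTorch sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through
    gradient, the core of a VQ latent space (sizes are illustrative)."""
    def __init__(self, n_codes=256, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):
        # index of the closest codebook entry for each latent vector
        dists = torch.cdist(z, self.codebook.weight)
        idx = dists.argmin(dim=-1)
        z_q = self.codebook(idx)
        # straight-through estimator: copy gradients past the lookup
        return z + (z_q - z).detach(), idx

vq = VectorQuantizer()
z = torch.randn(10, 64)
z_q, idx = vq(z)
print(idx[:5])  # discrete latent tokens for the first frames
```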
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottleneck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
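The two-stage structure can be sketched as follows; the GRU stand-ins below are our own drastic simplifications of the actual BNE (trained for recognition) and seq2seq synthesis module, kept only to show how the pieces compose.

```python
import torch
import torch.nn as nn

N_MEL, BN_DIM, SPK_DIM = 80, 144, 64

bne = nn.GRU(N_MEL, BN_DIM, batch_first=True)               # bottleneck extractor
synth = nn.GRU(BN_DIM + SPK_DIM, N_MEL, batch_first=True)   # synthesis stage

def convert(source_mel, target_spk_emb):
    """Any-to-many VC sketch: strip speaker identity into bottleneck
    features, then re-synthesize conditioned on a target speaker."""
    bn, _ = bne(source_mel)                            # (B, T, BN_DIM)
    spk = target_spk_emb[:, None, :].expand(-1, bn.size(1), -1)
    out, _ = synth(torch.cat([bn, spk], dim=-1))       # (B, T, N_MEL)
    return out

mel = torch.randn(1, 200, N_MEL)
spk = torch.randn(1, SPK_DIM)
print(convert(mel, spk).shape)  # torch.Size([1, 200, 80])
```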
arXiv Detail & Related papers (2020-09-06T13:01:06Z)
- Vector-Quantized Timbre Representation
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
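One simple way to picture the loudness disentanglement is to factor each frame into a loudness value and a unit-energy spectral shape before quantization, then reinject loudness at synthesis time. The numpy sketch below is our own simplification, not the paper's method.

```python
import numpy as np

def split_loudness(frames, eps=1e-5):
    """Separate per-frame loudness from unit-energy spectral shape,
    so a downstream quantizer sees only timbre information."""
    loudness = np.linalg.norm(frames, axis=-1, keepdims=True) + eps
    return frames / loudness, loudness

def merge_loudness(shapes, loudness):
    return shapes * loudness

frames = np.abs(np.random.randn(100, 512))     # stand-in spectrogram
shape, loud = split_loudness(frames)
# Timbre-transfer sketch: replace `shape` with quantized codes from another
# instrument's codebook, then restore the original loudness envelope.
print(np.allclose(merge_loudness(shape, loud), frames))  # True
```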
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
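At generation time, a CVAE of this kind decodes a sampled latent code together with an explicit pitch condition. A minimal, hypothetical PyTorch sketch of that decoding step (the sizes and one-hot conditioning are our own assumptions):

```python
import torch
import torch.nn as nn

# Illustrative CVAE decoder: latent code plus a one-hot pitch condition.
N_PITCH, LATENT, N_OUT = 88, 16, 512

decoder = nn.Sequential(nn.Linear(LATENT + N_PITCH, 256), nn.ReLU(),
                        nn.Linear(256, N_OUT))

def generate(pitch_idx, n_samples=4):
    """Sample latent codes and decode parametric frames for one pitch."""
    z = torch.randn(n_samples, LATENT)
    pitch = torch.zeros(n_samples, N_PITCH)
    pitch[:, pitch_idx] = 1.0                 # condition on one pitch class
    return decoder(torch.cat([z, pitch], dim=-1))

frames = generate(pitch_idx=40)
print(frames.shape)  # torch.Size([4, 512]) parametric frames for one pitch
```

Because pitch enters the decoder as an explicit condition, the latent code is free to capture the remaining variation in the tone, which is what gives the flexible pitch control described above.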
arXiv Detail & Related papers (2020-03-30T16:05:47Z)