Synthesizer Preset Interpolation using Transformer Auto-Encoders
- URL: http://arxiv.org/abs/2210.16984v1
- Date: Thu, 27 Oct 2022 15:20:18 GMT
- Title: Synthesizer Preset Interpolation using Transformer Auto-Encoders
- Authors: Gwendal Le Vaillant, Thierry Dutoit
- Abstract summary: We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live or sound design tasks.
- Score: 4.213427823201119
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sound synthesizers are widespread in modern music production but they
increasingly require expert skills to be mastered. This work focuses on
interpolation between presets, i.e., sets of values of all sound synthesis
parameters, to enable the intuitive creation of new sounds from existing ones.
We introduce a bimodal auto-encoder neural network, which simultaneously
processes presets using multi-head attention blocks, and audio using
convolutions. This model has been tested on a popular frequency modulation
synthesizer with more than one hundred parameters. Experiments have compared
the model to related architectures and methods, and have demonstrated that it
performs smoother interpolations. After training, the proposed model can be
integrated into commercial synthesizers for live interpolation or sound design
tasks.
Related papers
- Synthesizer Sound Matching Using Audio Spectrogram Transformers [2.5944208050492183]
We introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer.
We show that this model can reconstruct parameters of samples generated from a set of 16 parameters.
We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations.
arXiv Detail & Related papers (2024-07-23T16:58:14Z) - DiffMoog: a Differentiable Modular Synthesizer for Sound Matching [48.33168531500444]
DiffMoog is a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments.
Being differentiable, it allows integration into neural networks, enabling automated sound matching.
We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework.
arXiv Detail & Related papers (2024-01-23T08:59:21Z) - Enhancing the vocal range of single-speaker singing voice synthesis with
melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of the single-speaker.
It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice.
Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z) - NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound
Synthesis based on Frequency Modulation [38.00669627261736]
We propose NAS-FM'', which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer.
Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge.
arXiv Detail & Related papers (2023-05-22T09:46:10Z) - Multi-instrument Music Synthesis with Spectrogram Diffusion [19.81982315173444]
We focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime.
We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter.
We find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.
arXiv Detail & Related papers (2022-06-11T03:26:15Z) - BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for
Binaural Audio Synthesis [129.86743102915986]
We formulate the synthesis process from a different perspective by decomposing the audio into a common part.
We propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively.
Experiment results show that BinauralGrad outperforms the existing baselines by a large margin in terms of both object and subject evaluation metrics.
arXiv Detail & Related papers (2022-05-30T02:09:26Z) - Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation [19.13182347908491]
The problem of estimating a set of parameters configuration that best restore a sound timbre is an important yet complicated problem.
We proposed a multi-modal deep-learning-based pipeline Sound2 Synth, together with a network structure Prime-Dilated Convolution (PDC) specially designed to solve this problem.
Our method achieved not only SOTA but also the first real-world applicable results on Dexed synthesizer, a popular FM synthesizer.
arXiv Detail & Related papers (2022-05-06T06:55:29Z) - Strumming to the Beat: Audio-Conditioned Contrastive Video Textures [112.6140796961121]
We introduce a non-parametric approach for infinite video texture synthesis using a representation learned via contrastive learning.
We take inspiration from Video Textures, which showed that plausible new videos could be generated from a single one by stitching its frames together in a novel yet consistent order.
Our model outperforms baselines on human perceptual scores, can handle a diverse range of input videos, and can combine semantic and audio-visual cues in order to synthesize videos that synchronize well with an audio signal.
arXiv Detail & Related papers (2021-04-06T17:24:57Z) - Synthesizer: Rethinking Self-Attention in Transformer Models [93.08171885200922]
dot product self-attention is central and indispensable to state-of-the-art Transformer models.
This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models.
arXiv Detail & Related papers (2020-05-02T08:16:19Z) - VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.