SerumRNN: Step by Step Audio VST Effect Programming
- URL: http://arxiv.org/abs/2104.03876v1
- Date: Thu, 8 Apr 2021 16:32:14 GMT
- Title: SerumRNN: Step by Step Audio VST Effect Programming
- Authors: Christopher Mitcheltree, Hideki Koike
- Abstract summary: SerumRNN is a system that provides step-by-step instructions for applying audio effects to change a user's input audio towards a desired sound.
Our results indicate that SerumRNN is consistently able to provide useful feedback for a variety of different audio effects and synthesizer presets.
- Score: 18.35125491671331
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning to program an audio production VST synthesizer is a time-consuming
process, usually obtained through inefficient trial and error and only mastered
after years of experience. As an educational and creative tool for sound
designers, we propose SerumRNN: a system that provides step-by-step
instructions for applying audio effects to change a user's input audio towards
a desired sound. We apply our system to Xfer Records Serum: currently one of
the most popular and complex VST synthesizers used by the audio production
community. Our results indicate that SerumRNN is consistently able to provide
useful feedback for a variety of different audio effects and synthesizer
presets. We demonstrate the benefits of using an iterative system and show that
SerumRNN learns to prioritize effects and can discover more efficient effect
order sequences than a variety of baselines.
Related papers
- Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation [52.0893266767733]
We propose a robust deepfake speech detection method that employs feature decomposition to learn synthesizer-independent content features.
To enhance the model's robustness to different synthesizer characteristics, we propose a synthesizer feature augmentation strategy.
arXiv Detail & Related papers (2024-11-14T03:57:21Z)
- Synthesizer Sound Matching Using Audio Spectrogram Transformers [2.5944208050492183]
We introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer.
We show that this model can reconstruct parameters of samples generated from a set of 16 parameters.
We also provide audio examples demonstrating the out-of-domain model performance in emulating vocal imitations.
arXiv Detail & Related papers (2024-07-23T16:58:14Z)
- Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of a single speaker.
It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice.
Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z)
- Large-scale unsupervised audio pre-training for video-to-speech synthesis [64.86087257004883]
Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video of a speaker.
In this paper we propose to train encoder-decoder models on more than 3,500 hours of audio data at 24kHz.
We then use the pre-trained decoders to initialize the audio decoders for the video-to-speech synthesis task.
arXiv Detail & Related papers (2023-06-27T13:31:33Z)
- Make-A-Voice: Unified Voice Synthesis With Discrete Representation [77.3998611565557]
Make-A-Voice is a unified framework for synthesizing and manipulating voice signals from discrete representations.
We show that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models.
arXiv Detail & Related papers (2023-05-30T17:59:26Z)
- LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders [53.30016986953206]
We propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture.
We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference.
arXiv Detail & Related papers (2022-11-20T15:27:55Z)
- Synthesizer Preset Interpolation using Transformer Auto-Encoders [4.213427823201119]
We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions.
This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters.
After training, the proposed model can be integrated into commercial synthesizers for live or sound design tasks.
arXiv Detail & Related papers (2022-10-27T15:20:18Z)
- DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks [0.0]
We present DrumGAN VST, a plugin for synthesizing drum sounds using a Generative Adversarial Network.
DrumGAN VST operates on 44.1 kHz sample-rate audio, offers independent and continuous instrument class controls, and features an encoding neural network that maps sounds into the GAN's latent space.
arXiv Detail & Related papers (2022-06-29T15:44:19Z)
- Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation [19.13182347908491]
Estimating the parameter configuration that best restores a sound's timbre is an important yet complicated problem.
We proposed a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem.
Our method achieved not only SOTA but also the first real-world applicable results on Dexed synthesizer, a popular FM synthesizer.
arXiv Detail & Related papers (2022-05-06T06:55:29Z)
- White-box Audio VST Effect Programming [18.35125491671331]
We propose a white-box, iterative system that provides step-by-step instructions for applying audio effects to change a user's audio signal towards a desired sound.
Our results indicate that our system is consistently able to provide useful feedback for a variety of different audio effects and synthesizer presets.
arXiv Detail & Related papers (2021-02-05T13:45:17Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.