White-box Audio VST Effect Programming
- URL: http://arxiv.org/abs/2102.03170v1
- Date: Fri, 5 Feb 2021 13:45:17 GMT
- Title: White-box Audio VST Effect Programming
- Authors: Christopher Mitcheltree and Hideki Koike
- Abstract summary: We propose a white-box, iterative system that provides step-by-step instructions for applying audio effects to change a user's audio signal towards a desired sound.
Our results indicate that our system is consistently able to provide useful feedback for a variety of different audio effects and synthesizer presets.
- Score: 18.35125491671331
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning to program an audio production VST plugin is a time-consuming
process, typically learned through inefficient trial and error and mastered only
after extensive hands-on experience. We propose a white-box, iterative system that
provides step-by-step instructions for applying audio effects to change a
user's audio signal towards a desired sound. We apply our system to Xfer
Records Serum: currently one of the most popular and complex VST synthesizers
used by the audio production community. Our results indicate that our system is
consistently able to provide useful feedback for a variety of different audio
effects and synthesizer presets.
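The paper's system learns its feedback with neural networks; as a rough illustration of the white-box, step-by-step idea, the greedy loop below picks, at each step, whichever candidate effect moves the current signal closest to the target spectrum, and emits the chosen effect names as "instructions". The toy effects and the spectral distance are simplified stand-ins, not the authors' method.

```python
import numpy as np

# Toy stand-ins for VST effect modules (hypothetical, for illustration only).
def gain(x, g=0.5):
    return g * x

def soft_clip(x):
    return np.tanh(2.0 * x)

def lowpass(x, k=8):
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

EFFECTS = {"gain": gain, "distortion": soft_clip, "lowpass": lowpass}

def spectral_distance(a, b):
    # Mean squared difference of magnitude spectra: a crude proxy
    # for perceptual distance between two sounds.
    return float(np.mean((np.abs(np.fft.rfft(a)) - np.abs(np.fft.rfft(b))) ** 2))

def suggest_steps(current, target, max_steps=3):
    """Greedily suggest an ordered list of effects moving `current` toward `target`."""
    steps = []
    for _ in range(max_steps):
        best = min(EFFECTS, key=lambda n: spectral_distance(EFFECTS[n](current), target))
        candidate = EFFECTS[best](current)
        if spectral_distance(candidate, target) >= spectral_distance(current, target):
            break  # no remaining effect improves the match; stop early
        current = candidate
        steps.append(best)
    return steps, current
```

Because each step names a concrete effect rather than an opaque parameter vector, a user can follow the suggestions one at a time, which is the "white-box" property the abstract emphasizes.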
Related papers
- Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement [0.0]
CoSaRef is a MIDI-to-audio synthesis method that can be developed without MIDI-audio paired datasets.
It first performs concatenative synthesis based on MIDI inputs and then refines the resulting audio into realistic tracks using a diffusion-based deep generative model trained on audio-only datasets.
arXiv Detail & Related papers (2024-10-22T08:01:40Z)
- Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models [56.776580717999806]
Real-world applications often involve processing multiple audio streams simultaneously.
We propose the first multi-audio evaluation benchmark that consists of 20 datasets from 11 multi-audio tasks.
We propose a novel multi-audio-LLM (MALLM) to capture audio context among multiple similar audios.
arXiv Detail & Related papers (2024-09-27T12:06:53Z)
- Differentiable All-pole Filters for Time-varying Audio Systems [9.089836388818808]
We re-express a time-varying all-pole filter to backpropagate the gradient through itself.
This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation.
We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor.
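The key idea in miniature: an all-pole filter's recursion can be run backwards in time to propagate gradients, so the filter "backpropagates through itself". The sketch below hand-derives the reverse pass for a time-varying one-pole filter in NumPy; it illustrates the general principle only and is not the authors' implementation.

```python
import numpy as np

def allpole1_forward(x, a):
    """Time-varying one-pole filter: y[n] = x[n] + a[n] * y[n-1]."""
    y = np.zeros_like(x)
    prev = 0.0
    for n in range(len(x)):
        prev = x[n] + a[n] * prev
        y[n] = prev
    return y

def allpole1_backward(a, y, grad_y):
    """Reverse pass: run the recursion backwards to get dL/dx[n] and
    dL/da[n] from dL/dy[n] (hand-derived chain rule, not autograd)."""
    N = len(y)
    grad_x = np.zeros(N)
    grad_a = np.zeros(N)
    carry = 0.0  # gradient flowing back from y[n+1] into y[n]
    for n in reversed(range(N)):
        gbar = grad_y[n] + carry        # total dL/dy[n]
        grad_x[n] = gbar                # y[n] depends on x[n] with weight 1
        grad_a[n] = gbar * (y[n - 1] if n > 0 else 0.0)
        carry = a[n] * gbar             # pass gradient on to y[n-1]
    return grad_x, grad_a
```

The reverse recursion mirrors the forward one, so gradient evaluation costs the same O(N) as filtering itself; checking `grad_a` against finite differences of a loss such as L = ½Σy² confirms the derivation.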
arXiv Detail & Related papers (2024-04-11T17:55:05Z)
- SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis [9.118448725265669]
One of the most time-consuming steps when designing sound is synchronizing audio with video.
In video games and animations, no reference audio exists, requiring manual annotation of event timings from the video.
We propose a system to extract repetitive actions onsets from a video, which are then used to condition a diffusion model trained to generate a new synchronized sound effects audio track.
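SyncFusion extracts action onsets from video; as an audio-side analogue of the same concept, the sketch below is a minimal energy-novelty onset picker in NumPy. The frame size, hop, and threshold rule are illustrative choices, not anything taken from the paper.

```python
import numpy as np

def onset_times(x, sr, frame=512, hop=256, k=1.5):
    """Detect onsets as local peaks in the positive energy difference
    between frames (a classic hand-crafted novelty function)."""
    n_frames = 1 + max(0, len(x) - frame) // hop
    energy = np.array([np.sum(x[i * hop : i * hop + frame] ** 2)
                       for i in range(n_frames)])
    # Keep only energy *rises*; decays are not onsets.
    novelty = np.maximum(np.diff(energy, prepend=energy[0]), 0.0)
    thresh = novelty.mean() + k * novelty.std()
    peaks = [i for i in range(1, n_frames - 1)
             if novelty[i] > thresh
             and novelty[i] >= novelty[i - 1]
             and novelty[i] >= novelty[i + 1]]
    return [p * hop / sr for p in peaks]
```

A list of onset times like this is exactly the kind of sparse conditioning signal a generative model can be synchronized to.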
arXiv Detail & Related papers (2023-10-23T18:01:36Z)
- Separate Anything You Describe [55.0784713558149]
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA).
AudioSep is a foundation model for open-domain audio source separation with natural language queries.
arXiv Detail & Related papers (2023-08-09T16:09:44Z)
- Large-scale unsupervised audio pre-training for video-to-speech synthesis [64.86087257004883]
Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video of a speaker.
In this paper we propose to train encoder-decoder models on more than 3,500 hours of audio data at 24kHz.
We then use the pre-trained decoders to initialize the audio decoders for the video-to-speech synthesis task.
arXiv Detail & Related papers (2023-06-27T13:31:33Z)
- Make-A-Voice: Unified Voice Synthesis With Discrete Representation [77.3998611565557]
Make-A-Voice is a unified framework for synthesizing and manipulating voice signals from discrete representations.
We show that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models.
arXiv Detail & Related papers (2023-05-30T17:59:26Z)
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [65.18102159618631]
Multimodal generative modeling has created milestones in text-to-image and text-to-video generation.
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
We propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps.
arXiv Detail & Related papers (2023-01-30T04:44:34Z)
- VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement [68.42632589736881]
We pose the task of generating sound with a specific timbre given a video input and a reference audio sample.
To solve this task, we disentangle each target sound audio into three components: temporal information, acoustic information, and background information.
Our method can generate high-quality audio samples with good synchronization with events in video and high timbre similarity with the reference audio.
arXiv Detail & Related papers (2022-11-19T11:12:01Z)
- DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks [0.0]
We present DrumGAN VST, a plugin for synthesizing drum sounds using a Generative Adversarial Network.
DrumGAN VST operates on 44.1 kHz sample-rate audio, offers independent and continuous instrument class controls, and features an encoding neural network that maps sounds into the GAN's latent space.
arXiv Detail & Related papers (2022-06-29T15:44:19Z)
- SerumRNN: Step by Step Audio VST Effect Programming [18.35125491671331]
SerumRNN is a system that provides step-by-step instructions for applying audio effects to change a user's input audio towards a desired sound.
Our results indicate that SerumRNN is consistently able to provide useful feedback for a variety of different audio effects and synthesizer presets.
arXiv Detail & Related papers (2021-04-08T16:32:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.