Modulation Extraction for LFO-driven Audio Effects
- URL: http://arxiv.org/abs/2305.13262v1
- Date: Mon, 22 May 2023 17:33:07 GMT
- Title: Modulation Extraction for LFO-driven Audio Effects
- Authors: Christopher Mitcheltree, Christian J. Steinmetz, Marco Comunità,
Joshua D. Reiss
- Abstract summary: We propose a framework capable of extracting arbitrary LFO signals from processed audio across multiple digital audio effects, parameter settings, and instrument configurations.
We show how coupling the extraction model with a simple processing network enables training of end-to-end black-box models of unseen analog or digital LFO-driven audio effects.
We make our code available and provide the trained audio effect models in a real-time VST plugin.
- Score: 5.740770499256802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low frequency oscillator (LFO) driven audio effects such as phaser, flanger,
and chorus, modify an input signal using time-varying filters and delays,
resulting in characteristic sweeping or widening effects. It has been shown
that these effects can be modeled using neural networks when conditioned with
the ground truth LFO signal. However, in most cases, the LFO signal is not
accessible and measurement from the audio signal is nontrivial, hindering the
modeling process. To address this, we propose a framework capable of extracting
arbitrary LFO signals from processed audio across multiple digital audio
effects, parameter settings, and instrument configurations. Since our system
imposes no restrictions on the LFO signal shape, we demonstrate its ability to
extract quasiperiodic, combined, and distorted modulation signals that are
relevant to effect modeling. Furthermore, we show how coupling the extraction
model with a simple processing network enables training of end-to-end black-box
models of unseen analog or digital LFO-driven audio effects using only dry and
wet audio pairs, overcoming the need to access the audio effect or internal LFO
signal. We make our code available and provide the trained audio effect models
in a real-time VST plugin.
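As a concrete illustration of the class of effects the paper targets, the sketch below implements a minimal flanger: a short delay line whose length is swept by a sinusoidal LFO, mixed back with the dry signal. This is a generic textbook construction, not the paper's implementation; the function name, parameter values, and linear-interpolation delay line are our own illustrative choices.

```python
import numpy as np

def lfo_flanger(dry, sr, rate_hz=0.5, base_delay_ms=1.0, depth_ms=2.0, mix=0.5):
    """Minimal flanger: a time-varying delay swept by a sinusoidal LFO."""
    n = np.arange(len(dry))
    # Sinusoidal LFO in [0, 1]. The paper's extraction framework recovers
    # signals like this (including quasiperiodic, combined, and distorted
    # shapes) from the processed audio alone.
    lfo = 0.5 * (1.0 + np.sin(2.0 * np.pi * rate_hz * n / sr))
    # Map the LFO to a delay length in samples.
    delay = (base_delay_ms + depth_ms * lfo) * sr / 1000.0
    # Fractional delay via linear interpolation of the dry signal.
    read_pos = np.clip(n - delay, 0.0, len(dry) - 1)
    lo = np.floor(read_pos).astype(int)
    hi = np.minimum(lo + 1, len(dry) - 1)
    frac = read_pos - lo
    delayed = (1.0 - frac) * dry[lo] + frac * dry[hi]
    return (1.0 - mix) * dry + mix * delayed

# A dry/wet pair of the kind the end-to-end black-box models train on.
sr = 44100
dry = np.random.randn(2 * sr)  # stand-in for an instrument recording
wet = lfo_flanger(dry, sr)
```

In the paper's setting only such dry/wet pairs are available; the LFO inside the effect must be recovered from the wet audio itself.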
Related papers
- CONMOD: Controllable Neural Frame-based Modulation Effects [6.132272910797383]
We introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner.
The model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs.
arXiv Detail & Related papers (2024-06-20T02:02:54Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
However, these models are prone to generating audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing [21.053861381437827]
This work presents a differentiable digital signal processing approach to modelling phaser effects.
The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain; a minimal sketch of this frame-based idea appears after this list.
We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters.
arXiv Detail & Related papers (2023-06-02T07:53:41Z)
- Modelling black-box audio effects with time-varying feature modulation [13.378050193507907]
We show that scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression.
We propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones.
We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics.
arXiv Detail & Related papers (2022-11-01T14:41:57Z)
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating point operations of off-the-shelf audio neural networks by more than half.
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which retains high speech quality while achieving high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- Point Cloud Audio Processing [18.88427891844357]
We introduce a novel way of processing audio signals by treating them as a collection of points in feature space.
We observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on trained model performance.
arXiv Detail & Related papers (2021-05-06T07:04:59Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- Exploring Quality and Generalizability in Parameterized Neural Audio Effects [0.0]
Deep neural networks have shown promise for music audio signal processing applications.
Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls.
This work expands on prior research published on modeling nonlinear time-dependent signal processing effects.
arXiv Detail & Related papers (2020-06-10T00:52:08Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
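As referenced in the grey-box phaser entry above, here is a minimal NumPy sketch of frame-based spectral processing: audio is processed in short overlapping frames, each multiplied by a frequency response that changes over time. The swept Gaussian notch, LFO rate, and window/overlap settings are illustrative assumptions, not that paper's actual model.

```python
import numpy as np

def framewise_timevarying_filter(x, sr, frame=1024, hop=512, lfo_rate=0.3):
    """Filter audio frame by frame with a time-varying frequency response:
    here, a single notch swept by a sinusoidal LFO (illustrative only)."""
    window = np.hanning(frame)       # Hann at 50% overlap sums to ~1 (COLA)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame + 1, hop):
        t = start / sr
        # Notch centre swept between 100 Hz and 900 Hz by the LFO.
        center = 500.0 + 400.0 * np.sin(2.0 * np.pi * lfo_rate * t)
        notch = 1.0 - np.exp(-((freqs - center) ** 2) / (2.0 * 100.0 ** 2))
        seg = x[start:start + frame] * window
        # Apply the frame's response in the frequency domain and overlap-add.
        y[start:start + frame] += np.fft.irfft(np.fft.rfft(seg) * notch, n=frame)
    return y
```

A real phaser cascades several all-pass stages rather than a single notch, but the frame-wise mechanism is the same.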
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.