Gradient-based Optimisation of Modulation Effects
- URL: http://arxiv.org/abs/2601.04867v1
- Date: Thu, 08 Jan 2026 12:04:41 GMT
- Title: Gradient-based Optimisation of Modulation Effects
- Authors: Alistair Carson, Alec Wright, Stefan Bilbao
- Abstract summary: We present a framework for modelling flanger, chorus and phaser effects based on differentiable digital signal processing. The model is trained in the time-frequency domain, but at inference it operates in the time domain with zero latency. We show that when trained against analog effects units, sound output from the model is in some cases perceptually indistinguishable from the reference.
- Score: 8.97214437002284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modulation effects such as phasers, flangers and chorus effects are heavily used in conjunction with the electric guitar. Machine-learning-based emulation of analog modulation units has been investigated in recent years, but most methods have either been limited to one class of effect or suffer from high computational cost or latency compared to canonical digital implementations. Here, we build on previous work and present a framework for modelling flanger, chorus and phaser effects based on differentiable digital signal processing. The model is trained in the time-frequency domain, but at inference it operates in the time domain with zero latency. We investigate the challenges associated with gradient-based optimisation of such effects, and show that low-frequency weighting of loss functions avoids convergence to local minima when learning delay times. We show that when trained against analog effects units, sound output from the model is in some cases perceptually indistinguishable from the reference, but challenges remain for effects with long delay times and feedback.
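The role of low-frequency loss weighting can be illustrated with a minimal numpy sketch. This is not the paper's actual model or loss, just a hypothetical one-dimensional toy: learning a single delay d by minimising a spectral error against a target delay d0. For a pure delay, the per-frequency magnitude error depends only on the phase mismatch, so the whole loss landscape over d can be evaluated in closed form. High-frequency bins contribute cosine terms that oscillate rapidly in d, riddling the flat-weighted loss with local minima; down-weighting high frequencies leaves a broad basin around the true delay.

```python
import numpy as np

# Toy delay-learning loss landscape (illustrative, not the paper's model).
# For a pure delay d against a target delay d0, the per-frequency error is
#   |exp(-j 2 pi f d) - exp(-j 2 pi f d0)|^2 = 2 (1 - cos(2 pi f (d - d0))).

d0 = 0.3                              # "true" delay (arbitrary units)
freqs = np.arange(1, 33)              # harmonic frequency bins 1..32
d = np.linspace(0.0, 1.0, 1001)       # candidate delays to evaluate

phase = 2.0 * np.pi * np.outer(freqs, d - d0)   # (32, 1001) phase mismatches
per_bin = 1.0 - np.cos(phase)                   # per-frequency loss terms

loss_uniform = per_bin.sum(axis=0)                         # flat weighting
loss_lowfreq = (per_bin / freqs[:, None] ** 2).sum(axis=0) # 1/f^2 weighting

def count_local_minima(y):
    """Number of strict interior local minima of a sampled curve."""
    return int(np.sum((y[1:-1] < y[:-2]) & (y[1:-1] < y[2:])))

# Flat weighting: the high-frequency cosines ripple quickly in d, creating
# many spurious local minima that can trap gradient descent far from d0.
# Low-frequency weighting: slowly varying terms dominate, leaving a broad
# basin whose global minimum sits at the true delay d0.
print("local minima (flat):   ", count_local_minima(loss_uniform))
print("local minima (low-freq):", count_local_minima(loss_lowfreq))
print("argmin of weighted loss:", d[np.argmin(loss_lowfreq)])
```

In a gradient-based setting, the practical consequence is that an optimiser initialised at an arbitrary delay is far more likely to descend into the true basin under the low-frequency weighting.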
Related papers
- DropoutTS: Sample-Adaptive Dropout for Robust Time Series Forecasting [59.868414584142336]
DropoutTS is a model-agnostic plugin that shifts the paradigm from "what" to "how much" to learn. It maps noise to adaptive dropout rates - selectively suppressing spurious fluctuations while preserving fine-grained fidelity.
arXiv Detail & Related papers (2026-01-29T13:49:20Z) - Time-Varying Audio Effect Modeling by End-to-End Adversarial Training [0.6688641196358245]
This paper introduces a Generative Adversarial Network (GAN) framework to model effects using only input-output audio recordings. An initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints. A State Prediction Network (SPN) estimates the initial internal states required to synchronize the model with the target.
arXiv Detail & Related papers (2025-12-17T11:04:39Z) - Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models [57.49136894315871]
A new paradigm of test-time scaling has yielded remarkable breakthroughs in reasoning models and generative vision models. We propose a solution to the problem of integrating test-time scaling knowledge into a model during post-training. We replace reward-guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates the initial input noise.
arXiv Detail & Related papers (2025-08-13T17:33:37Z) - Anti-aliasing of neural distortion effects via model fine tuning [4.751886527142779]
We present a method for reducing aliasing in neural models via a teacher-student fine-tuning approach. Our results show that this method significantly suppresses aliasing for both long short-term memory (LSTM) networks and temporal convolutional networks (TCN).
arXiv Detail & Related papers (2025-05-16T15:40:33Z) - Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling [0.0]
We explore the application of recent machine learning advancements for virtual analog modeling. We compare State-Space models and Linear Recurrent Units against the more common LSTM networks. Our metrics aim to assess the models' ability to accurately replicate the signal's energy and frequency contents.
arXiv Detail & Related papers (2024-05-07T08:47:40Z) - Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing [21.053861381437827]
This work presents a differentiable digital signal processing approach to modelling phaser effects.
The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain.
We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters.
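The frame-based spectral processing idea described in the entry above can be sketched with a short numpy example. This is a hypothetical illustration, not the cited paper's model: audio is cut into overlapping windowed frames, each frame is filtered in the frequency domain by a response that changes from frame to frame (here a swept Gaussian notch standing in for an LFO-modulated phaser/flanger response), and the output is reconstructed by overlap-add.

```python
import numpy as np

# Frame-based time-varying spectral filtering (illustrative sketch only).
fs = 16000
n_fft = 512
hop = n_fft // 4                       # 75% overlap: Hann^2 overlap-adds
win = np.hanning(n_fft)                # to an (approximately) constant 3/2

t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone, 1 s long

f_bins = np.fft.rfftfreq(n_fft, 1 / fs)
n_frames = (len(x) - n_fft) // hop + 1
y = np.zeros(len(x))

for m in range(n_frames):
    frame = x[m * hop : m * hop + n_fft] * win
    X = np.fft.rfft(frame)
    # Time-varying filter: a notch whose centre sweeps 500 -> 1500 Hz
    # over the signal, a stand-in for a modulated phaser/flanger stage.
    f_notch = 500.0 + 1000.0 * m / max(n_frames - 1, 1)
    H = 1.0 - np.exp(-0.5 * ((f_bins - f_notch) / 100.0) ** 2)
    y[m * hop : m * hop + n_fft] += np.fft.irfft(X * H) * win

y /= 1.5                               # undo the window overlap-add gain
```

Around the middle of the signal the notch sweeps through 1 kHz and the tone is strongly attenuated, while near the start the notch is far from 1 kHz and the tone passes almost unchanged, which is the audible signature of a swept-notch modulation effect.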
arXiv Detail & Related papers (2023-06-02T07:53:41Z) - Modelling black-box audio effects with time-varying feature modulation [13.378050193507907]
We show that scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression.
We propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones.
We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics.
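Time-varying feature-wise linear modulation (FiLM), as used in the entry above, amounts to a per-time-step affine transform of each feature channel. The following is a minimal numpy sketch under assumed shapes, not the cited paper's architecture: the modulation coefficients are driven here by a simple sinusoidal LFO, whereas in practice they would be produced by a learned conditioning network.

```python
import numpy as np

# Time-varying FiLM (illustrative sketch): scale and shift each feature
# channel with coefficients that vary over time.
T, C = 1024, 4                          # time steps, feature channels
rng = np.random.default_rng(0)
features = rng.standard_normal((T, C))  # activations from some backbone

# A 2-cycle LFO drives the modulation; a real model would learn gamma/beta
# from a conditioning signal instead.
lfo = np.sin(2 * np.pi * 2.0 * np.arange(T) / T)
gamma = 1.0 + 0.5 * lfo[:, None]        # time-varying scale, shape (T, 1)
beta = 0.1 * lfo[:, None]               # time-varying shift, shape (T, 1)

modulated = gamma * features + beta     # FiLM: per-time-step affine map
```

At time steps where the LFO crosses zero the transform reduces to the identity, so the modulation depth can be read directly off the LFO amplitude.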
arXiv Detail & Related papers (2022-11-01T14:41:57Z) - On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
arXiv Detail & Related papers (2022-10-13T17:10:02Z) - Improving the Performance of Robust Control through Event-Triggered Learning [74.57758188038375]
We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem.
We demonstrate improved performance over a robust controller baseline in a numerical example.
arXiv Detail & Related papers (2022-07-28T17:36:37Z) - Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z) - Exploring Quality and Generalizability in Parameterized Neural Audio Effects [0.0]
Deep neural networks have shown promise for music audio signal processing applications.
Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls.
This work expands on prior research published on modeling nonlinear time-dependent signal processing effects.
arXiv Detail & Related papers (2020-06-10T00:52:08Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)