Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
- URL: http://arxiv.org/abs/2512.15313v1
- Date: Wed, 17 Dec 2025 11:04:39 GMT
- Title: Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
- Authors: Yann Bourdin, Pierrick Legrand, Fanny Roche
- Abstract summary: This paper introduces a Generative Adversarial Network (GAN) framework to model effects using only input-output audio recordings. An initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints. A State Prediction Network (SPN) estimates the initial internal states required to synchronize the model with the target.
- Score: 0.6688641196358245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, training models on devices with internal modulation typically requires the recording or extraction of control signals to ensure the time-alignment required by standard loss functions. This paper introduces a Generative Adversarial Network (GAN) framework to model such effects using only input-output audio recordings, removing the need for modulation signal extraction. We propose a convolutional-recurrent architecture trained via a two-stage strategy: an initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints, followed by a supervised fine-tuning phase where a State Prediction Network (SPN) estimates the initial internal states required to synchronize the model with the target. Additionally, a new objective metric based on chirp-train signals is developed to quantify modulation accuracy. Experiments modeling a vintage hardware phaser demonstrate the method's ability to capture time-varying dynamics in a fully black-box context.
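The abstract mentions an objective metric based on chirp-train signals but does not describe how it is built. As a rough illustration only, the sketch below generates one plausible chirp-train test signal: a short linear sine sweep repeated at a fixed period, with silence in between. The function name `chirp_train` and every parameter value (sample rate, chirp duration, period, frequency range) are assumptions made for this example and are not taken from the paper.

```python
import math


def chirp_train(sample_rate=44100, chirp_dur=0.01, period=0.05,
                n_chirps=20, f_start=20.0, f_end=20000.0):
    """Return a list of samples holding n_chirps identical linear chirps.

    All defaults are illustrative assumptions, not values from the paper.
    """
    chirp_len = int(round(chirp_dur * sample_rate))
    period_len = int(round(period * sample_rate))
    if chirp_len > period_len:
        raise ValueError("chirp_dur must not exceed period")

    # One linear chirp: instantaneous frequency rises from f_start to f_end.
    sweep_rate = (f_end - f_start) / chirp_dur  # Hz per second
    chirp = []
    for n in range(chirp_len):
        t = n / sample_rate
        phase = 2.0 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        chirp.append(math.sin(phase))

    # Place one copy of the chirp at the start of every period.
    signal = [0.0] * (period_len * n_chirps)
    for i in range(n_chirps):
        start = i * period_len
        signal[start:start + chirp_len] = chirp
    return signal


if __name__ == "__main__":
    sig = chirp_train()
    print(len(sig) / 44100, "seconds of test signal")
```

Each chirp sweeps the spectrum quickly, so passing such a signal through a time-varying effect would give one short snapshot of its frequency response per period. How the paper turns such responses into a modulation-accuracy score is not stated in the abstract.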
Related papers
- Efficient Conditional Generation on Scale-based Visual Autoregressive Models [26.81493253536486]
Efficient Control Model (ECM) is a plug-and-play framework featuring a lightweight control module that introduces control signals via a distributed architecture. ECM refines conditional features using real-time generated tokens, and a shared feed-forward network (FFN) designed to maximize the utilization of its limited capacity. Our method achieves high-fidelity and diverse control over image generation, surpassing existing baselines while significantly improving both training and inference efficiency.
arXiv Detail & Related papers (2025-10-07T06:27:03Z)
- Deep Bilinear Koopman Model for Real-Time Vehicle Control in Frenet Frame [0.0]
This paper presents a deep Koopman approach for modeling and control of vehicle dynamics within the curvilinear Frenet frame. The proposed framework uses a deep neural network architecture to simultaneously learn the Koopman operator and its associated invariant subspace from the data. The proposed controller achieved significant reductions in tracking error relative to baseline controllers, confirming its suitability for real-time implementation in embedded autonomous vehicle systems.
arXiv Detail & Related papers (2025-07-16T18:49:44Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Data-driven Nonlinear Model Reduction using Koopman Theory: Integrated Control Form and NMPC Case Study [56.283944756315066]
We propose generic model structures combining delay-coordinate encoding of measurements and full-state decoding to integrate reduced Koopman modeling and state estimation.
A case study demonstrates that our approach provides accurate control models and enables real-time capable nonlinear model predictive control of a high-purity cryogenic distillation column.
arXiv Detail & Related papers (2024-01-09T11:54:54Z)
- Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
- One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control [45.84205238554709]
We present a method for reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC.
We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC.
arXiv Detail & Related papers (2023-08-03T10:21:53Z)
- Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing [21.053861381437827]
This work presents a differentiable digital signal processing approach to modelling phaser effects.
The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain.
We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters.
arXiv Detail & Related papers (2023-06-02T07:53:41Z)
- Modelling black-box audio effects with time-varying feature modulation [13.378050193507907]
We show that scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression.
We propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones.
We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics.
arXiv Detail & Related papers (2022-11-01T14:41:57Z)
- Time-to-Green predictions for fully-actuated signal control systems with supervised learning [56.66331540599836]
This paper proposes a time series prediction framework using aggregated traffic signal and loop detector data.
We utilize state-of-the-art machine learning models to predict future signal phases' duration.
Results based on an empirical data set from a fully-actuated signal control system in Zurich, Switzerland, show that machine learning models outperform conventional prediction methods.
arXiv Detail & Related papers (2022-08-24T07:50:43Z)
- Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
- Active Tuning [0.5801044612920815]
We introduce Active Tuning, a novel paradigm for optimizing the internal dynamics of recurrent neural networks (RNNs) on the fly.
In contrast to the conventional sequence-to-sequence mapping scheme, Active Tuning decouples the RNN's recurrent neural activities from the input stream.
We demonstrate the effectiveness of Active Tuning on several time series prediction benchmarks.
arXiv Detail & Related papers (2020-10-02T20:21:58Z)