WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
- URL: http://arxiv.org/abs/2507.10534v2
- Date: Thu, 17 Jul 2025 18:06:25 GMT
- Title: WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
- Authors: Qihui Yang, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack
- Abstract summary: We introduce WildFX, a pipeline containerized with Docker for generating multi-track audio mixing datasets with rich effect graphs. WildFX supports seamless integration of cross-platform commercial plugins, or any plugins in the wild, in VST/VST3/LV2/CLAP formats. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs and plugin/gain parameters, and its ability to bridge AI research with practical DSP demands.
- Score: 43.61383132919089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. In particular, while there is growing interest in neural black-box modeling of audio effect graphs (e.g. reverb, compression, equalization), AI-based approaches struggle to replicate the nuanced signal flow and parameter interactions used in professional workflows. Existing differentiable plugin approaches often diverge from real-world tools, exhibiting inferior performance relative to simplified neural controllers under equivalent computational constraints. We introduce WildFX, a pipeline containerized with Docker for generating multi-track audio mixing datasets with rich effect graphs, powered by a professional Digital Audio Workstation (DAW) backend. WildFX supports seamless integration of cross-platform commercial plugins or any plugins in the wild, in VST/VST3/LV2/CLAP formats, enabling structural complexity (e.g., sidechains, crossovers) and achieving efficient parallelized processing. A minimalist metadata interface simplifies project/plugin configuration. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs, plugin/gain parameters, and its ability to bridge AI research with practical DSP demands. The code is available on: https://github.com/IsaacYQH/WildFX.
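The abstract describes mixing projects as effect graphs with structural routing such as sidechains, configured through a minimalist metadata interface. As a rough illustration only (this is not the actual WildFX schema; every class, field, and plugin name below is hypothetical), such a graph can be modeled as plugin nodes plus typed edges:

```python
# Hypothetical sketch of a multi-track effect graph with a sidechain edge,
# in the spirit of the WildFX description. Names and fields are illustrative,
# not the real WildFX metadata interface.
from dataclasses import dataclass, field

@dataclass
class FXNode:
    name: str           # plugin instance identifier
    plugin_format: str  # "VST", "VST3", "LV2", or "CLAP"
    params: dict = field(default_factory=dict)

@dataclass
class FXGraph:
    nodes: dict = field(default_factory=dict)  # name -> FXNode
    edges: list = field(default_factory=list)  # (src, dst, kind) tuples

    def add_node(self, node: FXNode) -> None:
        self.nodes[node.name] = node

    def connect(self, src: str, dst: str, kind: str = "audio") -> None:
        # kind="sidechain" models e.g. a kick keying a bass compressor
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before connecting")
        self.edges.append((src, dst, kind))

graph = FXGraph()
graph.add_node(FXNode("kick_eq", "VST3", {"low_shelf_gain_db": 3.0}))
graph.add_node(FXNode("bass_comp", "CLAP", {"ratio": 4.0, "threshold_db": -18.0}))
graph.connect("kick_eq", "bass_comp", kind="sidechain")
print(len(graph.nodes), len(graph.edges))  # 2 1
```

A blind graph-estimation task of the kind the experiments mention would then amount to recovering the node set, the edge list, and each node's parameter dict from the rendered audio alone.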
Related papers
- Representation-Regularized Convolutional Audio Transformer for Audio Understanding [53.092757178419355]
Bootstrapping representations from scratch is computationally expensive, often requiring extensive training to converge. We propose the Convolutional Audio Transformer (CAT), a unified framework designed to address these challenges.
arXiv Detail & Related papers (2026-01-29T12:16:19Z) - EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding [18.199202388702144]
Most frequency-domain neural codecs disregard phase information or encode it as two separate real-valued channels, limiting spatial fidelity. This entails the need to introduce adversarial discriminators at the expense of convergence speed and training stability. In this work we introduce an end-to-end complex-valued RVQ-VAE audio codec that preserves magnitude-phase coupling across the entire analysis-quantization-synthesis pipeline.
arXiv Detail & Related papers (2026-01-24T16:34:07Z) - Triangle Splatting+: Differentiable Rendering with Opaque Triangles [54.18495204764292]
We introduce Triangle Splatting+, which directly optimizes triangles within a differentiable splatting framework. Our method surpasses prior splatting approaches in visual fidelity while remaining efficient and fast to train. The resulting semi-connected meshes support downstream applications such as physics-based simulation and interactive walkthroughs.
arXiv Detail & Related papers (2025-09-29T17:43:46Z) - Neutone SDK: An Open Source Framework for Neural Audio Processing [0.8062120534124608]
We introduce the Neutone SDK: an open source framework that streamlines the deployment of PyTorch-based neural audio models. We provide a technical overview of the interfaces needed to accomplish this, as well as the corresponding SDK implementations. We also demonstrate the SDK's versatility across applications such as audio effect emulation, timbre transfer, and sample generation.
arXiv Detail & Related papers (2025-08-12T17:55:08Z) - Learning to Upsample and Upmix Audio in the Latent Domain [13.82572699087732]
Neural audio autoencoders create compact latent representations that preserve perceptually important information. We propose a framework that performs audio processing operations entirely within an autoencoder's latent space. We demonstrate computational efficiency gains of up to 100x while maintaining quality comparable to post-processing on raw audio.
arXiv Detail & Related papers (2025-05-31T19:27:22Z) - $R$-FLAV: Rolling Flow matching for infinite Audio Video generation [5.7858802690354]
Joint audio-video (AV) generation is still a significant challenge in generative AI. We present $R$-FLAV, a novel transformer-based architecture that addresses all the key challenges of AV generation. Our experimental results demonstrate that $R$-FLAV outperforms existing state-of-the-art models in multimodal AV generation tasks.
arXiv Detail & Related papers (2025-03-11T11:18:47Z) - Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z) - Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models [4.569691863088947]
This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data.
Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks.
arXiv Detail & Related papers (2024-11-22T14:27:59Z) - Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching [51.70360630470263]
Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video. We propose Frieren, a V2A model based on rectified flow matching. Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment.
arXiv Detail & Related papers (2024-06-01T06:40:22Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Streamable Neural Audio Synthesis With Non-Causal Convolutions [1.8275108630751844]
We introduce a new method for producing non-causal streaming models.
This makes any convolutional model compatible with real-time buffer-based processing.
We show how our method can be adapted to fit complex architectures with parallel branches.
arXiv Detail & Related papers (2022-04-14T16:00:32Z) - Real-time Timbre Transfer and Sound Synthesis using DDSP [1.7942265700058984]
We present a real-time implementation of the Magenta DDSP library embedded in a virtual synthesizer as a plug-in.
We focused on timbre transfer from learned representations of real instruments to arbitrary sound inputs as well as controlling these models by MIDI.
We developed a GUI for intuitive high-level controls which can be used for post-processing and manipulating the parameters estimated by the neural network.
arXiv Detail & Related papers (2021-03-12T11:49:51Z) - DDSP: Differentiable Digital Signal Processing [13.448630251745163]
We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
arXiv Detail & Related papers (2020-01-14T06:49:37Z)
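The DDSP entry above describes integrating classic signal-processing elements with deep learning: in the library, synthesizer parameters such as fundamental frequency and harmonic amplitudes are predicted by a network and the synthesis itself is differentiable. The following is a minimal NumPy sketch of one such element, a harmonic oscillator; it is a hypothetical illustration, not the DDSP library's API, which implements these components differentiably in TensorFlow.

```python
# Toy harmonic additive synthesizer in the spirit of DDSP. In the real
# library, f0_hz and amps would be time-varying outputs of a neural
# network, and the synthesis graph would carry gradients.
import numpy as np

def harmonic_synth(f0_hz, amps, sr=16000, dur=0.1):
    """Sum of sinusoids at integer multiples of f0_hz."""
    t = np.arange(int(sr * dur)) / sr
    sig = np.zeros_like(t)
    for k, a in enumerate(amps, start=1):  # k-th harmonic at k * f0
        sig += a * np.sin(2 * np.pi * k * f0_hz * t)
    return sig

note = harmonic_synth(220.0, amps=[1.0, 0.5, 0.25])
print(note.shape)  # (1600,) samples: 0.1 s at 16 kHz
```

Because every operation here is a smooth function of `f0_hz` and `amps`, a gradient-based learner can fit those parameters to a target recording, which is the core of the interpretable, modular approach the abstract describes.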
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.