WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
- URL: http://arxiv.org/abs/2507.10534v2
- Date: Thu, 17 Jul 2025 18:06:25 GMT
- Title: WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
- Authors: Qihui Yang, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack
- Abstract summary: We introduce WildFX, a Docker-containerized pipeline for generating multi-track audio mixing datasets with rich effect graphs. WildFX supports seamless integration of cross-platform commercial plugins, or any plugins in the wild, in VST/VST3/LV2/CLAP formats. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs and plugin/gain parameters, and its ability to bridge AI research with practical DSP demands.
- Score: 43.61383132919089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. In particular, while there is growing interest in neural black-box modeling of audio effect graphs (e.g. reverb, compression, equalization), AI-based approaches struggle to replicate the nuanced signal flow and parameter interactions used in professional workflows. Existing differentiable plugin approaches often diverge from real-world tools, exhibiting inferior performance relative to simplified neural controllers under equivalent computational constraints. We introduce WildFX, a pipeline containerized with Docker for generating multi-track audio mixing datasets with rich effect graphs, powered by a professional Digital Audio Workstation (DAW) backend. WildFX supports seamless integration of cross-platform commercial plugins or any plugins in the wild, in VST/VST3/LV2/CLAP formats, enabling structural complexity (e.g., sidechains, crossovers) and achieving efficient parallelized processing. A minimalist metadata interface simplifies project/plugin configuration. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs, plugin/gain parameters, and its ability to bridge AI research with practical DSP demands. The code is available on: https://github.com/IsaacYQH/WildFX.
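The abstract describes a minimalist metadata interface for configuring projects, plugins, and effect graphs with structural complexity such as sidechains. As an illustration only, a mixing graph with a sidechain edge might be described like the following sketch; all field names (`tracks`, `nodes`, `sidechain_from`, etc.) are hypothetical and not WildFX's actual schema (see the repository for the real interface).

```python
# Hypothetical sketch of a mixing-graph description with a sidechain.
# Field names are illustrative, not the actual WildFX metadata schema.
project = {
    "tracks": ["drums", "bass", "vocals"],
    "nodes": [
        {"id": "eq1", "plugin": "SomeEQ.vst3", "track": "vocals",
         "params": {"low_shelf_gain_db": -2.0}},
        {"id": "comp1", "plugin": "SomeComp.clap", "track": "bass",
         "params": {"threshold_db": -18.0, "ratio": 4.0},
         "sidechain_from": "drums"},  # structural complexity: sidechain input
    ],
    "edges": [  # audio routing: track -> effect node -> master bus
        ("vocals", "eq1"), ("eq1", "master"),
        ("bass", "comp1"), ("comp1", "master"),
        ("drums", "master"),
    ],
}

def all_edges(cfg):
    """Collect audio-routing edges plus sidechain control edges."""
    edges = list(cfg["edges"])
    for node in cfg["nodes"]:
        if "sidechain_from" in node:
            edges.append((node["sidechain_from"], node["id"]))
    return edges

print(len(all_edges(project)))  # 5 audio edges + 1 sidechain edge -> 6
```

A graph representation like this is also what a "blind estimation" model would have to recover from the rendered mix: the node set, the edge set (including sidechain edges), and the per-node plugin parameters.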
Related papers
- Learning to Upsample and Upmix Audio in the Latent Domain [13.82572699087732]
Neural audio autoencoders create compact latent representations that preserve perceptually important information. We propose a framework that performs audio processing operations entirely within an autoencoder's latent space. We demonstrate computational efficiency gains of up to 100x while maintaining quality comparable to post-processing on raw audio.
arXiv Detail & Related papers (2025-05-31T19:27:22Z)
- R-FLAV: Rolling Flow matching for infinite Audio Video generation [5.7858802690354]
Joint audio-video (AV) generation is still a significant challenge in generative AI. We present R-FLAV, a novel transformer-based architecture that addresses all the key challenges of AV generation. Our experimental results demonstrate that R-FLAV outperforms existing state-of-the-art models in multimodal AV generation tasks.
arXiv Detail & Related papers (2025-03-11T11:18:47Z)
- Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z)
- Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models [4.569691863088947]
This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data.
Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks.
arXiv Detail & Related papers (2024-11-22T14:27:59Z)
- Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching [51.70360630470263]
Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video. We propose Frieren, a V2A model based on rectified flow matching. Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment.
arXiv Detail & Related papers (2024-06-01T06:40:22Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Streamable Neural Audio Synthesis With Non-Causal Convolutions [1.8275108630751844]
We introduce a new method that allows producing non-causal streaming models.
This makes any convolutional model compatible with real-time buffer-based processing.
We show how our method can be adapted to fit complex architectures with parallel branches.
arXiv Detail & Related papers (2022-04-14T16:00:32Z)
- Real-time Timbre Transfer and Sound Synthesis using DDSP [1.7942265700058984]
We present a real-time implementation of the DDSP library embedded in a virtual synthesizer as a plug-in.
We focused on timbre transfer from learned representations of real instruments to arbitrary sound inputs, as well as controlling these models via MIDI.
We developed a GUI for intuitive high-level controls which can be used for post-processing and manipulating the parameters estimated by the neural network.
arXiv Detail & Related papers (2021-03-12T11:49:51Z)
- DDSP: Differentiable Digital Signal Processing [13.448630251745163]
We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
arXiv Detail & Related papers (2020-01-14T06:49:37Z)
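Several of the papers above build on the DDSP idea of composing interpretable signal-processing elements that a neural network can control. A minimal sketch of one such element, a harmonic synthesizer whose per-harmonic amplitudes would in practice be predicted by a network, is shown below; here the amplitudes are fixed constants for illustration, and this is not code from the DDSP library itself.

```python
# Minimal sketch of a DDSP-style harmonic synthesizer: a sum of sinusoids at
# integer multiples of a fundamental frequency. In a real DDSP model, the
# amplitudes (and f0) would be time-varying outputs of a neural network.
import numpy as np

def harmonic_synth(f0_hz, harmonic_amps, duration_s, sr=16000):
    """Render a tone as a weighted sum of harmonics of f0."""
    t = np.arange(int(duration_s * sr)) / sr
    audio = np.zeros_like(t)
    for k, amp in enumerate(harmonic_amps, start=1):
        audio += amp * np.sin(2 * np.pi * k * f0_hz * t)
    # Normalize to [-1, 1] to avoid clipping when writing to an audio buffer.
    return audio / max(np.max(np.abs(audio)), 1e-9)

tone = harmonic_synth(220.0, [1.0, 0.5, 0.25], duration_s=0.1)
print(tone.shape)  # 0.1 s at 16 kHz -> (1600,)
```

Because every operation here is differentiable with respect to the amplitudes, gradients can flow from an audio-domain loss back into the controller network, which is the core of the DDSP approach.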
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.