WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
- URL: http://arxiv.org/abs/2507.10534v2
- Date: Thu, 17 Jul 2025 18:06:25 GMT
- Title: WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
- Authors: Qihui Yang, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack
- Abstract summary: We introduce WildFX, a pipeline containerized with Docker for generating multi-track audio mixing datasets with rich effect graphs. WildFX supports seamless integration of cross-platform commercial plugins, or any plugins in the wild, in VST/VST3/LV2/CLAP formats. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs and plugin/gain parameters, and its ability to bridge AI research with practical DSP demands.
- Score: 43.61383132919089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. In particular, while there is growing interest in neural black-box modeling of audio effect graphs (e.g. reverb, compression, equalization), AI-based approaches struggle to replicate the nuanced signal flow and parameter interactions used in professional workflows. Existing differentiable plugin approaches often diverge from real-world tools, exhibiting inferior performance relative to simplified neural controllers under equivalent computational constraints. We introduce WildFX, a pipeline containerized with Docker for generating multi-track audio mixing datasets with rich effect graphs, powered by a professional Digital Audio Workstation (DAW) backend. WildFX supports seamless integration of cross-platform commercial plugins or any plugins in the wild, in VST/VST3/LV2/CLAP formats, enabling structural complexity (e.g., sidechains, crossovers) and achieving efficient parallelized processing. A minimalist metadata interface simplifies project/plugin configuration. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs, plugin/gain parameters, and its ability to bridge AI research with practical DSP demands. The code is available on: https://github.com/IsaacYQH/WildFX.
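The abstract describes mixing projects as effect graphs with structural routing such as sidechains, configured through a minimalist metadata interface. As a rough illustration only (this is not the actual WildFX schema; every class, field, and plugin name below is hypothetical), such a graph can be modeled as plugin nodes plus typed edges:

```python
# Hypothetical sketch of a multi-track effect graph with a sidechain edge,
# in the spirit of the WildFX description. Names and fields are illustrative,
# not the real WildFX metadata interface.
from dataclasses import dataclass, field

@dataclass
class FXNode:
    name: str           # plugin instance identifier
    plugin_format: str  # "VST", "VST3", "LV2", or "CLAP"
    params: dict = field(default_factory=dict)

@dataclass
class FXGraph:
    nodes: dict = field(default_factory=dict)  # name -> FXNode
    edges: list = field(default_factory=list)  # (src, dst, kind) tuples

    def add_node(self, node: FXNode) -> None:
        self.nodes[node.name] = node

    def connect(self, src: str, dst: str, kind: str = "audio") -> None:
        # kind="sidechain" models e.g. a kick keying a bass compressor
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before connecting")
        self.edges.append((src, dst, kind))

graph = FXGraph()
graph.add_node(FXNode("kick_eq", "VST3", {"low_shelf_gain_db": 3.0}))
graph.add_node(FXNode("bass_comp", "CLAP", {"ratio": 4.0, "threshold_db": -18.0}))
graph.connect("kick_eq", "bass_comp", kind="sidechain")
print(len(graph.nodes), len(graph.edges))  # 2 1
```

A blind graph-estimation task of the kind the experiments mention would then amount to recovering the node set, the edge list, and each node's parameter dict from the rendered audio alone.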
Related papers
- Representation-Regularized Convolutional Audio Transformer for Audio Understanding [53.092757178419355]
Bootstrapping representations from scratch is computationally expensive, often requiring extensive training to converge. We propose the Convolutional Audio Transformer (CAT), a unified framework designed to address these challenges.
arXiv Detail & Related papers (2026-01-29T12:16:19Z) - EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding [18.199202388702144]
Most frequency-domain neural codecs disregard phase information or encode it as two separate real-valued channels, limiting spatial fidelity. This entails the need to introduce adversarial discriminators at the expense of convergence speed and training stability. In this work we introduce an end-to-end complex-valued RVQ-VAE audio codec that preserves magnitude-phase coupling across the entire analysis-quantization-synthesis pipeline.
arXiv Detail & Related papers (2026-01-24T16:34:07Z) - Triangle Splatting+: Differentiable Rendering with Opaque Triangles [54.18495204764292]
We introduce Triangle Splatting+, which directly optimizes triangles within a differentiable splatting framework. Our method surpasses prior splatting approaches in visual fidelity while remaining efficient and fast to train. The resulting semi-connected meshes support downstream applications such as physics-based simulation and interactive walkthroughs.
arXiv Detail & Related papers (2025-09-29T17:43:46Z) - Neutone SDK: An Open Source Framework for Neural Audio Processing [0.8062120534124608]
We introduce the Neutone SDK: an open source framework that streamlines the deployment of PyTorch-based neural audio models. We provide a technical overview of the interfaces needed to accomplish this, as well as the corresponding SDK implementations. We also demonstrate the SDK's versatility across applications such as audio effect emulation, timbre transfer, and sample generation.
arXiv Detail & Related papers (2025-08-12T17:55:08Z) - Learning to Upsample and Upmix Audio in the Latent Domain [13.82572699087732]
Neural audio autoencoders create compact latent representations that preserve perceptually important information. We propose a framework that performs audio processing operations entirely within an autoencoder's latent space. We demonstrate computational efficiency gains of up to 100x while maintaining quality comparable to post-processing on raw audio.
arXiv Detail & Related papers (2025-05-31T19:27:22Z) - $R$-FLAV: Rolling Flow matching for infinite Audio Video generation [5.7858802690354]
Joint audio-video (AV) generation is still a significant challenge in generative AI. We present $R$-FLAV, a novel transformer-based architecture that addresses all the key challenges of AV generation. Our experimental results demonstrate that $R$-FLAV outperforms existing state-of-the-art models in multimodal AV generation tasks.
arXiv Detail & Related papers (2025-03-11T11:18:47Z) - Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z) - Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models [4.569691863088947]
This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data.
Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks.
arXiv Detail & Related papers (2024-11-22T14:27:59Z) - Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching [51.70360630470263]
Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video. We propose Frieren, a V2A model based on rectified flow matching. Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment.
arXiv Detail & Related papers (2024-06-01T06:40:22Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Streamable Neural Audio Synthesis With Non-Causal Convolutions [1.8275108630751844]
We introduce a new method for producing non-causal streaming models.
This makes any convolutional model compatible with real-time buffer-based processing.
We show how our method can be adapted to fit complex architectures with parallel branches.
arXiv Detail & Related papers (2022-04-14T16:00:32Z) - Real-time Timbre Transfer and Sound Synthesis using DDSP [1.7942265700058984]
We present a real-time implementation of the Magenta DDSP library embedded in a virtual synthesizer as a plug-in.
We focused on timbre transfer from learned representations of real instruments to arbitrary sound inputs as well as controlling these models by MIDI.
We developed a GUI for intuitive high-level controls which can be used for post-processing and manipulating the parameters estimated by the neural network.
arXiv Detail & Related papers (2021-03-12T11:49:51Z) - DDSP: Differentiable Digital Signal Processing [13.448630251745163]
We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
arXiv Detail & Related papers (2020-01-14T06:49:37Z)
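The DDSP entry above describes integrating classic signal-processing elements with deep learning: in the library, synthesizer parameters such as fundamental frequency and harmonic amplitudes are predicted by a network and the synthesis itself is differentiable. The following is a minimal NumPy sketch of one such element, a harmonic oscillator; it is a hypothetical illustration, not the DDSP library's API, which implements these components differentiably in TensorFlow.

```python
# Toy harmonic additive synthesizer in the spirit of DDSP. In the real
# library, f0_hz and amps would be time-varying outputs of a neural
# network, and the synthesis graph would carry gradients.
import numpy as np

def harmonic_synth(f0_hz, amps, sr=16000, dur=0.1):
    """Sum of sinusoids at integer multiples of f0_hz."""
    t = np.arange(int(sr * dur)) / sr
    sig = np.zeros_like(t)
    for k, a in enumerate(amps, start=1):  # k-th harmonic at k * f0
        sig += a * np.sin(2 * np.pi * k * f0_hz * t)
    return sig

note = harmonic_synth(220.0, amps=[1.0, 0.5, 0.25])
print(note.shape)  # (1600,) samples: 0.1 s at 16 kHz
```

Because every operation here is a smooth function of `f0_hz` and `amps`, a gradient-based learner can fit those parameters to a target recording, which is the core of the interpretable, modular approach the abstract describes.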
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.