Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models
- URL: http://arxiv.org/abs/2411.14972v1
- Date: Fri, 22 Nov 2024 14:27:59 GMT
- Title: Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models
- Authors: Alec Wright, Alistair Carson, Lauri Juvela
- Abstract summary: This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data.
Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks.
- Score: 4.569691863088947
- License:
- Abstract: This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data. Audio effects are relevant to many musical audio processing and Music Information Retrieval (MIR) tasks, such as modelling of analog audio effects, automatic mixing, tone matching and transcription. Existing audio effects datasets are limited in scope, usually including relatively few audio effects processors and a limited amount of input audio signals. Our proposed framework overcomes these issues, by crowdsourcing neural network emulations of guitar amplifiers and effects, created by users of open-source audio effects emulation software. This allows users of Open-Amp complete control over the input signals to be processed by the effects models, as well as providing high-quality emulations of hundreds of devices. Open-Amp can render audio online during training, allowing great flexibility in data augmentation. Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks. Furthermore, we train a one-to-many guitar effects model using Open-Amp, and use it to emulate unseen analog effects via manipulation of its learned latent space, indicating transferability to analog guitar effects data.
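The online rendering idea can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch example, not the actual Open-Amp API: the OnlineAmpDataset class, its dry_clips/emulations arguments, and the assumption that each crowdsourced emulation is an nn.Module mapping a (batch, samples) tensor to processed audio are all illustrative.

```python
# Hypothetical sketch of online effect rendering for data augmentation,
# in the spirit of Open-Amp; all names and interfaces are assumptions.
import random

import torch
from torch.utils.data import Dataset


class OnlineAmpDataset(Dataset):
    """Pairs each dry guitar clip with a randomly drawn neural
    amp/effect emulation, rendered at access time rather than
    pre-baked to disk."""

    def __init__(self, dry_clips, emulations):
        # dry_clips: list of 1-D float tensors (mono audio)
        # emulations: list of torch.nn.Module effect emulators
        self.dry_clips = dry_clips
        self.emulations = emulations

    def __len__(self):
        return len(self.dry_clips)

    def __getitem__(self, idx):
        dry = self.dry_clips[idx]
        fx_id = random.randrange(len(self.emulations))
        with torch.no_grad():
            # Render the wet signal online: a fresh effect can be
            # drawn every epoch, so dry/effect pairings never repeat.
            wet = self.emulations[fx_id](dry.unsqueeze(0)).squeeze(0)
        return dry, wet, fx_id
```

Each (dry, wet, fx_id) triple could then supervise, for example, an effects classifier (predicting fx_id from wet) or a one-to-many effects model conditioned on fx_id, mirroring the paper's two experiments.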
Related papers
- Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting [9.812666469580872]
We propose an expressive acoustic guitar sound synthesis model with a customized input representation to the instrument.
We implement the proposed approach using diffusion-based outpainting which can generate audio with long-term consistency.
Our proposed model has higher audio quality than the baseline model and generates more realistic timbre sounds.
arXiv Detail & Related papers (2024-01-24T14:44:01Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- Large-scale unsupervised audio pre-training for video-to-speech synthesis [64.86087257004883]
Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video of a speaker.
In this paper we propose to train encoder-decoder models on more than 3,500 hours of audio data at 24kHz.
We then use the pre-trained decoders to initialize the audio decoders for the video-to-speech synthesis task.
arXiv Detail & Related papers (2023-06-27T13:31:33Z)
- Modulation Extraction for LFO-driven Audio Effects [5.740770499256802]
We propose a framework capable of extracting arbitrary LFO signals from processed audio across multiple digital audio effects, parameter settings, and instrument configurations.
We show how coupling the extraction model with a simple processing network enables training of end-to-end black-box models of unseen analog or digital LFO-driven audio effects.
We make our code available and provide the trained audio effect models in a real-time VST plugin.
arXiv Detail & Related papers (2023-05-22T17:33:07Z)
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [65.18102159618631]
Multimodal generative modeling has created milestones in text-to-image and text-to-video generation.
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
We propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps.
arXiv Detail & Related papers (2023-01-30T04:44:34Z)
- AudioGen: Textually Guided Audio Generation [116.57006301417306]
We tackle the problem of generating audio samples conditioned on descriptive text captions.
In this work, we propose AudioGen, an auto-regressive model that generates audio samples conditioned on text inputs.
arXiv Detail & Related papers (2022-09-30T10:17:05Z)
- Removing Distortion Effects in Music Using Deep Neural Networks [12.497836634060569]
This paper focuses on removing distortion and clipping applied to guitar tracks for music production.
It presents a comparative investigation of different deep neural network (DNN) architectures on this task.
Using DNNs, we achieve exceptionally good distortion-removal results for effects that superimpose the clean signal onto the distorted signal.
arXiv Detail & Related papers (2022-02-03T16:26:29Z)
- MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling [6.256118777336895]
Musical expression requires control of both what notes are played, and how they are performed.
We introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control.
We demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence.
arXiv Detail & Related papers (2021-12-17T04:15:42Z)
- Differentiable Signal Processing With Black-Box Audio Effects [44.93154498647659]
We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network.
We show that our approach can yield results comparable to a specialized, state-of-the-art commercial solution for music mastering.
arXiv Detail & Related papers (2021-05-11T02:20:22Z)