Exploring Quality and Generalizability in Parameterized Neural Audio Effects
- URL: http://arxiv.org/abs/2006.05584v1
- Date: Wed, 10 Jun 2020 00:52:08 GMT
- Title: Exploring Quality and Generalizability in Parameterized Neural Audio Effects
- Authors: William Mitchell, Scott H. Hawley
- Abstract summary: Deep neural networks have shown promise for music audio signal processing applications.
Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls.
This work expands on prior published research on modeling nonlinear, time-dependent signal processing effects.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have shown promise for music audio signal
processing applications, often surpassing prior approaches, particularly as
end-to-end models in the waveform domain. Yet results to date have tended to be
constrained by low sample rates, noise, narrow domains of signal types, and/or
a lack of parameterized controls (i.e. "knobs"), leaving their suitability for
professional audio engineering workflows in question. This work expands on
prior research on modeling nonlinear, time-dependent signal processing effects
associated with music production by means of a deep neural network, one that
also emulates the parameterized settings found on analog equipment, with the
goal of eventually producing commercially viable, high-quality audio, i.e. a
44.1 kHz sampling rate at 16-bit resolution. The results in this paper
highlight progress in modeling these effects through architecture and
optimization changes aimed at increasing computational efficiency, improving
the signal-to-noise ratio, and extending to a larger variety of nonlinear
audio effects. Toward these ends, the strategies employed followed a
three-pronged approach targeting model speed, model accuracy, and model
generalizability. Most of the presented methods provide marginal or no
increase in output accuracy over the original model, with the exception of
dataset manipulation: limiting the audio content of the dataset, for example
to a single instrument, provided a significant improvement in model accuracy
over models trained on more general datasets.
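The listing carries no code, so as a purely illustrative aid, here is a minimal sketch of what a parameterized ("knob"-controlled) neural audio effect can look like: a small waveform-to-waveform network whose internal features are modulated by normalized knob settings. The class name, layer sizes, and FiLM-style scale-and-shift conditioning are assumptions made for this sketch, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class KnobConditionedEffect(nn.Module):
    """Sketch of a neural audio effect with parameterized controls ("knobs").

    Hypothetical architecture: a 1-D conv encoder/decoder whose bottleneck
    features are scaled and shifted according to normalized knob settings
    (e.g. threshold, ratio, and attack for a compressor).
    """

    def __init__(self, n_knobs: int = 3, channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, stride=2, padding=7), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=15, stride=2, padding=7), nn.ReLU(),
        )
        # Map the knob vector to a per-channel scale and shift (FiLM-style).
        self.knob_proj = nn.Linear(n_knobs, 2 * channels)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(channels, channels, kernel_size=16, stride=2, padding=7), nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=16, stride=2, padding=7),
        )

    def forward(self, audio: torch.Tensor, knobs: torch.Tensor) -> torch.Tensor:
        # audio: (batch, 1, samples); knobs: (batch, n_knobs), each in [0, 1].
        z = self.encoder(audio)
        scale, shift = self.knob_proj(knobs).chunk(2, dim=-1)
        z = z * scale.unsqueeze(-1) + shift.unsqueeze(-1)  # knob conditioning
        return self.decoder(z)

model = KnobConditionedEffect()
x = torch.randn(4, 1, 44100)   # one second per example at 44.1 kHz
k = torch.rand(4, 3)           # random knob settings
print(model(x, k).shape)       # torch.Size([4, 1, 44100])
```

Conditioning an internal layer on the knob vector lets one trained model cover a continuous range of effect settings, which is what distinguishes a parameterized model from one trained at a single fixed setting.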
Related papers
- Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling (arXiv, 2024-05-07)
This article explores the application of machine learning advancements for virtual analog modeling.
We compare State-Space models and Linear Recurrent Units against the more common Long Short-Term Memory networks.
- Learning with Noisy Foundation Models (arXiv, 2024-03-11)
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
- Diffusion Models for Audio Restoration (arXiv, 2024-02-15)
We present audio restoration algorithms based on diffusion models.
We show that diffusion models can combine the best of both worlds and offer new opportunities for designing audio restoration algorithms.
We explain the diffusion formalism and its application to the conditional generation of clean audio signals.
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion (arXiv, 2023-08-02)
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generating audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
- Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion (arXiv, 2023-06-09)
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
- RAVE: A variational autoencoder for fast and high-quality neural audio synthesis (arXiv, 2021-11-09)
We introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis.
We show that our model is the first able to generate 48kHz audio signals, while simultaneously running 20 times faster than real-time on a standard laptop CPU.
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model (arXiv, 2021-07-25)
We propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
- WaveGrad: Estimating Gradients for Waveform Generation (arXiv, 2020-09-02)
WaveGrad is a conditional model for waveform generation which estimates gradients of the data density.
It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram.
We find that it can generate high fidelity audio samples using as few as six iterations (a schematic of this sampling loop appears after this list).
- Real Time Speech Enhancement in the Waveform Domain (arXiv, 2020-06-23)
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise, including stationary and non-stationary noises (a sketch of such an encoder-decoder appears after this list).
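As referenced in the WaveGrad entry above, the sampler it describes can be sketched as a generic DDPM-style refinement loop: start from Gaussian white noise and repeatedly subtract predicted noise under a short schedule. The function below is a schematic with a placeholder denoiser, not the official WaveGrad implementation; the real system uses a trained network and carefully tuned noise schedules.

```python
import torch

def wavegrad_style_sample(denoiser, mel, n_samples, betas):
    """Schematic of WaveGrad-style iterative refinement (not the official code).

    denoiser(y, mel, noise_level) is assumed to predict the noise component
    of y; betas is a short noise schedule (e.g. six values, matching the
    entry's observation that as few as six iterations can suffice).
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    y = torch.randn(1, n_samples)  # start from Gaussian white noise
    for t in reversed(range(len(betas))):
        eps = denoiser(y, mel, alpha_bars[t])  # predicted noise component
        # Standard DDPM-style update: remove the predicted noise, rescale.
        y = (y - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            y = y + torch.sqrt(betas[t]) * torch.randn_like(y)
    return y

# Toy usage with a placeholder "denoiser" (a real one is a trained network):
mel = torch.randn(1, 80, 100)          # stand-in mel-spectrogram conditioning
betas = torch.linspace(1e-4, 0.05, 6)  # six-step noise schedule
audio = wavegrad_style_sample(lambda y, m, a: torch.zeros_like(y),
                              mel, n_samples=24000, betas=betas)
print(audio.shape)  # torch.Size([1, 24000])
```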
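Likewise, for the real-time waveform-domain enhancement entry, the sketch below shows the general shape of a causal encoder-decoder with skip connections. Channel counts, kernel sizes, and strides are invented for brevity, and the published model includes further components (such as a recurrent bottleneck) that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalEnhancer(nn.Module):
    """Sketch of a causal encoder-decoder denoiser with skip connections.

    Hypothetical sizes; not the paper's exact architecture.
    """

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv1d(1, 16, kernel_size=8, stride=4)
        self.enc2 = nn.Conv1d(16, 32, kernel_size=8, stride=4)
        self.dec2 = nn.ConvTranspose1d(32, 16, kernel_size=8, stride=4)
        self.dec1 = nn.ConvTranspose1d(16, 1, kernel_size=8, stride=4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Left-pad before each downsampling conv so no layer sees future samples.
        s1 = torch.relu(self.enc1(F.pad(x, (4, 0))))
        s2 = torch.relu(self.enc2(F.pad(s1, (4, 0))))
        # Upsample and add the matching encoder activations (skip connections).
        y = torch.relu(self.dec2(s2)[..., : s1.shape[-1]]) + s1
        return self.dec1(y)[..., : x.shape[-1]]

model = CausalEnhancer()
noisy = torch.randn(1, 1, 16000)   # one second of audio at 16 kHz
clean_est = model(noisy)           # denoised estimate, same length as input
print(clean_est.shape)             # torch.Size([1, 1, 16000])
```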