An Analysis of the Variance of Diffusion-based Speech Enhancement
- URL: http://arxiv.org/abs/2402.00811v2
- Date: Thu, 13 Jun 2024 16:20:59 GMT
- Title: An Analysis of the Variance of Diffusion-based Speech Enhancement
- Authors: Bunlong Lay, Timo Gerkmann,
- Abstract summary: We show that the scale of the variance is a dominant parameter for speech enhancement performance.
We show that a larger variance increases the noise attenuation and allows for reducing the computational footprint.
- Score: 15.736484513462973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models proved to be powerful models for generative speech enhancement. In recent SGMSE+ approaches, training involves a stochastic differential equation for the diffusion process, adding both Gaussian and environmental noise to the clean speech signal gradually. The speech enhancement performance varies depending on the choice of the stochastic differential equation that controls the evolution of the mean and the variance along the diffusion processes when adding environmental and Gaussian noise. In this work, we highlight that the scale of the variance is a dominant parameter for speech enhancement performance and show that it controls the tradeoff between noise attenuation and speech distortions. More concretely, we show that a larger variance increases the noise attenuation and allows for reducing the computational footprint, as fewer function evaluations for generating the estimate are required
Related papers
- Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z) - Diffusion-based speech enhancement with a weighted generative-supervised
learning loss [0.0]
Diffusion-based generative models have recently gained attention in speech enhancement (SE)
We propose augmenting the original diffusion training objective with a mean squared error (MSE) loss, measuring the discrepancy between estimated enhanced speech and ground-truth clean speech.
arXiv Detail & Related papers (2023-09-19T09:13:35Z) - Unsupervised speech enhancement with diffusion-based generative models [0.0]
We introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
We develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference.
We show promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method.
arXiv Detail & Related papers (2023-09-19T09:11:31Z) - Speech Enhancement and Dereverberation with Diffusion-based Generative
Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z) - Diffusion-GAN: Training GANs with Diffusion [135.24433011977874]
Generative adversarial networks (GANs) are challenging to train stably.
We propose Diffusion-GAN, a novel GAN framework that leverages a forward diffusion chain to generate instance noise.
We show that Diffusion-GAN can produce more realistic images with higher stability and data efficiency than state-of-the-art GANs.
arXiv Detail & Related papers (2022-06-05T20:45:01Z) - Conditional Diffusion Probabilistic Model for Speech Enhancement [101.4893074984667]
We propose a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes.
In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models.
arXiv Detail & Related papers (2022-02-10T18:58:01Z) - Denoising Diffusion Gamma Models [91.22679787578438]
We introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise from Gamma distribution provides improved results for image and speech generation.
Our approach preserves the ability to efficiently sample state in the training diffusion process while using Gamma noise.
arXiv Detail & Related papers (2021-10-10T10:46:31Z) - A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement model (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z) - Non Gaussian Denoising Diffusion Models [91.22679787578438]
We show that noise from Gamma distribution provides improved results for image and speech generation.
We also show that using a mixture of Gaussian noise variables in the diffusion process improves the performance over a diffusion process that is based on a single distribution.
arXiv Detail & Related papers (2021-06-14T16:42:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.