Single and Few-step Diffusion for Generative Speech Enhancement
- URL: http://arxiv.org/abs/2309.09677v2
- Date: Mon, 15 Jan 2024 14:17:47 GMT
- Title: Single and Few-step Diffusion for Generative Speech Enhancement
- Authors: Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann
- Abstract summary: Diffusion models have shown promising results in speech enhancement.
In this paper, we address these limitations through a two-stage training approach.
We show that our proposed method maintains steady performance and therefore largely outperforms the diffusion baseline in this setting.
- Score: 18.487296462927034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have shown promising results in speech enhancement, using a
task-adapted diffusion process for the conditional generation of clean speech
given a noisy mixture. However, at test time, the neural network used for score
estimation is called multiple times to solve the iterative reverse process.
This results in a slow inference process and causes discretization errors that
accumulate over the sampling trajectory. In this paper, we address these
limitations through a two-stage training approach. In the first stage, we train
the diffusion model in the usual way, using the generative denoising score matching
loss. In the second stage, we compute the enhanced signal by solving the
reverse process and compare the resulting estimate to the clean speech target
using a predictive loss. We show that this second training stage enables the model to match the baseline's performance with only 5 function evaluations instead of 60. While the performance of typical
generative diffusion algorithms drops dramatically when lowering the number of
function evaluations (NFEs) to obtain single-step diffusion, we show that our
proposed method maintains steady performance and therefore largely outperforms the diffusion baseline in this setting, while also generalizing better than its predictive counterpart.
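As a concrete illustration of the two-stage recipe, the following is a minimal PyTorch-style sketch on toy 1-D signals. The network, noise schedule, and solver below are illustrative assumptions rather than the authors' implementation: stage 1 trains with denoising score matching, and stage 2 backpropagates a predictive loss through a few-step reverse solver.

```python
# Hedged sketch of the two-stage training described in the abstract.
# ScoreNet, the geometric noise schedule, and the Euler-style solver are
# all assumptions for illustration, not the paper's actual code.
import math
import torch
import torch.nn as nn

SIGMA_MIN, SIGMA_MAX = 0.05, 0.5

def sigma(t):
    # Geometric (variance-exploding) noise schedule; an assumption.
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

class ScoreNet(nn.Module):
    """Toy conditional score model s_theta(x_t, y, t); real systems use deep U-Nets."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, xt, y, t):
        return self.net(torch.cat([xt, y, t[:, None]], dim=-1))

def dsm_loss(model, x0, y):
    """Stage 1: generative denoising score matching (the task-adapted
    mean shift toward y is omitted here for brevity)."""
    t = torch.rand(x0.shape[0])
    s = sigma(t)[:, None]
    z = torch.randn_like(x0)
    xt = x0 + s * z                                    # perturbed sample
    return ((s * model(xt, y, t) + z) ** 2).mean()

def reverse_solve(model, y, nfe):
    """Few-step deterministic reverse solver, differentiable end to end."""
    ts = torch.linspace(1.0, 0.0, nfe + 1)
    # Initialize near the noisy mixture, as task-adapted SE diffusion does.
    x = y + sigma(ts[0].expand(y.shape[0]))[:, None] * torch.randn_like(y)
    log_ratio = math.log(SIGMA_MAX / SIGMA_MIN)
    for i in range(nfe):
        t = ts[i].expand(y.shape[0])
        dt = ts[i] - ts[i + 1]
        g2 = 2 * log_ratio * sigma(t)[:, None] ** 2    # g(t)^2 = d sigma^2(t)/dt
        x = x + g2 * model(x, y, t) * dt               # reverse drift; noise omitted
    return x

def predictive_loss(model, x0, y, nfe=5):
    """Stage 2: solve the reverse process with few NFEs and compare the
    estimate to the clean speech target."""
    return ((reverse_solve(model, y, nfe) - x0) ** 2).mean()
```

In this sketch, stage 1 would run to convergence before stage 2 fine-tunes the same network through the nfe=5 solver, mirroring the abstract's 5-versus-60 NFE comparison.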
Related papers
- ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
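The core trick, as the summary suggests, is a fixed-point refinement of the inversion step: a DDIM inversion from z_t to z_{t+1} ideally needs the model's noise prediction at the still-unknown z_{t+1}, so the step is re-applied on successive guesses. A hedged sketch, where eps_model and the alpha schedule are assumed interfaces:

```python
# Hedged sketch of fixed-point "renoising" for DDIM inversion.
# eps_model(z, t) is an assumed noise-prediction callable; alpha_t and
# alpha_next are cumulative-alpha tensors from an assumed schedule.
import torch

def inversion_step(z_t, eps, alpha_t, alpha_next):
    """Deterministic DDIM step written in the inversion (noising) direction."""
    x0_pred = (z_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    return alpha_next.sqrt() * x0_pred + (1 - alpha_next).sqrt() * eps

def renoise_step(eps_model, z_t, t, t_next, alpha_t, alpha_next, iters=4):
    """One inversion step refined by re-evaluating the model on each guess."""
    z_next = inversion_step(z_t, eps_model(z_t, t), alpha_t, alpha_next)
    for _ in range(iters):
        # Re-estimate the noise at the current guess of z_{t+1}, redo the step.
        z_next = inversion_step(z_t, eps_model(z_next, t_next), alpha_t, alpha_next)
    return z_next
```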
arXiv Detail & Related papers (2024-03-21T17:52:08Z)
- Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data [74.2507346810066]
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data.
We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data.
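The Tweedie connection in the title is the classical identity that, for Gaussian corruption, the posterior mean of the clean sample can be read off the score of the noisy marginal, which is what makes training from noisy data alone tractable:

```latex
% Tweedie's formula for x_t = x_0 + \sigma_t \varepsilon, \varepsilon \sim \mathcal{N}(0, I)
\mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t)
```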
arXiv Detail & Related papers (2024-03-20T14:22:12Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
- Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer [3.2634122554914002]
Diffusion-based vocoders have been criticised for being slow due to the many steps required during sampling.
We propose a setup where the targets are the different outputs of forward process time steps.
We show through extensive evaluation that the proposed technique generates high-fidelity speech in competitive time.
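One hedged reading of "targets are the different outputs of forward process time steps": rather than regressing the injected noise, the network at step t is trained to output the less-noisy forward sample at an earlier step t - k, so that sampling can take strides of k. A sketch with assumed names (net, mel conditioning, noise_schedule returning cumulative alphas):

```python
# Hedged sketch: shared-noise forward samples at steps t and t - k, with the
# earlier sample used as the regression target. All names are assumptions.
import torch

def stride_target_loss(net, x0, mel, t, k, noise_schedule):
    """Train net(x_t, mel, t) to hit the forward-process output x_{t-k}."""
    eps = torch.randn_like(x0)
    a_t = noise_schedule(t)[:, None]        # cumulative alpha at step t
    a_tk = noise_schedule(t - k)[:, None]   # cumulative alpha at step t - k
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * eps      # forward sample at t
    x_tk = a_tk.sqrt() * x0 + (1 - a_tk).sqrt() * eps   # target at t - k
    return ((net(x_t, mel, t) - x_tk) ** 2).mean()
```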
arXiv Detail & Related papers (2023-09-18T10:35:27Z)
- Can Diffusion Model Achieve Better Performance in Text Generation? Bridging the Gap between Training and Inference! [14.979893207094221]
Diffusion models have been successfully adapted to text generation tasks by mapping the discrete text into the continuous space.
There exist nonnegligible gaps between training and inference, owing to the absence of the forward process during inference.
We propose two simple yet effective methods to bridge the gaps mentioned above, named Distance Penalty and Adaptive Decay Sampling.
arXiv Detail & Related papers (2023-05-08T05:32:22Z)
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks, but it does not directly carry over to the multi-timestep pipeline of diffusion models.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
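For context, a generic symmetric uniform fake-quantizer shows what 4-bit PTQ of a trained model's weights looks like; Q-Diffusion's specific contribution (calibration tailored to the multi-timestep pipeline) is not reproduced here:

```python
# Generic 4-bit symmetric uniform fake-quantization of a weight tensor.
# This is standard PTQ machinery, not Q-Diffusion's calibration scheme.
import torch

def quantize_dequantize(w, n_bits=4):
    """Round weights to a signed n-bit grid, then map back to floats."""
    qmax = 2 ** (n_bits - 1) - 1                        # 7 for int4
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale

# Example: fake-quantize every Linear/Conv2d weight of a trained score model.
# for m in model.modules():
#     if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
#         m.weight.data = quantize_dequantize(m.weight.data)
```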
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
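The continuous-time machinery here is the standard reversal of a Markov jump process: if the forward corruption is a CTMC with jump rates R_t, the reverse chain's rates are reweighted by a ratio of marginals, the discrete analogue of the score that the learned model must match:

```latex
% Time reversal of a CTMC with forward jump rates R_t(x, x')
\hat{R}_t(x, x') = R_t(x', x) \, \frac{p_t(x')}{p_t(x)}
```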
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
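The differential equation referenced above is, in this line of work, an Ornstein-Uhlenbeck style forward SDE that drifts the state from clean speech toward the noisy mixture y while injecting Gaussian noise of exploding variance; a sketch in the notation these papers typically use:

```latex
% Task-adapted forward SDE (OUVE): drift toward the mixture y, exploding noise
\mathrm{d}x_t = \gamma \,(y - x_t)\,\mathrm{d}t
  + \sigma_{\min}\left(\frac{\sigma_{\max}}{\sigma_{\min}}\right)^{t}
    \sqrt{2 \ln \frac{\sigma_{\max}}{\sigma_{\min}}}\;\mathrm{d}w
```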
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders [137.1060633388405]
Diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.
We propose a faster and cheaper approach that stops adding noise before the data become pure random noise.
We show that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior.
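The sampling-time payoff is easy to sketch: instead of reversing from pure noise at t = T, generation starts from an implicit prior trained (adversarially, per the title) to match the partially noised marginal at some t_trunc < T. Both prior_net and denoise_step below are assumed interfaces:

```python
# Hedged sketch of truncated sampling; prior_net and denoise_step are
# assumed interfaces, not the paper's actual components.
import torch

def truncated_sample(prior_net, denoise_step, z, t_trunc):
    """Reverse only the truncated part of the diffusion chain."""
    x = prior_net(z)                    # sample x_{t_trunc} from the learned prior
    for t in range(t_trunc, 0, -1):     # far fewer steps than the full chain
        x = denoise_step(x, t)
    return x
```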
arXiv Detail & Related papers (2022-02-19T20:18:49Z)
- A Variational Perspective on Diffusion-Based Generative Models and Score Matching [8.93483643820767]
We derive a variational framework for likelihood estimation for continuous-time generative diffusion.
We show that minimizing the score-matching loss is equivalent to maximizing a lower bound of the likelihood of the plug-in reverse SDE.
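The object in that statement is the "plug-in" reverse SDE, obtained by substituting the learned score s_theta into the exact time reversal of the forward SDE dx = f(x, t) dt + g(t) dw; the result is that the weighted score-matching loss is, up to constants, the negative of a likelihood lower bound for this process:

```latex
% Plug-in reverse SDE (\bar{w} is a reverse-time Brownian motion)
\mathrm{d}x = \left[ f(x, t) - g(t)^2 \, s_\theta(x, t) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}
```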
arXiv Detail & Related papers (2021-06-05T05:50:36Z)