Reducing the Prior Mismatch of Stochastic Differential Equations for
Diffusion-based Speech Enhancement
- URL: http://arxiv.org/abs/2302.14748v2
- Date: Tue, 30 May 2023 13:05:55 GMT
- Title: Reducing the Prior Mismatch of Stochastic Differential Equations for
Diffusion-based Speech Enhancement
- Authors: Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann
- Abstract summary: We propose a forward process based on a Brownian bridge.
We show that such a process leads to a reduction of the mismatch compared to previous diffusion processes.
- Score: 16.09633286837904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in limit the
mean of the forward process ends at the noisy mixture, in practice it stops
earlier and thus only at an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy and propose a forward process based on a Brownian
bridge. We show that such a process leads to a reduction of the mismatch
compared to previous diffusion processes. More importantly, we show that our
approach improves in objective metrics over the baseline process with only half
of the iteration steps and having one hyperparameter less to tune.
Related papers
- Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing [84.97865583302244]
We propose a new method called Decoupled Annealing Posterior Sampling (DAPS) that relies on a novel noise annealing process.
DAPS significantly improves sample quality and stability across multiple image restoration tasks.
For example, we achieve a PSNR of 30.72dB on the FFHQ 256 dataset for phase retrieval, which is an improvement of 9.12dB compared to existing methods.
arXiv Detail & Related papers (2024-07-01T17:59:23Z) - Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance [52.093434664236014]
Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems.
Inspired by this finding, we propose to improve recent methods by using more principled covariance determined by maximum likelihood estimation.
arXiv Detail & Related papers (2024-02-03T13:35:39Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep
Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - Diffusion-based speech enhancement with a weighted generative-supervised
learning loss [0.0]
Diffusion-based generative models have recently gained attention in speech enhancement (SE)
We propose augmenting the original diffusion training objective with a mean squared error (MSE) loss, measuring the discrepancy between estimated enhanced speech and ground-truth clean speech.
arXiv Detail & Related papers (2023-09-19T09:13:35Z) - Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement.
In this paper, we address these limitations through a two-stage training approach.
We show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z) - Diffusion Models with Deterministic Normalizing Flow Priors [23.212848643552395]
We propose DiNof ($textbfDi$ffusion with $textbfNo$rmalizing $textbff$low priors), a technique that makes use of normalizing flows and diffusion models.
Experiments on standard image generation datasets demonstrate the advantage of the proposed method over existing approaches.
arXiv Detail & Related papers (2023-09-03T21:26:56Z) - First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order variation methods.
Our approach covers both non-linear gradient and strongly Monte Carlo problems.
We provide bounds that match the oracle strongly in the case of convex method optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z) - Fast and efficient speech enhancement with variational autoencoders [0.0]
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
We propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors.
Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
arXiv Detail & Related papers (2022-11-02T09:52:13Z) - Diffusion Posterior Sampling for General Noisy Inverse Problems [50.873313752797124]
We extend diffusion solvers to handle noisy (non)linear inverse problems via approximation of the posterior sampling.
Our method demonstrates that diffusion models can incorporate various measurement noise statistics.
arXiv Detail & Related papers (2022-09-29T11:12:27Z) - Speech Enhancement and Dereverberation with Diffusion-based Generative
Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.