Reducing the Prior Mismatch of Stochastic Differential Equations for
Diffusion-based Speech Enhancement
- URL: http://arxiv.org/abs/2302.14748v2
- Date: Tue, 30 May 2023 13:05:55 GMT
- Title: Reducing the Prior Mismatch of Stochastic Differential Equations for
Diffusion-based Speech Enhancement
- Authors: Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann
- Abstract summary: We propose a forward process based on a Brownian bridge.
We show that such a process leads to a reduction of the mismatch compared to previous diffusion processes.
- Score: 16.09633286837904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, score-based generative models have been successfully employed for
the task of speech enhancement. A stochastic differential equation is used to
model the iterative forward process, where at each step environmental noise and
white Gaussian noise are added to the clean speech signal. While in the limit the
mean of the forward process ends at the noisy mixture, in practice it stops
earlier and thus only at an approximation of the noisy mixture. This results in
a discrepancy between the terminating distribution of the forward process and
the prior used for solving the reverse process at inference. In this paper, we
address this discrepancy and propose a forward process based on a Brownian
bridge. We show that such a process leads to a reduction of the mismatch
compared to previous diffusion processes. More importantly, we show that our
approach improves in objective metrics over the baseline process with only half
of the iteration steps and having one hyperparameter less to tune.
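The mismatch described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation: the abstract gives no formulas, so it assumes a generic Ornstein-Uhlenbeck-type mean that relaxes exponentially toward the noisy mixture and a generic Brownian-bridge-type mean that is pinned to the mixture at the final time. The function names, the stiffness value `gamma=1.5`, and the scalar stand-ins for the clean and noisy signals are all illustrative assumptions.

```python
import math


def ou_mean(x0, y, t, gamma=1.5):
    """Mean of an Ornstein-Uhlenbeck-type forward process at time t.

    It relaxes exponentially from the clean value x0 toward the noisy
    mixture y, so it reaches y only in the limit t -> infinity.
    gamma is an assumed, illustrative stiffness value.
    """
    w = math.exp(-gamma * t)
    return w * x0 + (1.0 - w) * y


def bridge_mean(x0, y, t, T=1.0):
    """Mean of a Brownian-bridge-type forward process pinned to y at time T."""
    return (1.0 - t / T) * x0 + (t / T) * y


# Toy scalar stand-ins for the clean speech (x0) and the noisy mixture (y).
x0, y, T = 0.0, 1.0, 1.0

# Distance between the terminal mean and the mixture the prior is centered on.
ou_gap = abs(y - ou_mean(x0, y, T))
bridge_gap = abs(y - bridge_mean(x0, y, T))

print(f"OU-type mean mismatch at T: {ou_gap:.4f}")
print(f"Bridge mean mismatch at T:  {bridge_gap:.4f}")
```

Under these assumptions the exponentially relaxing mean stops short of the mixture at the finite final time, while the bridge mean lands on it exactly, which is the kind of gap between the terminating distribution and the inference prior that the abstract refers to.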
Related papers
- Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement [26.937216751657697]
We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech.
Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed likelihood score.
We propose two alternative algorithms that directly model the conditional reverse transition distribution of diffusion states.
arXiv Detail & Related papers (2025-07-03T07:42:02Z)
- Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification [75.09791002021947]
Existing purification methods aim to disrupt adversarial perturbations by introducing a certain amount of noise through a forward diffusion process, followed by a reverse process to recover clean examples.
This approach is fundamentally flawed as the uniform operation of the forward process compromises normal pixels while attempting to combat adversarial perturbations.
We propose a heterogeneous purification strategy grounded in the interpretability of neural networks.
Our method decisively applies higher-intensity noise to specific pixels that the target model focuses on while the remaining pixels are subjected to only low-intensity noise.
arXiv Detail & Related papers (2025-03-03T11:00:25Z)
- Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data.
Generating high-quality outputs requires many discretization steps.
We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z)
- Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
Randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.
Diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.
We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z)
- Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance.
We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point.
Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z)
- Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements [45.70011319850862]
Diffusion models have emerged as a powerful foundation model for visual generation.
Current posterior sampling based methods take the measurement into the posterior sampling to infer the distribution of the target data.
We show that high-frequency information can be prematurely introduced during the early stages, which could induce larger posterior estimate errors.
We propose a novel diffusion posterior sampling method DPS-CM, which incorporates a Crafted Measurement.
arXiv Detail & Related papers (2024-11-15T00:06:57Z)
- Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing [84.97865583302244]
We propose a new method called Decoupled Annealing Posterior Sampling (DAPS) that relies on a novel noise annealing process.
DAPS significantly improves sample quality and stability across multiple image restoration tasks.
For example, we achieve a PSNR of 30.72dB on the FFHQ 256 dataset for phase retrieval, which is an improvement of 9.12dB compared to existing methods.
arXiv Detail & Related papers (2024-07-01T17:59:23Z)
- Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance [52.093434664236014]
Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems.
Inspired by this finding, we propose to improve recent methods by using more principled covariance determined by maximum likelihood estimation.
arXiv Detail & Related papers (2024-02-03T13:35:39Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement.
In this paper, we address these limitations through a two-stage training approach.
We show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z)
- Diffusion Models with Deterministic Normalizing Flow Priors [23.212848643552395]
We propose DiNof (Diffusion with Normalizing flow priors), a technique that makes use of normalizing flows and diffusion models.
Experiments on standard image generation datasets demonstrate the advantage of the proposed method over existing approaches.
arXiv Detail & Related papers (2023-09-03T21:26:56Z)
- First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities.
Our approach covers both non-convex and strongly convex minimization problems.
We provide lower bounds that match the oracle complexity in the case of strongly convex optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z)
- Fast and efficient speech enhancement with variational autoencoders [0.0]
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
We propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors.
Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
arXiv Detail & Related papers (2022-11-02T09:52:13Z)
- Diffusion Posterior Sampling for General Noisy Inverse Problems [50.873313752797124]
We extend diffusion solvers to handle noisy (non)linear inverse problems via approximation of the posterior sampling.
Our method demonstrates that diffusion models can incorporate various measurement noise statistics.
arXiv Detail & Related papers (2022-09-29T11:12:27Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.