Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
- URL: http://arxiv.org/abs/2306.08527v2
- Date: Sun, 17 Sep 2023 13:27:18 GMT
- Title: Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement
- Authors: Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang
- Abstract summary: We present a framework that encapsulates both the VP- and variance-exploding (VE)-based diffusion methods.
To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models.
We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
- Score: 53.2171981279647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of this study is to implement diffusion models for speech
enhancement (SE). The first step is to emphasize the theoretical foundation of
variance-preserving (VP)-based interpolation diffusion under continuous
conditions. Subsequently, we present a more concise framework that encapsulates
both the VP- and variance-exploding (VE)-based interpolation diffusion methods.
We demonstrate that these two methods are special cases of the proposed
framework. Additionally, we provide a practical example of VP-based
interpolation diffusion for the SE task. To improve performance and ease model
training, we analyze the common difficulties encountered in diffusion models
and suggest amenable hyper-parameters. Finally, we evaluate our model against
several methods using a public benchmark to showcase the effectiveness of our
approach.
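For intuition, the interpolation view can be summarized by a perturbation kernel whose mean interpolates between the clean and noisy speech. Below is a minimal LaTeX sketch under assumed, generic notation ($x_0$ clean speech, $y$ noisy speech; $\alpha_t$, $\lambda_t$, $\sigma_t$ illustrative schedules, not the paper's symbols):

  % Illustrative interpolation-diffusion marginal (assumed notation)
  q(x_t \mid x_0, y) = \mathcal{N}\!\left(x_t;\ \mu_t,\ \sigma_t^2 I\right),
  \qquad
  \mu_t = \alpha_t\left[(1 - \lambda_t)\,x_0 + \lambda_t\,y\right]

In this sketch, the VE-based method corresponds to the special case $\alpha_t \equiv 1$ with an unbounded (exploding) $\sigma_t^2$, while a VP-based schedule lets $\alpha_t$ decay so the total variance stays bounded (e.g., $\alpha_t^2 + \sigma_t^2 = 1$), consistent with the abstract's claim that both methods are special cases of one framework.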
Related papers
- Dual Conditional Diffusion Models for Sequential Recommendation [47.65610320825351]
We propose a discrete-to-continuous sequential recommendation diffusion framework.
Our framework introduces a complete Markov chain to model the transition from the reversed target item representation to the discrete item index.
Building on this framework, we present the Dual Conditional Diffusion Transformer (DCDT) that incorporates the implicit conditional and the explicit conditional for diffusion-based SR.
arXiv Detail & Related papers (2024-10-29T11:51:06Z)
- Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining.
Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization.
To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models.
arXiv Detail & Related papers (2024-10-08T07:33:49Z)
- Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates an approach that leverages sampling and fusion techniques to harness the features of diffusion models efficiently.
By leveraging the strength of the text-to-image generation capability, we introduce a new training framework designed to implicitly learn posterior knowledge from it.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
- Improved off-policy training of diffusion samplers [93.66433483772055]
We study the problem of training diffusion models to sample from a distribution with an unnormalized density or energy function.
We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods.
Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work.
arXiv Detail & Related papers (2024-02-07T18:51:49Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Non-Cross Diffusion for Semantic Consistency [12.645444338043934]
We introduce 'Non-Cross Diffusion', an innovative approach in generative modeling for learning ordinary differential equation (ODE) models.
Our methodology strategically incorporates an ascending dimension of input to effectively connect points sampled from two distributions with uncrossed paths.
arXiv Detail & Related papers (2023-11-30T05:53:39Z)
- Improving Transferability of Adversarial Examples via Bayesian Attacks [84.90830931076901]
We introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters.
Our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively.
arXiv Detail & Related papers (2023-07-21T03:43:07Z)
- Score-based Generative Modeling Through Backward Stochastic Differential Equations: Inversion and Generation [6.2255027793924285]
The proposed BSDE-based diffusion model represents a novel approach to diffusion modeling, which extends the application of stochastic differential equations (SDEs) in machine learning.
We demonstrate the theoretical guarantees of the model, the benefits of using Lipschitz networks for score matching, and its potential applications in various areas such as diffusion inversion, conditional diffusion, and uncertainty quantification.
arXiv Detail & Related papers (2023-04-26T01:15:35Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates a large diffusion time T to ensure that the forward dynamics bring the process sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)