DiffPhase: Generative Diffusion-based STFT Phase Retrieval
- URL: http://arxiv.org/abs/2211.04332v2
- Date: Fri, 2 Jun 2023 08:50:05 GMT
- Title: DiffPhase: Generative Diffusion-based STFT Phase Retrieval
- Authors: Tal Peer, Simon Welker, Timo Gerkmann
- Abstract summary: Diffusion probabilistic models have been recently used in a variety of tasks, including speech enhancement and synthesis.
In this work we build upon previous work in the speech domain, adapting a speech enhancement diffusion model specifically for phase retrieval.
Evaluation using speech quality and intelligibility metrics shows the diffusion approach is well-suited to the phase retrieval task, with performance surpassing both classical and modern methods.
- Score: 15.16865739526702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic models have been recently used in a variety of tasks,
including speech enhancement and synthesis. As a generative approach, diffusion
models have been shown to be especially suitable for imputation problems, where
missing data is generated based on existing data. Phase retrieval is inherently
an imputation problem, where phase information has to be generated based on the
given magnitude. In this work we build upon previous work in the speech domain,
adapting a speech enhancement diffusion model specifically for STFT phase
retrieval. Evaluation using speech quality and intelligibility metrics shows
the diffusion approach is well-suited to the phase retrieval task, with
performance surpassing both classical and modern methods.
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3$times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z) - Diffusion-TS: Interpretable Diffusion for General Time Series Generation [6.639630994040322]
Diffusion-TS is a novel diffusion-based framework that generates time series samples of high quality.
We train the model to directly reconstruct the sample instead of the noise in each diffusion step, combining a Fourier-based loss term.
Results show that Diffusion-TS achieves the state-of-the-art results on various realistic analyses of time series.
arXiv Detail & Related papers (2024-03-04T05:39:23Z) - Unsupervised speech enhancement with diffusion-based generative models [0.0]
We introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
We develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference.
We show promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method.
arXiv Detail & Related papers (2023-09-19T09:11:31Z) - Adversarial Training of Denoising Diffusion Model Using Dual
Discriminators for High-Fidelity Multi-Speaker TTS [0.0]
The diffusion model is capable of generating high-quality data through a probabilistic approach.
It suffers from the drawback of slow generation speed due to the requirement of a large number of time steps.
We propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data.
arXiv Detail & Related papers (2023-08-03T07:22:04Z) - Data Augmentation for Seizure Prediction with Generative Diffusion Model [26.967247641926814]
Seizure prediction is of great importance to improve the life of patients.
The severe imbalance problem between preictal and interictal data still poses a great challenge.
Data augmentation is an intuitive way to solve this problem.
We propose a novel data augmentation method with diffusion model called DiffEEG.
arXiv Detail & Related papers (2023-06-14T05:44:53Z) - An Efficient Membership Inference Attack for the Diffusion Model by
Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA)
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z) - A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features in languages.
Specifically, we design a linguistic-informed forward process which adds corruptions to the text through strategically soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z) - StoRM: A Diffusion-based Stochastic Regeneration Model for Speech
Enhancement and Dereverberation [20.262426487434393]
We present a regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion.
We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples.
arXiv Detail & Related papers (2022-12-22T16:35:42Z) - Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative
Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - Conditional Diffusion Probabilistic Model for Speech Enhancement [101.4893074984667]
We propose a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes.
In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models.
arXiv Detail & Related papers (2022-02-10T18:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.