GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
- URL: http://arxiv.org/abs/2402.15516v1
- Date: Fri, 9 Feb 2024 12:12:52 GMT
- Title: GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
- Authors: Haocheng Liu (IP Paris, LTCI, IDS, S2A), Teysir Baoueb (IP Paris,
LTCI, IDS, S2A), Mathieu Fontaine (IP Paris, LTCI, IDS, S2A), Jonathan Le
Roux (MERL), Gael Richard (IP Paris, LTCI, IDS, S2A)
- Abstract summary: We propose GLA-Grad, which introduces a phase recovery algorithm such as the Griffin-Lim algorithm (GLA) at each step of the regular diffusion process.
We show that our algorithm outperforms state-of-the-art diffusion models for speech generation, especially when generating speech for a previously unseen target speaker.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are receiving growing interest for a variety of signal
generation tasks such as speech or music synthesis. WaveGrad, for example, is a
successful diffusion model that uses the mel spectrogram to condition a
diffusion process for the generation of high-fidelity audio. However, such
models face important challenges concerning the noise diffusion process for
training and inference, and they have difficulty generating high-quality speech
for speakers that were not seen during training. With the aim of minimizing the
conditioning error and increasing the efficiency of the noise diffusion
process, we propose in this paper a new scheme called GLA-Grad, which
introduces a phase recovery algorithm such as the Griffin-Lim algorithm (GLA)
at each step of the regular diffusion process. Furthermore, it can be directly
applied to an already-trained waveform generation model, without additional
training or fine-tuning. We show that our algorithm outperforms
state-of-the-art diffusion models for speech generation, especially when
generating speech for a previously unseen target speaker.
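To make the idea concrete, below is a minimal Python/PyTorch sketch of how a Griffin-Lim projection could be interleaved with a WaveGrad-style reverse diffusion loop, as the abstract describes. All names and values here (`eps_model`, `betas`, `lam`, the number of GLA iterations) are illustrative assumptions, not the paper's actual implementation; in particular, since the model is conditioned on a mel spectrogram, a real system would first map the mel conditioning to a linear-frequency magnitude (e.g., via a pseudo-inverse mel filterbank) before applying GLA.

```python
# Illustrative sketch only: a WaveGrad-style reverse diffusion loop with a
# Griffin-Lim correction at each step. Names, signatures, and the blending
# weight are assumptions, not the authors' implementation.
import torch

def gla_project(x, target_mag, n_fft=1024, hop=256, n_iters=2):
    # A few Griffin-Lim iterations: keep the current phase estimate,
    # impose the target magnitude, and resynthesize the waveform.
    window = torch.hann_window(n_fft, device=x.device)
    for _ in range(n_iters):
        spec = torch.stft(x, n_fft, hop, window=window, return_complex=True)
        phase = spec / (spec.abs() + 1e-8)
        x = torch.istft(target_mag * phase, n_fft, hop, window=window,
                        length=x.shape[-1])
    return x

@torch.no_grad()
def gla_grad_sample(eps_model, target_mag, betas, length, lam=0.1):
    # `eps_model(x, cond, t)` is a trained WaveGrad-like noise predictor;
    # `target_mag` is the linear-frequency magnitude derived from the mel
    # conditioning; `betas` is a 1-D tensor of noise-schedule values; `lam`
    # blends the GLA-corrected signal into each step (a guess, not a paper
    # value).
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, length)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, target_mag, t)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])   # denoising mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        # Phase-recovery correction toward the conditioning spectrogram.
        x = (1 - lam) * x + lam * gla_project(x, target_mag)
    return x
```

Note that the correction only reuses the trained noise predictor at sampling time, which is consistent with the abstract's claim that the scheme can wrap an already-trained waveform generation model without retraining or fine-tuning.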
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z)
- SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis [0.0]
We introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN.
We show the merits of our proposed model for speech and music synthesis on several datasets.
arXiv Detail & Related papers (2024-01-30T09:17:57Z)
- Investigating the Design Space of Diffusion Models for Speech Enhancement [17.914763947871368]
Diffusion models are a new class of generative models that have shown outstanding performance in image generation literature.
We show that the performance of previous diffusion-based speech enhancement systems cannot be attributed to the progressive transformation between the clean and noisy speech signals.
We also show that a proper choice of preconditioning, training loss weighting, SDE and sampler makes it possible to outperform a popular diffusion-based speech enhancement system.
arXiv Detail & Related papers (2023-12-07T15:40:55Z)
- Unsupervised speech enhancement with diffusion-based generative models [0.0]
We introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
We develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference.
We show promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method.
arXiv Detail & Related papers (2023-09-19T09:11:31Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- Conditional Diffusion Probabilistic Model for Speech Enhancement [101.4893074984667]
We propose a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes.
In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models.
arXiv Detail & Related papers (2022-02-10T18:58:01Z)
- Denoising Diffusion Gamma Models [91.22679787578438]
We introduce the Denoising Diffusion Gamma Model (DDGM) and show that noise from Gamma distribution provides improved results for image and speech generation.
Our approach preserves the ability to efficiently sample state in the training diffusion process while using Gamma noise.
arXiv Detail & Related papers (2021-10-10T10:46:31Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement method (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)