BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality
Speech Synthesis
- URL: http://arxiv.org/abs/2203.13508v1
- Date: Fri, 25 Mar 2022 08:53:12 GMT
- Title: BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality
Speech Synthesis
- Authors: Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
- Abstract summary: Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models yet confront challenges of efficient sampling.
We propose a new bilateral denoising diffusion model that parameterizes both the forward and reverse processes with a schedule network and a score network.
We show that the new surrogate objective can achieve a lower bound of the log marginal likelihood tighter than a conventional surrogate.
- Score: 45.58131296169655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic models (DPMs) and their extensions have emerged as
competitive generative models yet confront challenges of efficient sampling. We
propose a new bilateral denoising diffusion model (BDDM) that parameterizes
both the forward and reverse processes with a schedule network and a score
network, which can train with a novel bilateral modeling objective. We show
that the new surrogate objective can achieve a lower bound of the log marginal
likelihood tighter than a conventional surrogate. We also find that BDDM allows
inheriting pre-trained score network parameters from any DPMs and consequently
enables speedy and stable learning of the schedule network and optimization of
a noise schedule for sampling. Our experiments demonstrate that BDDMs can
generate high-fidelity audio samples with as few as three sampling steps.
Moreover, compared to other state-of-the-art diffusion-based neural vocoders,
BDDMs produce comparable or higher quality samples indistinguishable from human
speech, notably with only seven sampling steps (143x faster than WaveGrad and
28.6x faster than DiffWave). We release our code at
https://github.com/tencent-ailab/bddm.
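The abstract's key claim is that a schedule network can find a very short noise schedule over which a pre-trained score network still samples well. A minimal sketch of what few-step ancestral (DDPM-style) sampling over such a schedule looks like; the 3-step `betas` and the closed-form `toy_eps_model` are illustrative stand-ins, not the trained networks:

```python
import math
import random

def toy_eps_model(x, alpha_bar):
    # Stand-in for a trained score/noise network. If the data were a
    # point mass at 0, a noisy sample x = sqrt(ab)*x0 + sqrt(1-ab)*eps
    # is pure scaled noise, so the optimal noise estimate is:
    return x / math.sqrt(1.0 - alpha_bar)

def ancestral_sample(x_T, betas, eps_model, rng):
    """Standard DDPM reverse update, run over a short (few-step) noise
    schedule of the kind a schedule network would output."""
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    x = x_T
    for t in reversed(range(len(betas))):
        eps = eps_model(x, alpha_bars[t])
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
betas = [1e-4, 1e-2, 0.5]  # a hypothetical 3-step schedule
sample = ancestral_sample(rng.gauss(0.0, 1.0), betas, toy_eps_model, rng)
```

With the idealized noise model above, three reverse steps are enough to pull an arbitrary Gaussian draw onto the (degenerate) data distribution, which is the regime the schedule network is optimized for.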
Related papers
- Adversarial Training of Denoising Diffusion Model Using Dual
Discriminators for High-Fidelity Multi-Speaker TTS [0.0]
The diffusion model is capable of generating high-quality data through a probabilistic approach.
It suffers from slow generation speed due to the large number of time steps required.
We propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data.
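The two-discriminator setup described above can be sketched as a simple combination of per-discriminator losses. This is a minimal illustration only; the hinge form, the function names, and the `lam` balancing weight are assumptions, not the paper's exact losses:

```python
def hinge_d_loss(real_scores, fake_scores):
    # standard hinge discriminator loss: push real scores above +1
    # and fake scores below -1
    real = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real + fake

def total_discriminator_loss(diff_real, diff_fake, spec_real, spec_fake, lam=1.0):
    # diffusion discriminator: scores reverse-process transitions
    # spectrogram discriminator: scores generated spectrograms
    # (lam is a hypothetical balancing weight)
    return hinge_d_loss(diff_real, diff_fake) + lam * hinge_d_loss(spec_real, spec_fake)
```

When both discriminators already separate real from fake by a margin of 1, the combined loss is zero and only the generator continues to improve.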
arXiv Detail & Related papers (2023-08-03T07:22:04Z)
- Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
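The patch-based processing mentioned above reduces per-step cost by operating on short fixed-length chunks of the waveform. A hedged sketch of the partition/reassemble step only (padding choice and helper names are illustrative, not LinDiff's implementation):

```python
def to_patches(signal, patch_len):
    # zero-pad the signal to a multiple of patch_len, then split it
    # into non-overlapping patches
    pad = (-len(signal)) % patch_len
    padded = list(signal) + [0.0] * pad
    return [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]

def from_patches(patches, orig_len):
    # concatenate the patches and drop the padding
    flat = [v for patch in patches for v in patch]
    return flat[:orig_len]

wave = [0.1 * i for i in range(10)]  # a toy 10-sample "waveform"
patches = to_patches(wave, 4)        # three patches of length 4
```

The round trip is lossless, so the diffusion model can denoise patches independently (or with cross-patch attention) and still reconstruct the exact original length.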
arXiv Detail & Related papers (2023-06-09T07:02:43Z)
- An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA).
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
- Parallel Sampling of Diffusion Models [76.3124029406809]
Diffusion models are powerful generative models but suffer from slow sampling.
We present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel.
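The core idea of denoising multiple steps in parallel can be illustrated with a Picard-style fixed-point iteration on a toy ODE discretization: each sweep evaluates all drift terms from the previous trajectory guess, so the expensive model calls can run as one batch, and the iteration matches the sequential solution after at most T sweeps (often far fewer in practice). This is a simplified sketch under those assumptions, not the paper's implementation:

```python
def sequential_solve(x0, drift, T):
    # ordinary step-by-step Euler solve (the slow baseline)
    xs = [x0]
    for t in range(T):
        xs.append(xs[-1] + drift(xs[-1], t))
    return xs

def parallel_picard(x0, drift, T, sweeps):
    traj = [x0] * (T + 1)  # initial guess: whole trajectory at x0
    for _ in range(sweeps):
        # all T drift evaluations depend only on the previous iterate,
        # so on a GPU they would run as one batched model call
        d = [drift(traj[t], t) for t in range(T)]
        new = [x0]
        for t in range(T):
            new.append(new[t] + d[t])
        traj = new
    return traj

drift = lambda x, t: -0.5 * x  # toy linear "denoising" drift
T = 8
seq = sequential_solve(1.0, drift, T)
par = parallel_picard(1.0, drift, T, sweeps=T)
```

Each sweep fixes at least one more trajectory point, so after T sweeps the parallel iterate coincides with the sequential answer; the speedup comes from stopping once the trajectory changes less than a tolerance.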
arXiv Detail & Related papers (2023-05-25T17:59:42Z)
- Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps.
Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely-used diffusion models.
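The Maximum Mean Discrepancy used as the finetuning signal above is a kernel two-sample statistic: it is near zero when two sample sets come from the same distribution and grows with distributional mismatch. A minimal estimator under an RBF kernel (the 1-D toy data and bandwidth are illustrative):

```python
import math
import random

def rbf(x, y, sigma=1.0):
    # Gaussian (RBF) kernel between two scalars
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    # biased (V-statistic) estimate of squared MMD under an RBF kernel
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
close = [rng.gauss(0.0, 1.0) for _ in range(200)]
far = [rng.gauss(3.0, 1.0) for _ in range(200)]
```

Finetuning with a fixed small step budget then amounts to minimizing `mmd2(generated, real)` with respect to the generator's parameters.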
arXiv Detail & Related papers (2023-01-19T09:48:07Z)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables a sampling speed of 24x faster than real-time on a single NVIDIA 2080Ti GPU.
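Predicting clean data directly, as ProDiff does, changes the sampler update: the model outputs an estimate of x0, from which the implied noise is recovered before jumping to the next level. A hedged one-dimensional sketch with a deterministic step and a stand-in model that returns the clean value exactly (the two `alpha_bars` levels are made up):

```python
import math

def step_from_x0(x_t, x0_hat, ab_t, ab_prev):
    # deterministic update built from a predicted clean sample x0_hat:
    # recover the implied noise, then re-noise to the previous level
    eps_hat = (x_t - math.sqrt(ab_t) * x0_hat) / math.sqrt(1.0 - ab_t)
    return math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat

x0 = 0.7                   # the "clean" target a perfect model would predict
alpha_bars = [0.99, 0.35]  # hypothetical 2-step noise levels
x = 1.3                    # an arbitrary noisy starting point
x = step_from_x0(x, x0, alpha_bars[1], alpha_bars[0])
x = step_from_x0(x, x0, alpha_bars[0], 1.0)  # ab_prev = 1: land on x0_hat
```

With a perfect x0 predictor the sampler reaches the clean value in two iterations, which is why direct clean-data prediction degrades more gracefully than noise prediction when the step count is pushed this low.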
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
- Bilateral Denoising Diffusion Models [34.507876199641665]
Denoising diffusion probabilistic models (DDPMs) have emerged as competitive generative models.
We propose novel bilateral denoising diffusion models (BDDMs) which take significantly fewer steps to generate high-quality samples.
arXiv Detail & Related papers (2021-08-26T13:23:41Z)
- Denoising Diffusion Implicit Models [117.03720513930335]
We present denoising diffusion implicit models (DDIMs) for iterative implicit probabilistic models with the same training procedure as DDPMs.
DDIMs can produce high quality samples $10\times$ to $50\times$ faster in terms of wall-clock time compared to DDPMs.
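The deterministic DDIM update (the eta = 0 case) that enables these large timestep strides fits in a few lines. The sketch below also checks the key consistency property: stepping with the true noise moves a sample exactly to the lower noise level. All numeric values are illustrative:

```python
import math

def ddim_step(x_t, eps_hat, ab_t, ab_prev):
    # deterministic DDIM update (eta = 0): estimate the clean sample,
    # then re-noise it to the (possibly much lower) previous level
    x0_hat = (x_t - math.sqrt(1.0 - ab_t) * eps_hat) / math.sqrt(ab_t)
    return math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat

# consistency check with a known clean sample and known noise
x0, eps = 0.4, -1.2
ab_t, ab_prev = 0.2, 0.8  # a large stride between noise levels
x_t = math.sqrt(ab_t) * x0 + math.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev)
```

Because the update is deterministic given the noise estimate, a sampler can skip most intermediate timesteps, which is the source of the wall-clock speedups quoted above.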
arXiv Detail & Related papers (2020-10-06T06:15:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.