Can Diffusion Model Achieve Better Performance in Text Generation?
Bridging the Gap between Training and Inference!
- URL: http://arxiv.org/abs/2305.04465v1
- Date: Mon, 8 May 2023 05:32:22 GMT
- Title: Can Diffusion Model Achieve Better Performance in Text Generation?
Bridging the Gap between Training and Inference!
- Authors: Zecheng Tang, Pinzheng Wang, Keyan Zhou, Juntao Li, Ziqiang Cao, Min
Zhang
- Abstract summary: Diffusion models have been successfully adapted to text generation tasks by mapping discrete text into a continuous space.
Non-negligible gaps exist between training and inference, owing to the absence of the forward process during inference.
We propose two simple yet effective methods to bridge the gaps mentioned above, named Distance Penalty and Adaptive Decay Sampling.
- Score: 14.979893207094221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have been successfully adapted to text generation tasks by
mapping discrete text into a continuous space. However, non-negligible gaps
exist between training and inference, owing to the absence of the forward
process during inference: the model predicts based only on its previously
generated reverse noise rather than on the noise computed by the forward
process. Besides, the widely used downsampling strategy for speeding up
inference causes a mismatch of diffusion trajectories between training and
inference. To understand and mitigate these two types of training-inference
discrepancies, we conduct a thorough preliminary study. Based on our
observations, we propose two simple yet effective methods to bridge the gaps
mentioned above, named Distance Penalty and Adaptive Decay Sampling. Extensive
experiments on 6 generation tasks confirm the superiority of our methods, which
achieve a 100x to 200x speedup with better performance.
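To make the two discrepancies concrete, below is a minimal Python sketch of a continuous text-diffusion step. The DDPM-style schedule, the x0-predicting `denoiser`, and the DDIM-style update are illustrative assumptions, not the paper's exact formulation: during training the model is fed `forward_noise(x0, t)` from the forward process, whereas at inference each step consumes only the output of the previous `reverse_step`, and the downsampled schedule visits just a fraction of the timesteps seen in training.

```python
import numpy as np

# Toy DDPM-style linear noise schedule; alpha_bar[t] is the cumulative signal level.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def forward_noise(x0, t, rng):
    """Training-time input: x_t drawn from the forward process q(x_t | x_0)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def reverse_step(denoiser, x_t, t):
    """Inference-time update: the next input is the model's own previous output.

    `denoiser` stands in for the trained network that predicts x_0 from x_t.
    """
    x0_hat = denoiser(x_t, t)
    if t == 0:
        return x0_hat
    eps_hat = (x_t - np.sqrt(alpha_bar[t]) * x0_hat) / np.sqrt(1.0 - alpha_bar[t])
    # Deterministic DDIM-style move onto the previous timestep.
    return np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat

# Downsampled inference schedule (e.g. 1000 training steps -> 20 inference steps):
# the reverse trajectory visits only a small subset of the timesteps seen in training.
inference_steps = list(range(T - 1, -1, -(T // 20)))
```

Distance Penalty and Adaptive Decay Sampling are the paper's remedies for these two mismatches; their exact formulations are given in the full text.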
Related papers
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
- DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
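As a rough illustration of ODE-solver-based sampling (not DiffuSeq-v2's exact formulation), the sketch below integrates a VP probability-flow ODE with an off-the-shelf adaptive solver; `beta` and `score_fn` are placeholder assumptions standing in for the trained model's schedule and score.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder assumptions standing in for the trained text-diffusion model.
def beta(t):
    """Linear VP noise schedule on t in [0, 1]."""
    return 0.1 + (20.0 - 0.1) * t

def score_fn(x, t):
    """Stand-in for the learned score; here the score of a standard Gaussian."""
    return -x

def probability_flow_ode(t, x):
    # dx/dt = -0.5 * beta(t) * (x + score(x, t))  (VP probability-flow ODE)
    return -0.5 * beta(t) * (x + score_fn(x, t))

# Integrate from t=1 (pure noise) down to t~0 with an off-the-shelf adaptive solver,
# instead of stepping through every discrete timestep of the reverse chain.
rng = np.random.default_rng(0)
x_T = rng.standard_normal(16)                     # toy 16-dim latent embedding
sol = solve_ivp(probability_flow_ode, t_span=(1.0, 1e-3), y0=x_T, method="RK45")
x_0 = sol.y[:, -1]
```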
arXiv Detail & Related papers (2023-10-09T15:29:10Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z)
- Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement.
In this paper, we address these limitations through a two-stage training approach.
We show that our proposed method maintains steady performance and therefore largely outperforms the diffusion baseline in this setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z)
- Diffusion Model for Dense Matching [34.13580888014]
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term.
We propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms.
Our experimental results demonstrate significant performance improvements of our method over existing approaches.
arXiv Detail & Related papers (2023-05-30T14:58:24Z)
- Structural Pruning for Diffusion Models [65.02607075556742]
We present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones.
Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method.
arXiv Detail & Related papers (2023-05-18T12:38:21Z)
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid significant quality degradation when accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables sampling 24x faster than real time on a single NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
- InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training [33.91980890184044]
InferGrad is a diffusion model for vocoders that incorporates the inference process into training.
InferGrad achieves better voice quality than the baseline WaveGrad under the same conditions.
arXiv Detail & Related papers (2022-02-08T09:40:58Z)
- On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
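A minimal sketch of one such correction, assuming a sampled softmax with a unigram-like proposal distribution; the paper's Monte Carlo, importance sampling, compensated partial summation, and NCE variants differ in detail, but all hinge on correcting toward the intended class posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 50_000, 256                       # vocabulary size, number of sampled negatives

logits = rng.normal(size=V)              # model scores for every word in the vocabulary
target = 123                             # index of the ground-truth next word
counts = rng.integers(1, 100, size=V)    # toy unigram counts
q = counts / counts.sum()                # proposal distribution used to draw negatives

# Draw K negative classes from the proposal; duplicates of the target are ignored
# here for brevity. The target itself is always kept in the sampled set.
negatives = rng.choice(V, size=K, replace=False, p=q)
classes = np.concatenate(([target], negatives))

# Correct the sampled logits by subtracting log q, so that the softmax over the
# sampled subset targets the same class posteriors as the full softmax would.
corrected = logits[classes] - np.log(q[classes])
# Sampled-softmax cross-entropy for the target (index 0 of the sampled set).
loss = -corrected[0] + np.log(np.sum(np.exp(corrected)))
```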
arXiv Detail & Related papers (2021-04-21T12:55:52Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
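A minimal sketch of the patience-based early-exit rule, assuming one classifier head per layer has already produced `layer_logits`; the training of the internal classifiers and other details follow the paper.

```python
import numpy as np

def patience_based_early_exit(layer_logits, patience=3):
    """Return (prediction, layers_used) from per-layer classifier logits.

    Scanning stops as soon as the predicted class has stayed unchanged for
    `patience` consecutive layers; in a real model the remaining layers would
    then simply not be evaluated.
    """
    streak, last_pred = 0, None
    for depth, logits in enumerate(layer_logits, start=1):
        pred = int(np.argmax(logits))
        streak = streak + 1 if pred == last_pred else 1
        last_pred = pred
        if streak >= patience:
            return pred, depth
    return last_pred, len(layer_logits)

# Toy example: 12 per-layer logit vectors over 3 classes; later layers agree on class 2.
rng = np.random.default_rng(0)
layer_logits = [rng.normal(size=3) for _ in range(6)] + [np.array([0.0, 0.1, 2.0])] * 6
print(patience_based_early_exit(layer_logits, patience=3))
```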
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting [16.06369357595426]
We propose a novel curriculum-learning-based strategy named Temporal Progressive Growing Sampling to bridge the gap between training and inference for spatio-temporal sequence forecasting.
Experimental results demonstrate that our proposed method better models long term dependencies and outperforms baseline approaches on two competitive datasets.
arXiv Detail & Related papers (2020-05-19T10:14:43Z)