A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
- URL: http://arxiv.org/abs/2304.04746v1
- Date: Mon, 10 Apr 2023 17:58:42 GMT
- Title: A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
- Authors: Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang
- Abstract summary: Masked-Diffuse LM is a novel diffusion model for language modeling inspired by linguistic features.
Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking, noising the textual data more effectively.
We demonstrate that our Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models with better efficiency.
- Score: 62.719656543880596
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Diffusion models based on iterative denoising have recently been proposed and applied to various generation tasks, such as image generation. However, because they are inherently built for continuous data, existing diffusion models still have limitations in modeling discrete data such as language. For example, the commonly used Gaussian noise does not handle discrete corruption well, and objectives defined in continuous space become unstable for textual data during the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce Masked-Diffuse LM, a novel diffusion model for language modeling with lower training cost and better performance, inspired by linguistic features. Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking. We also directly predict the categorical distribution with a cross-entropy loss at every diffusion step, connecting the continuous and discrete spaces in a more efficient and straightforward way. Through experiments on 5 controlled generation tasks, we demonstrate that Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models with better efficiency.
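To make the two ideas in the abstract concrete, here is a minimal, hypothetical sketch of a soft-masked forward step and a per-step cross-entropy objective. The importance weights, the linear masking ramp, its direction (low-importance tokens masked first), and all function names are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a soft-masked forward process plus a per-step
# cross-entropy objective, in the spirit of Masked-Diffuse LM. The exact
# importance measure, masking order, and schedule in the paper may differ.
import torch
import torch.nn.functional as F

def soft_mask_forward(x_emb, importance, t, T, mask_emb):
    """Soft-mask token embeddings toward a [MASK] embedding.

    x_emb:      (B, L, D) clean token embeddings
    importance: (B, L) per-token weights in [0, 1] (e.g., tf-idf based);
                higher values delay the corruption of that token
    t, T:       current diffusion step and total number of steps
    mask_emb:   (D,) embedding of the soft [MASK] token
    """
    # Linear ramp: tokens with low importance are masked earlier
    # (an illustrative choice, not the paper's schedule).
    ramp = torch.clamp(t / T - importance, min=0.0, max=1.0).unsqueeze(-1)
    return (1.0 - ramp) * x_emb + ramp * mask_emb

def diffusion_step_loss(denoiser, x_t, t, target_ids):
    """Directly predict the categorical distribution at step t."""
    logits = denoiser(x_t, t)  # (B, L, V)
    return F.cross_entropy(logits.transpose(1, 2), target_ids)
```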
Related papers
- Discrete Copula Diffusion [44.96934660818884]
We identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps.
We introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model.
Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps.
arXiv Detail & Related papers (2024-10-02T18:51:38Z)
- Simple and Effective Masked Diffusion Language Models [48.68198363304619]
We show that simple masked discrete diffusion is more performant than previously thought.
We apply an effective training recipe that improves the performance of masked diffusion models.
Our objective has a simple form -- it is a mixture of classical masked language modeling losses.
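As a hedged illustration of such a mixture, the sketch below samples a masking level, computes masked-LM cross-entropy, and reweights it by a schedule-dependent factor; the weight and schedule follow common continuous-time masked-diffusion derivations and are assumptions, not necessarily the paper's recipe.

```python
# Illustrative "mixture of masked LM losses" objective: sample a masking
# level, compute masked-LM cross-entropy, reweight by -alpha'(t)/(1-alpha(t)).
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, ids, mask_id, alpha, alpha_prime):
    """ids: (B, L) token ids; alpha, alpha_prime: callables on t in (0, 1)."""
    B, L = ids.shape
    t = torch.rand(B, device=ids.device)                  # random time
    keep = torch.rand(B, L, device=ids.device) < alpha(t)[:, None]
    x_t = torch.where(keep, ids, torch.full_like(ids, mask_id))
    logits = model(x_t, t)                                # (B, L, V)
    ce = F.cross_entropy(logits.transpose(1, 2), ids, reduction="none")
    weight = (-alpha_prime(t) / (1.0 - alpha(t)))[:, None]
    return (weight * ce * (~keep).float()).mean()         # masked tokens only
```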
arXiv Detail & Related papers (2024-06-11T17:51:40Z)
- LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? [10.72249123249003]
We revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding.
We introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions.
LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS COCO dataset with 38.2 BLEU@4 and 126.2 CIDEr.
arXiv Detail & Related papers (2024-04-16T17:47:16Z)
- Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows [53.31856123113228]
This paper proposes Language Rectified Flow, a method based on reformulating standard probabilistic flow models.
Experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.
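For reference, a generic rectified-flow training step (not necessarily the paper's exact language-specific setup) regresses a velocity model toward the straight-line direction between noise and data:

```python
# Generic rectified-flow training sketch: learn the constant velocity
# x1 - x0 along the linear interpolation path between noise and data.
import torch
import torch.nn.functional as F

def rectified_flow_loss(velocity_model, x1):
    """x1: (B, L, D) data points (e.g., latent text representations)."""
    x0 = torch.randn_like(x1)                        # source noise
    t = torch.rand(x1.size(0), device=x1.device).view(-1, 1, 1)
    x_t = (1.0 - t) * x0 + t * x1                    # linear interpolation
    v_pred = velocity_model(x_t, t.flatten())
    return F.mse_loss(v_pred, x1 - x0)               # straight-line target
```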
arXiv Detail & Related papers (2024-03-25T17:58:22Z)
- DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation [10.984745439751489]
We propose a novel diffusion model by applying the diffusion forward process in the continuous speech representation space.
In this way, we preserve the semantic structure of the continuous speech representation space in the diffusion process and integrate the continuous and discrete diffusion models.
We conduct extensive experiments on the textless direct speech-to-speech translation task, where the proposed method achieves comparable results to the computationally intensive auto-regressive baselines.
arXiv Detail & Related papers (2023-10-26T16:58:14Z)
- Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes.
However, for many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs), built on diffusion bridges: processes that interpolate between two paired distributions given as endpoints.
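One standard way to pin both endpoints of a forward process is a Brownian-bridge interpolation between paired samples; the sketch below is illustrative only, and DDBM's actual parameterization may differ.

```python
# Brownian-bridge interpolation between paired endpoints x0 (t=0) and
# xT (t=1); noise vanishes at both ends so the process stays pinned.
import torch

def bridge_sample(x0, xT, t, sigma=1.0):
    """Sample x_t from a Brownian bridge with endpoints x0 and xT."""
    t = t.view(-1, *([1] * (x0.dim() - 1)))          # broadcast over dims
    mean = (1.0 - t) * x0 + t * xT                   # pinned at both ends
    std = sigma * (t * (1.0 - t)).sqrt()             # zero noise at endpoints
    return mean + std * torch.randn_like(x0)
```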
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
- TESS: Text-to-Text Self-Conditioned Simplex Diffusion [56.881170312435444]
TESS (Text-to-Text Self-Conditioned Simplex Diffusion) employs a new form of self-conditioning and applies the diffusion process on the logit simplex space rather than the learned embedding space.
We demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models.
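A minimal sketch of the logit-simplex representation follows; the +/-K "almost one-hot" mapping mirrors common simplex-diffusion setups, and the constant K and the schedule here are assumptions rather than the paper's exact values.

```python
# Represent tokens as {-K, +K} logit vectors and apply Gaussian forward
# noise directly in that logit simplex space.
import torch
import torch.nn.functional as F

K = 5.0  # logit magnitude (an assumed constant, not the paper's value)

def tokens_to_logits(ids, vocab_size):
    """Map token ids to {-K, +K} logit vectors."""
    one_hot = F.one_hot(ids, vocab_size).float()
    return K * (2.0 * one_hot - 1.0)

def noisy_logits(ids, vocab_size, alpha_bar_t):
    """alpha_bar_t: scalar tensor in (0, 1) from the noise schedule."""
    l0 = tokens_to_logits(ids, vocab_size)
    eps = torch.randn_like(l0)
    return alpha_bar_t.sqrt() * l0 + (1.0 - alpha_bar_t).sqrt() * eps
```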
arXiv Detail & Related papers (2023-05-15T06:33:45Z)
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
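A toy schedule in this spirit might combine a shared ramp with a token-dependent tilt, as below; the sinusoidal tilt shape, its sign convention, and the informativeness measure are all illustrative assumptions, not the paper's exact schedule.

```python
# Toy token-dependent noise schedule: linear ramp plus a sinusoidal
# tilt weighted by token informativeness.
import math
import torch

def mask_probability(t, T, info, lam=0.3):
    """Probability that each token is masked by step t.

    t, T:  current step and total number of steps
    info:  (L,) token informativeness, normalized to [0, 1]
    lam:   strength of the token-dependent tilt
    """
    base = t / T                          # component shared by all tokens
    tilt = lam * math.sin(math.pi * t / T) * (0.5 - info)
    return torch.clamp(base + tilt, 0.0, 1.0)
```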
arXiv Detail & Related papers (2022-11-28T03:25:49Z)