Likelihood-Based Diffusion Language Models
- URL: http://arxiv.org/abs/2305.18619v1
- Date: Tue, 30 May 2023 16:43:31 GMT
- Title: Likelihood-Based Diffusion Language Models
- Authors: Ishaan Gulrajani, Tatsunori B. Hashimoto
- Abstract summary: We take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models.
We pursue this goal through algorithmic improvements, scaling laws, and increased compute.
We release Plaid 1B, a large diffusion language model which outperforms GPT-2 124M in likelihood on benchmark datasets.
- Score: 13.916640262862215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite a growing interest in diffusion-based language models, existing work
has not shown that these models can attain nontrivial likelihoods on standard
language modeling benchmarks. In this work, we take the first steps towards
closing the likelihood gap between autoregressive and diffusion-based language
models, with the goal of building and releasing a diffusion model which
outperforms a small but widely-known autoregressive model. We pursue this goal
through algorithmic improvements, scaling laws, and increased compute. On the
algorithmic front, we introduce several methodological improvements for the
maximum-likelihood training of diffusion language models. We then study scaling
laws for our diffusion models and find compute-optimal training regimes which
differ substantially from autoregressive models. Using our methods and scaling
analysis, we train and release Plaid 1B, a large diffusion language model which
outperforms GPT-2 124M in likelihood on benchmark datasets and generates fluent
samples in unconditional and zero-shot control settings.
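Since the abstract centers on maximum-likelihood training of diffusion language models, a rough illustration may help. The sketch below estimates a generic VDM-style continuous-time diffusion loss for Gaussian diffusion over token embeddings; the linear log-SNR schedule, embedding table, and stand-in denoiser are hypothetical placeholders, the prior and reconstruction terms of the full bound are omitted, and this is not claimed to be Plaid's exact objective.

```python
# Minimal illustrative sketch, assuming Gaussian diffusion over token embeddings
# and a linear log-SNR schedule. Not Plaid's exact parameterization.
import torch

def log_snr(t):
    # Hypothetical linear log-SNR schedule: +10 (nearly clean) to -10 (nearly pure noise).
    return 10.0 - 20.0 * t

def diffusion_loss_bound(tokens, embed, denoiser, n_samples=1):
    """Monte Carlo estimate of the continuous-time diffusion loss term per sequence,
    in nats; the prior and reconstruction terms of the bound are omitted."""
    x0 = embed(tokens)                                # (B, L, D) clean embeddings
    total = torch.zeros(x0.shape[0])
    for _ in range(n_samples):
        t = torch.rand(x0.shape[0], 1, 1)             # one diffusion time per sequence
        g = log_snr(t)
        alpha, sigma = torch.sigmoid(g).sqrt(), torch.sigmoid(-g).sqrt()
        eps = torch.randn_like(x0)
        z_t = alpha * x0 + sigma * eps                # sample q(z_t | x_0)
        eps_hat = denoiser(z_t, t.squeeze())          # model predicts the added noise
        dlogsnr_dt = -20.0                            # derivative of the linear schedule
        total = total + 0.5 * (-dlogsnr_dt) * ((eps_hat - eps) ** 2).sum(dim=(1, 2))
    return total / n_samples

if __name__ == "__main__":
    embed = torch.nn.Embedding(1000, 16)
    denoiser = lambda z, t: torch.zeros_like(z)       # stand-in noise predictor
    tokens = torch.randint(0, 1000, (2, 8))
    print(diffusion_loss_bound(tokens, embed, denoiser))
```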
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
The Energy-based Diffusion Language Model (EDLM) is an energy-based model that operates at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for generative text modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into the diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- Simple and Effective Masked Diffusion Language Models [48.68198363304619]
We show that simple masked discrete diffusion is more performant than previously thought.
We apply an effective training recipe that improves the performance of masked diffusion models.
Our objective has a simple form -- it is a mixture of classical masked language modeling losses.
arXiv Detail & Related papers (2024-06-11T17:51:40Z)
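The entry above describes the objective as a mixture of classical masked language modeling losses. Below is a hedged, generic sketch of that idea: cross-entropy on masked positions, averaged over random masking ratios with a simple 1/t weighting. The model interface, mask token id, and weighting are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative sketch only: a weighted mixture of masked-LM losses over random
# masking ratios. Names and the 1/t weighting are assumptions, not the paper's
# exact objective.
import torch
import torch.nn.functional as F

def mixture_of_mlm_losses(model, tokens, mask_id, eps=1e-3):
    """tokens: (B, L) integer ids; model(x) -> (B, L, V) logits."""
    B, L = tokens.shape
    t = torch.rand(B, 1).clamp(min=eps)               # masking ratio per sequence
    masked = torch.rand(B, L) < t                     # mask each position w.p. t
    corrupted = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                         # standard masked-LM forward pass
    nll = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")  # (B, L)
    per_seq = (nll * masked.float()).sum(dim=1)       # loss only on masked positions
    return ((1.0 / t.squeeze(1)) * per_seq).mean()    # weight small ratios more heavily

if __name__ == "__main__":
    V, mask_id = 100, 99
    model = lambda x: torch.randn(x.shape[0], x.shape[1], V)   # stand-in LM
    print(mixture_of_mlm_losses(model, torch.randint(0, 99, (4, 16)), mask_id))
```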
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
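To make the sigmoidal-fit idea in the entry above concrete, here is a hedged toy sketch of fitting a sigmoid that maps a capability measure (here, hypothetically, log training compute) to benchmark accuracy and extrapolating it. The data points and functional form are invented for illustration; the paper's actual approach fits such relationships across roughly 100 public models.

```python
# Toy sketch: fit a sigmoidal curve to (capability measure, accuracy) observations
# and extrapolate. All numbers below are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b, lo, hi):
    return lo + (hi - lo) / (1.0 + np.exp(-a * (x - b)))

# Hypothetical observations: (log10 FLOPs, benchmark accuracy) for smaller models.
log_compute = np.array([20.0, 20.5, 21.0, 21.5, 22.0, 22.5])
accuracy    = np.array([0.12, 0.15, 0.22, 0.35, 0.52, 0.66])

params, _ = curve_fit(sigmoid, log_compute, accuracy,
                      p0=[2.0, 22.0, 0.1, 0.9], maxfev=10000)
print("predicted accuracy at 1e24 FLOPs:", sigmoid(24.0, *params))
```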
- Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [100.53662473219806]
Diffusion-of-Thought (DoT) is a novel approach that integrates diffusion models with Chain-of-Thought.
DoT allows reasoning steps to diffuse over time through a diffusion language model.
Our results demonstrate the effectiveness of DoT in multi-digit multiplication, logic, and grade school math problems.
arXiv Detail & Related papers (2024-02-12T16:23:28Z)
- A Survey of Diffusion Models in Natural Language Processing [11.233768932957771]
Diffusion models capture the diffusion of information or signals across a network or manifold.
This paper discusses the different formulations of diffusion models used in NLP, their strengths and limitations, and their applications.
arXiv Detail & Related papers (2023-05-24T03:25:32Z)
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z)
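The entry above highlights a noise schedule that controls how much noise the forward process adds at each step. Below is a hedged toy sketch of an absorbing-state ("mask") forward process under a simple linear schedule; the schedule and function names are placeholders and this is not the paper's proposed schedule.

```python
# Toy sketch of an absorbing-state forward diffusion over tokens with a linear
# masking schedule. Placeholder names; not DiffusionBERT's actual schedule.
import torch

def forward_mask_marginals(tokens, mask_id, num_steps=10):
    """Yield (step, x_t), where x_t is drawn from the step-t marginal q(x_t | x_0):
    each token is masked independently with probability step / num_steps."""
    for step in range(1, num_steps + 1):
        mask_prob = step / num_steps                  # linear schedule (placeholder)
        masked = torch.rand(tokens.shape) < mask_prob
        x_t = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
        yield step, x_t

if __name__ == "__main__":
    toks = torch.randint(0, 100, (1, 12))
    for step, noisy in forward_mask_marginals(toks, mask_id=103):
        print(step, noisy.tolist())
```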