Transfer Learning for Text Diffusion Models
- URL: http://arxiv.org/abs/2401.17181v1
- Date: Tue, 30 Jan 2024 17:11:56 GMT
- Title: Transfer Learning for Text Diffusion Models
- Authors: Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant
- Abstract summary: We explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs)
We use a lightweight adaptation procedure we call AR2Diff'' to transform pretrained AR models into text diffusion models.
- Score: 16.97230119564891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we explore the potential for text diffusion to replace
autoregressive (AR) decoding for the training and deployment of large language
models (LLMs). We are particularly interested to see whether pretrained AR
models can be transformed into text diffusion models through a lightweight
adaptation procedure we call ``AR2Diff''. We begin by establishing a strong
baseline setup for training text diffusion models. Comparing across multiple
architectures and pretraining objectives, we find that training a decoder-only
model with a prefix LM objective is best or near-best across several tasks.
Building on this finding, we test various transfer learning setups for text
diffusion models. On machine translation, we find that text diffusion
underperforms the standard AR approach. However, on code synthesis and
extractive QA, we find diffusion models trained from scratch outperform AR
models in many cases. We also observe quality gains from AR2Diff -- adapting AR
models to use diffusion decoding. These results are promising given that text
diffusion is relatively underexplored and can be significantly faster than AR
decoding for long text generation.
Related papers
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z) - Simple and Effective Masked Diffusion Language Models [48.68198363304619]
We show that simple masked discrete diffusion is more performant than previously thought.
Our objective has a simple form -- it is a mixture of classical masked language modeling losses.
On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art.
arXiv Detail & Related papers (2024-06-11T17:51:40Z) - LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? [10.72249123249003]
We revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding.
We introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions.
LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS dataset with 38.2 BLEU@4 and 126.2 CIDEr.
arXiv Detail & Related papers (2024-04-16T17:47:16Z) - LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image
Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models.
Our method leverages a pretrained large language model for grounded generation in a novel two-stage process.
Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z) - Diffusion Models for Non-autoregressive Text Generation: A Survey [94.4634088113513]
Non-autoregressive (NAR) text generation has attracted much attention in the field of natural language processing.
Recently, diffusion models have been introduced into NAR text generation, showing an improved text generation quality.
arXiv Detail & Related papers (2023-03-12T05:11:09Z) - Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that VPD can be faster adapted to downstream visual perception tasks using the proposed VPD.
arXiv Detail & Related papers (2023-03-03T18:59:47Z) - DiffusionBERT: Improving Generative Masked Language Models with
Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.