Related papers: TESS 2: A Large-Scale Generalist Diffusion Language Model

URL: http://arxiv.org/abs/2502.13917v1
Date: Wed, 19 Feb 2025 17:50:31 GMT
Title: TESS 2: A Large-Scale Generalist Diffusion Language Model
Authors: Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan,
Abstract summary: TESS 2 is an instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models. We find that adaptation training as well as the choice of the base model is crucial for training good instruction-following diffusion models. We propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model.
Score: 24.91689676432666
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, as well as matches and sometimes exceeds strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with the usual cross-entropy as diffusion loss, and then performing further instruction tuning. We find that adaptation training as well as the choice of the base model is crucial for training good instruction-following diffusion models. We further propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model. Finally, we show that TESS 2 further improves with increased inference-time compute, highlighting the utility of diffusion LMs in having fine-grained controllability over the amount of compute used at inference time. Code and models are available at https://github.com/hamishivi/tess-2.

Related papers

Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling. We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
Simplified and Generalized Masked Diffusion for Discrete Data [47.711583631408715]
Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI). In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion). Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining. We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models. We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models. Experiments on refining GAN models show that Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)
David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs [49.822063966687175]
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs. We propose methods to scale a recently proposed diffusion model SSD-LM from 0.4B to 13B parameters. We show that SSD-2 facilitates novel ensembles with 100x smaller models that can be customized and deployed by individual users.
arXiv Detail & Related papers (2023-05-24T06:22:14Z)
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models. We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.