Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
- URL: http://arxiv.org/abs/2510.22852v1
- Date: Sun, 26 Oct 2025 22:05:22 GMT
- Title: Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
- Authors: Marianne Arriola, Yair Schiff, Hao Phung, Aaron Gokaslan, Volodymyr Kuleshov
- Abstract summary: We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms. E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks.
- Score: 27.027494484311948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network at every denoising step and incur high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use separate modules for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference, which relies on an encoder to represent clean tokens and a lightweight decoder to iteratively refine a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language model inference. We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms, and we show that E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks. We provide the code, model weights, and blog post on the project page: https://m-arriola.com/e2d2
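As a rough illustration of the sampling scheme the abstract describes, the toy PyTorch sketch below runs a full-size encoder once per block over the clean prefix and then calls a small decoder repeatedly to unmask the block. The module shapes, the concatenation-style conditioning, and the confidence-based unmasking schedule are all illustrative assumptions, not the paper's implementation.

```python
# Toy sketch (not the authors' code) of encoder-once, decoder-many sampling:
# the encoder represents the clean prefix a single time per block, while the
# lightweight decoder is invoked at every denoising step.
import torch
import torch.nn as nn

VOCAB, HIDDEN, MASK_ID = 1000, 64, 0


class TinyEncoder(nn.Module):
    """Stand-in for the full encoder stack that represents clean tokens."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.layer = nn.TransformerEncoderLayer(HIDDEN, 4, batch_first=True)

    def forward(self, tokens):
        return self.layer(self.emb(tokens))


class TinyDecoder(nn.Module):
    """Stand-in for the lightweight denoiser; conditioning on cached encoder
    states via concatenation is an assumption made for brevity."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN)
        self.layer = nn.TransformerEncoderLayer(HIDDEN, 4, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, noisy_tokens, clean_states):
        h = torch.cat([clean_states, self.emb(noisy_tokens)], dim=1)
        h = self.layer(h)
        return self.head(h[:, clean_states.size(1):])  # logits for the block only


@torch.no_grad()
def sample_block(encoder, decoder, prefix, block_len=8, steps=4):
    """Denoise one block: one encoder call, `steps` cheap decoder calls.
    Assumes batch size 1 for the simple unmasking schedule below."""
    clean_states = encoder(prefix)  # single expensive pass per block
    block = torch.full((prefix.size(0), block_len), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = decoder(block, clean_states)
        conf, pred = logits.softmax(-1).max(-1)
        masked = block.eq(MASK_ID)
        if not masked.any():
            break
        conf = conf.masked_fill(~masked, -1.0)        # only unmask masked slots
        k = -(-int(masked.sum()) // (steps - step))   # even share per step (ceil)
        idx = conf.topk(k, dim=-1).indices
        block.scatter_(1, idx, pred.gather(1, idx))
    return torch.cat([prefix, block], dim=1)


prefix = torch.randint(1, VOCAB, (1, 16))  # pretend clean context
print(sample_block(TinyEncoder(), TinyDecoder(), prefix).shape)  # (1, 24)
```

The point of the sketch is the cost structure: the expensive module runs once per block, while only the small decoder is inside the per-step loop.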
Related papers
- Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding [36.74241893088594]
Diffusion Large Language Models (dLLMs) offer a compelling paradigm for natural language generation. Recent works have accelerated inference via KV cache reuse or decoding strategies, but they overlook the intrinsic inefficiencies within the block-wise diffusion process. We propose Streaming-dLLM, a training-free framework that streamlines inference across both spatial and temporal dimensions.
arXiv Detail & Related papers (2026-01-25T17:36:04Z) - DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. We investigate their denoising processes and reinforcement learning methods. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z) - One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models [65.96186414865747]
Text-to-Image (T2I) diffusion models face a trade-off between inference speed and image quality. We introduce the first Time-independent Unified Encoder (TiUE) for the student model's UNet architecture. Using a one-pass scheme, TiUE shares encoder features across multiple decoder time steps, enabling parallel sampling.
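To make the one-pass idea concrete, here is a minimal sketch of sharing time-independent encoder features across decoder time steps; the toy modules and shapes below are assumptions for illustration, not TiUE's architecture.

```python
# Minimal illustration: compute UNet encoder features once, then reuse them
# for the decoder at every denoising step instead of re-running the encoder.
import torch
import torch.nn as nn


class ToyUNet(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.enc = nn.Conv2d(3, dim, 3, padding=1)      # time-independent encoder
        self.dec = nn.Conv2d(dim + 1, 3, 3, padding=1)  # decoder sees features + t

    def encode(self, x):
        return torch.relu(self.enc(x))

    def decode(self, feats, t):
        t_map = torch.full_like(feats[:, :1], float(t))  # broadcast timestep map
        return self.dec(torch.cat([feats, t_map], dim=1))


net, x = ToyUNet(), torch.randn(1, 3, 8, 8)
feats = net.encode(x)                            # one encoder pass ...
outs = [net.decode(feats, t) for t in range(4)]  # ... shared by 4 decoder steps
```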
arXiv Detail & Related papers (2025-05-28T04:23:22Z) - FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion [22.207275433870937]
Diffusion language models offer parallel token generation and inherent bidirectionality. State-of-the-art diffusion models (e.g., Dream 7B, LLaDA 8B) suffer from slow inference. We introduce Guided Diffusion, a training-free method that uses a lightweight pretrained autoregressive model to supervise token unmasking.
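A minimal sketch of what AR-guided unmasking could look like, with random logits standing in for both models; the selection rule here is an assumption for illustration, not FlashDLM's actual criterion.

```python
# Toy version of AR-guided unmasking: commit only the masked positions whose
# diffusion-proposed tokens the AR model also rates as likely.
import torch


def guided_unmask(diff_logits, ar_logits, masked, k=2):
    proposal = diff_logits.argmax(-1)                                  # (B, L)
    ar_prob = ar_logits.softmax(-1).gather(-1, proposal.unsqueeze(-1)).squeeze(-1)
    ar_prob = ar_prob.masked_fill(~masked, -1.0)  # skip already-revealed slots
    commit_idx = ar_prob.topk(k, dim=-1).indices  # positions the AR model endorses
    return proposal, commit_idx


B, L, V = 1, 8, 100
diff_logits, ar_logits = torch.randn(B, L, V), torch.randn(B, L, V)
masked = torch.ones(B, L, dtype=torch.bool)
proposal, commit_idx = guided_unmask(diff_logits, ar_logits, masked)
```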
arXiv Detail & Related papers (2025-05-27T17:39:39Z) - I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models [50.34197948438868]
ThinkDiff is an alignment paradigm that empowers text-to-image diffusion models with multimodal in-context understanding and reasoning capabilities. We show that ThinkDiff significantly improves accuracy from 19.2% to 46.3% on the challenging CoBSAT benchmark for multimodal in-context reasoning generation. We also demonstrate exceptional performance in composing multiple images and texts into logically coherent images.
arXiv Detail & Related papers (2025-02-12T05:30:08Z) - Geometry-Preserving Encoder/Decoder in Latent Generative Models [15.766401356353084]
We introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder.
arXiv Detail & Related papers (2025-01-16T23:14:34Z) - Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition [20.052245837954175]
We propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture.
We introduce an activation caching mechanism to enable the non-autoregressive encoder to operate autoregressively during inference; a toy sketch of the caching idea follows this summary.
A hybrid CTC/RNNT architecture uses a shared encoder with both CTC and RNNT decoders to boost accuracy and save computation.
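As a toy illustration of activation caching in streaming inference (an invented layer, not the paper's FastConformer cache): a causal convolution can carry its last kernel-1 input frames between chunks, so each incoming chunk is processed without recomputing history.

```python
# Toy activation cache for streaming: keep the trailing context frames
# between chunks so a causal conv never recomputes or sees future frames.
import torch
import torch.nn as nn


class CachedCausalConv(nn.Module):
    def __init__(self, dim=8, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel)
        self.cache = torch.zeros(1, dim, kernel - 1)  # left context from the past

    def forward(self, chunk):                     # chunk: (1, dim, T)
        x = torch.cat([self.cache, chunk], dim=-1)
        self.cache = x[..., -(self.conv.kernel_size[0] - 1):].detach()
        return self.conv(x)                       # output length == T


layer = CachedCausalConv()
for chunk in torch.randn(1, 8, 12).split(4, dim=-1):  # stream 12 frames as 3 chunks
    out = layer(chunk)                                 # (1, 8, 4) per chunk
```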
arXiv Detail & Related papers (2023-12-27T21:04:26Z) - Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z) - Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z) - ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)