CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation
- URL: http://arxiv.org/abs/2505.14455v1
- Date: Tue, 20 May 2025 14:52:41 GMT
- Title: CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation
- Authors: Chihan Huang, Hao Tang
- Abstract summary: Diffusion-based language models have emerged as a compelling alternative due to their powerful parallel generation capabilities and inherent editability. We propose CtrlDiff, a dynamic and controllable semi-autoregressive framework that adaptively determines the size of each generation block based on local semantics.
- Score: 7.250878248686215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although autoregressive models have dominated language modeling in recent years, there has been a growing interest in exploring alternative paradigms to the conventional next-token prediction framework. Diffusion-based language models have emerged as a compelling alternative due to their powerful parallel generation capabilities and inherent editability. However, these models are often constrained by fixed-length generation. A promising direction is to combine the strengths of both paradigms, segmenting sequences into blocks, modeling autoregressive dependencies across blocks while leveraging discrete diffusion to estimate the conditional distribution within each block given the preceding context. Nevertheless, their practical application is often hindered by two key limitations: rigid fixed-length outputs and a lack of flexible control mechanisms. In this work, we address the critical limitations of fixed granularity and weak controllability in current large diffusion language models. We propose CtrlDiff, a dynamic and controllable semi-autoregressive framework that adaptively determines the size of each generation block based on local semantics using reinforcement learning. Furthermore, we introduce a classifier-guided control mechanism tailored to discrete diffusion, which significantly reduces computational overhead while facilitating efficient post-hoc conditioning without retraining. Extensive experiments demonstrate that CtrlDiff sets a new standard among hybrid diffusion models, narrows the performance gap to state-of-the-art autoregressive approaches, and enables effective conditional text generation across diverse tasks.
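To make the decoding scheme described in the abstract concrete, the following is a minimal PyTorch-style sketch of a semi-autoregressive, block-wise generation loop: a size policy picks the next block length from local context, a masked discrete-diffusion denoiser iteratively unmasks the block conditioned on all previously generated text, and an optional external classifier re-weights token logits for post-hoc control. This is an illustrative sketch, not the authors' implementation; the module names (`denoiser`, `size_policy`, `classifier.logit_adjustment`), the mask-token handling, and the confidence-based remasking schedule are all assumptions.

```python
import torch

MASK_ID = 0  # hypothetical id of the [MASK] token

@torch.no_grad()
def generate(denoiser, size_policy, prompt_ids, max_len=256,
             diffusion_steps=8, classifier=None, guidance_scale=0.0):
    """Semi-autoregressive block-wise decoding sketch (illustrative only)."""
    seq = prompt_ids.clone()                          # (1, T) running sequence
    while seq.size(1) < max_len:
        # 1) Dynamic block prediction: choose the next block size from local
        #    context (in the paper this policy is trained with reinforcement learning).
        block_len = size_policy(seq)                  # e.g. an int in {4, 8, 16, 32}
        block = torch.full((1, block_len), MASK_ID, dtype=seq.dtype)
        seq = torch.cat([seq, block], dim=1)
        start = seq.size(1) - block_len
        # 2) Discrete diffusion within the block: iteratively unmask tokens,
        #    conditioned on all previously generated context.
        for step in reversed(range(diffusion_steps)):
            logits = denoiser(seq, t=step)            # (1, T, vocab) token logits
            if classifier is not None and guidance_scale > 0.0:
                # 3) Post-hoc classifier guidance (one common approximation):
                #    nudge logits toward the desired attribute without retraining.
                logits = logits + guidance_scale * classifier.logit_adjustment(seq)
            probs = logits[:, start:, :].softmax(dim=-1)
            sampled = torch.distributions.Categorical(probs=probs).sample()
            # Keep the most confident tokens; re-mask the rest for the next step.
            conf = probs.max(dim=-1).values
            keep = conf >= torch.quantile(conf, step / diffusion_steps)
            seq[:, start:] = torch.where(keep, sampled,
                                         torch.full_like(sampled, MASK_ID))
        # (A stopping criterion, e.g. an end-of-text token, is omitted here.)
    return seq
```

The key point of this sketch is that the block size is decided per block at decode time rather than fixed in advance, and control is applied by adjusting logits at each denoising step, so neither mechanism requires retraining the underlying diffusion model.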
Related papers
- Unveiling the Potential of Diffusion Large Language Model in Controllable Generation [11.181783720439563]
Diffusion models, originally developed for image generation, have emerged as a promising alternative to autoregressive large language models (LLMs). We present a theoretical analysis comparing autoregressive and masked diffusion LLMs (dLLMs). We propose Self-adaptive Schema Scaffolding, a novel framework that enables dLLMs to generate structured outputs while maintaining semantic fidelity and accelerating inference.
arXiv Detail & Related papers (2025-07-06T18:41:34Z)
- Constrained Language Generation with Discrete Diffusion Models [61.81569616239755]
We present Constrained Discrete Diffusion (CDD), a novel method for enforcing constraints on natural language by integrating discrete diffusion models with differentiable optimization. We show how this technique can be applied to satisfy a variety of natural language constraints, including (i) toxicity mitigation by preventing harmful content from emerging, (ii) character- and sequence-level lexical constraints, and (iii) novel molecule sequence generation with specific property adherence.
arXiv Detail & Related papers (2025-03-12T19:48:12Z)
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models [15.853201399662344]
Diffusion language models offer unique benefits over autoregressive models, but they lag in likelihood modeling and are limited to fixed-length generation. We introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models.
arXiv Detail & Related papers (2025-03-12T17:43:40Z)
- Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness. We derive the theoretical backbone of a family of generalized interpolating discrete diffusion (GIDD) processes. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise.
arXiv Detail & Related papers (2025-03-06T14:30:55Z)
- Diffusion Predictive Control with Constraints [51.91057765703533]
Diffusion predictive control with constraints (DPCC) is an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
arXiv Detail & Related papers (2024-12-12T15:10:22Z)
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
ACDiT is a blockwise Conditional Diffusion Transformer. It offers a flexible interpolation between token-wise autoregression and full-sequence diffusion. We show that ACDiT performs best among all autoregressive baselines on image and video generation tasks.
arXiv Detail & Related papers (2024-12-10T18:13:20Z)
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
The Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step. Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance [34.893261410589396]
Block-wise generation can be a promising alternative for designing compact-sized deep generative models.
We propose a retrieval-augmented generation (RAG) approach to condition the training and generation stages of a block-wise denoising diffusion model.
Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation.
arXiv Detail & Related papers (2024-08-30T08:26:55Z)
- PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model [37.2192243883707]
We propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to generate fluent text.
Results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text.
arXiv Detail & Related papers (2023-06-05T01:36:39Z)
- TESS: Text-to-Text Self-Conditioned Simplex Diffusion [56.881170312435444]
Text-to-text Self-conditioned Simplex Diffusion employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space.
We demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models.
arXiv Detail & Related papers (2023-05-15T06:33:45Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)