Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
- URL: http://arxiv.org/abs/2510.03206v1
- Date: Fri, 03 Oct 2025 17:44:41 GMT
- Title: Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
- Authors: Cai Zhou, Chenxiao Yang, Yi Hu, Chenyu Wang, Chubin Zhang, Muhan Zhang, Lester Mackey, Tommi Jaakkola, Stephen Bates, Dinghuai Zhang
- Abstract summary: We argue that diffusion language models do not necessarily need to be in the discrete space. In particular, we prove that continuous diffusion models have stronger expressivity than discrete diffusions and looped transformers. We propose Coevolutionary Continuous Discrete Diffusion (CCDD), which defines a joint multimodal diffusion process on the union of a continuous representation space and a discrete token space.
- Score: 66.86440230599656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and preliminary empirical results showing the advantages of latent reasoning with looped transformers or continuous chain-of-thoughts, continuous diffusion models typically underperform their discrete counterparts. In this paper, we argue that diffusion language models do not necessarily need to be in the discrete space. In particular, we prove that continuous diffusion models have stronger expressivity than discrete diffusions and looped transformers. We attribute the contradiction between the theoretical expressiveness and the empirical performance to practical trainability: while continuous diffusion provides intermediate supervision that looped transformers lack, it introduces additional difficulty in decoding tokens from the continuous representation space back into the discrete token space. We therefore propose Coevolutionary Continuous Discrete Diffusion (CCDD), which defines a joint multimodal diffusion process on the union of a continuous representation space and a discrete token space, leveraging a single model to simultaneously denoise in the joint space. By combining the two modalities, CCDD is expressive, with rich semantics in the latent space, and retains good trainability and sample quality with the help of explicit discrete tokens. We also propose effective architectures and advanced training/sampling techniques for CCDD, which yield strong empirical performance in extensive language-modeling experiments on real-world tasks.
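The abstract describes a joint forward process that corrupts a discrete token and its paired continuous representation at the same diffusion time. A minimal toy sketch of such a coupled corruption step (this is an illustrative assumption, not the paper's implementation; the `MASK_ID` constant and the square-root noise schedule are chosen here for concreteness):

```python
# Toy sketch of CCDD-style joint forward noising: discrete tokens are
# masked with probability t while their paired continuous representations
# receive Gaussian noise at the same time t. A single denoiser would then
# be trained to reverse both branches jointly.
import numpy as np

MASK_ID = -1  # hypothetical absorbing-state token id

def joint_forward_noise(tokens, reps, t, rng):
    """Corrupt (tokens, reps) at diffusion time t in [0, 1].

    tokens: (L,) int array of token ids.
    reps:   (L, d) float array of continuous representations.
    """
    # Discrete branch: absorbing (masking) corruption with probability t.
    mask = rng.random(tokens.shape) < t
    noisy_tokens = np.where(mask, MASK_ID, tokens)
    # Continuous branch: variance-preserving Gaussian interpolation.
    alpha = np.sqrt(1.0 - t)  # signal coefficient
    sigma = np.sqrt(t)        # noise coefficient
    noisy_reps = alpha * reps + sigma * rng.standard_normal(reps.shape)
    return noisy_tokens, noisy_reps

rng = np.random.default_rng(0)
toks = np.arange(8)
reps = rng.standard_normal((8, 4))
noisy_toks, noisy_reps = joint_forward_noise(toks, reps, t=0.5, rng=rng)
```

At t = 0 both branches return the data unchanged; at t = 1 every token is absorbed into `MASK_ID` and the representation is pure noise, matching the usual boundary conditions of masked and Gaussian diffusion respectively.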
Related papers
- Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
The bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders the development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z)
- On the Role of Discreteness in Diffusion LLMs [69.64854287505999]
We revisit the view of the diffusion process and language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
arXiv Detail & Related papers (2025-12-27T16:03:08Z)
- Guided Transfer Learning for Discrete Diffusion Models [21.909689920217982]
We present Guided Transfer Learning for discrete diffusion models (GTL). GTL enables sampling from a target distribution without modifying the pretrained denoiser. We also present an efficient guided sampler that concentrates evaluations on planner-selected positions and top candidate tokens.
arXiv Detail & Related papers (2025-12-11T18:05:55Z)
- Foundations of Diffusion Models in General State Spaces: A Self-Contained Introduction [54.95522167029998]
This article is a self-contained primer on diffusion over general state spaces. We develop the discrete-time view (forward noising via Markov kernels and learned reverse dynamics) alongside its continuous-time limits. A common variational treatment yields the ELBO that underpins standard training losses.
arXiv Detail & Related papers (2025-12-04T18:55:36Z)
- Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling [87.34677262370924]
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion, a framework that augments the discrete state space with a paired diffusion in a continuous latent space.
arXiv Detail & Related papers (2025-10-01T18:00:56Z)
- Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness. We introduce generalized interpolating discrete diffusion (GIDD), a new family of processes that offers greater flexibility in the design of noising processes. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality.
arXiv Detail & Related papers (2025-03-06T14:30:55Z)
- Continuous Diffusion Model for Language Modeling [57.396578974401734]
Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches. We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
arXiv Detail & Related papers (2025-02-17T08:54:29Z)
- G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving [83.56510119503267]
This paper presents a novel method for addressing linear inverse problems by leveraging generative models based on discrete diffusion as priors. We employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states.
arXiv Detail & Related papers (2024-10-09T06:18:25Z)
- DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space [7.131920232495329]
In real-life conversations, the content is diverse, and the one-to-many problem arises, requiring diverse generation. Previous studies attempted to introduce discrete or Gaussian-based continuous latent variables to address the one-to-many problem. We propose DiffusionDialog, a novel approach that enhances the diversity of dialogue generation with the help of a diffusion model.
arXiv Detail & Related papers (2024-04-10T05:56:46Z)
- Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization [17.535229185525353]
We introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points.
Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.
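The uniformization idea mentioned above can be sketched in a few lines: for a continuous-time Markov chain with generator $Q$, pick a rate $\lambda \ge \max_i(-Q_{ii})$, draw the number of candidate jumps over $[0, T]$ from a Poisson$(\lambda T)$ distribution, and apply the discrete kernel $P = I + Q/\lambda$ (which allows self-loops) at each jump. This is a generic textbook construction under illustrative assumptions, not the paper's exact algorithm:

```python
# Minimal uniformization sampler for a continuous-time Markov chain.
import numpy as np

def uniformize_sample(Q, x0, T, rng):
    """Sample the chain's state at time T, starting from state x0."""
    lam = max(-np.diag(Q).min(), 1e-12)  # uniformization rate
    P = np.eye(Q.shape[0]) + Q / lam     # one-jump kernel (rows sum to 1)
    n_jumps = rng.poisson(lam * T)       # candidate jumps on [0, T]
    x = x0
    for _ in range(n_jumps):
        x = rng.choice(Q.shape[0], p=P[x])
    return x

# Two-state chain that flips at rate 1; its stationary law is uniform.
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
rng = np.random.default_rng(0)
samples = [uniformize_sample(Q, 0, 5.0, rng) for _ in range(2000)]
```

Because the jump times come from a homogeneous Poisson process, the simulation is exact in distribution, with no time-discretization error.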
arXiv Detail & Related papers (2024-02-12T22:26:52Z)
- Continuous diffusion for categorical data [42.60475010640669]
We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space.
We demonstrate its efficacy on several language modelling tasks.
arXiv Detail & Related papers (2022-11-28T06:08:54Z)
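Several entries above (notably GIDD) contrast absorbing-mask corruption with uniform noise and explore hybrids of the two. A minimal sketch of such a hybrid discrete corruption, under an assumed parameterization (the `MASK_ID` constant, `p_mask` split, and vocabulary layout are illustrative, not GIDD's actual schedule):

```python
# Hybrid discrete corruption: each corrupted position becomes the mask
# token with probability p_mask, otherwise a uniformly random vocab token.
import numpy as np

MASK_ID = 0  # hypothetical mask token id; real vocab ids start at 1 here

def hybrid_corrupt(tokens, t, p_mask, vocab_size, rng):
    corrupt = rng.random(tokens.shape) < t        # which positions to hit
    use_mask = rng.random(tokens.shape) < p_mask  # mask vs uniform noise
    uniform = rng.integers(1, vocab_size, tokens.shape)
    noised = np.where(use_mask, MASK_ID, uniform)
    return np.where(corrupt, noised, tokens)

rng = np.random.default_rng(0)
toks = rng.integers(1, 100, size=32)
out = hybrid_corrupt(toks, t=0.5, p_mask=0.8, vocab_size=100, rng=rng)
```

Setting `p_mask=1` recovers pure absorbing (masked) diffusion, while `p_mask=0` recovers pure uniform-noise corruption; intermediate values interpolate between the two regimes.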
This list is automatically generated from the titles and abstracts of the papers in this site.