Related papers: Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models

URL: http://arxiv.org/abs/2510.09544v1
Date: Fri, 10 Oct 2025 16:58:14 GMT
Title: Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models
Authors: Qiguang Chen, Hanjing Li, Libo Qin, Dengyun Peng, Jinhao Liu, Jiangyi Wang, Chengyue Wu, Xie Chen, Yantao Du, Wanxiang Che
Abstract summary: Parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. Behavioral analyses in both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. We propose several practical mitigations (parallel-oriented prompting, diffusion early stopping, and parallel scaling) to reduce the ineffectiveness and inefficiencies induced by the Parallel-Sequential Contradiction (PSC).
Score: 54.81955614221652
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recently, Diffusion Large Language Models (DLLMs) have offered high throughput and effective sequential reasoning, making them a competitive alternative to autoregressive LLMs (ALLMs). However, parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. We first identify this conflict as the core Parallel-Sequential Contradiction (PSC). Behavioral analyses in both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. As task difficulty increases, they revert to autoregressive-like behavior, a limitation exacerbated by autoregressive prompting, which nearly doubles the number of decoding steps with remasking without improving quality. Moreover, PSC restricts DLLMs' self-reflection, reasoning depth, and exploratory breadth. To further characterize PSC, we introduce three scaling dimensions for DLLMs: parallel, diffusion, and sequential. Empirically, while parallel scaling yields consistent improvements, diffusion and sequential scaling are constrained by PSC. Based on these findings, we propose several practical mitigations, parallel-oriented prompting, diffusion early stopping, and parallel scaling, to reduce PSC-induced ineffectiveness and inefficiencies.

Related papers

Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes [10.877713536966601]
The Longest Stable Prefix (LSP) scheduler is a training-free and model-agnostic inference paradigm based on monolithic prefix absorption. LSP evaluates token stability via a single forward pass and dynamically identifies a contiguous left-aligned block of stable predictions. It snaps its boundary to natural linguistic or structural acceptances before an atomic commitment.
arXiv Detail & Related papers (2026-03-05T18:25:26Z)
Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass. Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
arXiv Detail & Related papers (2026-01-29T17:38:29Z)
Parallel Latent Reasoning for Sequential Recommendation [23.624137982116867]
We propose PLR, a novel framework for exploring multiple diverse reasoning trajectories simultaneously. PLR constructs parallel reasoning streams through learnable trigger tokens in continuous latent space. Experiments on three real-world datasets demonstrate that PLR substantially outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-01-06T16:25:48Z)
WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference [44.87788417755154]
We propose WeDLM, a diffusion decoding framework built entirely on standard causal attention. We show that WeDLM preserves the quality of strong AR backbones while delivering substantial speedups.
arXiv Detail & Related papers (2025-12-28T01:25:48Z)
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs [31.387806058620683]
Diffusion LLMs have attracted growing interest for their potential to dramatically accelerate inference through parallel decoding. Existing works largely overlook these inherent challenges, and evaluations on standard benchmarks are not sufficient to capture the quality degradation caused by parallel decoding. We propose ParallelBench, the first benchmark specifically designed for dLLMs, featuring realistic tasks that are trivial for humans and autoregressive LLMs yet exceptionally challenging for dLLMs under parallel decoding. Our findings underscore the pressing need for innovative decoding methods that can overcome the current speed-quality trade-off.
arXiv Detail & Related papers (2025-10-06T12:41:31Z)
Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models [82.87985794856803]
Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks. Recently, Diffusion Language Models (DLMs) have emerged as a promising alternative architecture.
arXiv Detail & Related papers (2025-10-05T10:50:52Z)
Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models [8.407364705777587]
Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive prediction. Free Draft-and-Verification (Freedave) is a novel fast sampling algorithm tailored for DLLMs.
arXiv Detail & Related papers (2025-09-30T21:28:04Z)
ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs [34.477777651648914]
Large language models (LLMs) pose significant inference latency challenges due to their autoregressive decoding paradigm. We propose Adaptive Serial-Parallel Decoding (ASPD), which addresses two core challenges: automated construction of parallelizable data and an efficient parallel decoding mechanism. Our framework sets a groundbreaking benchmark for efficient LLM parallel inference, paving the way for its deployment in latency-sensitive applications such as AI-powered customer service bots and answer retrieval engines.
arXiv Detail & Related papers (2025-08-12T12:35:55Z)
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models [74.15250326312179]
Diffusion Large Language Models offer efficient parallel generation and capable global modeling. The dominant application of DLLMs is hindered by the need for a statically predefined generation length. We introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion.
arXiv Detail & Related papers (2025-08-01T17:56:07Z)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradation on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding [51.711605076319216]
Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation with parallel decoding capabilities. We introduce a novel block-wise approximate KV Cache mechanism tailored for bidirectional diffusion models, enabling cache reuse with negligible performance drop. We propose a confidence-aware parallel decoding strategy that selectively decodes tokens exceeding a confidence threshold, mitigating dependency violations and maintaining generation quality.
arXiv Detail & Related papers (2025-05-28T17:39:15Z)
