Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
- URL: http://arxiv.org/abs/2508.09138v2
- Date: Mon, 22 Sep 2025 07:56:08 GMT
- Title: Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
- Authors: Wen Wang, Bozhen Fang, Chenchen Jing, Yongliang Shen, Yangyi Shen, Qiuyu Wang, Hao Ouyang, Hao Chen, Chunhua Shen
- Abstract summary: Diffusion large language models (dLLMs) generate text through iterative denoising. Current decoding strategies discard rich intermediate predictions in favor of the final output. We introduce two complementary methods that exploit temporal consistency.
- Score: 57.474294329887236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion large language models (dLLMs) generate text through iterative denoising, yet current decoding strategies discard rich intermediate predictions in favor of the final output. Our work reveals a critical phenomenon, temporal oscillation, in which correct answers often emerge during intermediate denoising steps but are overwritten later in the trajectory. To address this issue, we introduce two complementary methods that exploit temporal consistency: 1) Temporal Self-Consistency Voting, a training-free, test-time decoding strategy that aggregates predictions across denoising steps to select the most consistent output; and 2) a post-training method termed Temporal Consistency Reinforcement, which uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generations. Empirical results across multiple benchmarks demonstrate the effectiveness of our approach. Using the negative TSE reward alone, we observe a remarkable average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combined with the accuracy reward, we achieve absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown. Our findings underscore the untapped potential of temporal dynamics in dLLMs and offer two simple yet effective tools to harness them.
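To make the abstract's first method concrete, here is a minimal Python sketch of the Temporal Self-Consistency Voting idea: collect the decoded answer at each denoising step and return the one that recurs most consistently. The function name, the uniform step weighting, and exact-string matching of answers are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def temporal_self_consistency_vote(intermediate_answers, weights=None):
    """Pick the answer that recurs most consistently across denoising steps.

    intermediate_answers: decoded answers, one per denoising step (earliest first).
    weights: optional per-step weights (the paper may favour certain steps;
    uniform weighting here is an assumption).
    """
    if weights is None:
        weights = [1.0] * len(intermediate_answers)
    scores = Counter()
    for answer, w in zip(intermediate_answers, weights):
        scores[answer] += w
    # The highest-scoring answer is the most temporally consistent one.
    return max(scores, key=scores.get)

# Example of temporal oscillation: the correct answer "42" appears in
# mid-trajectory steps but is overwritten at the final step ("41");
# voting over the trajectory recovers it.
steps = ["17", "42", "42", "42", "41"]
print(temporal_self_consistency_vote(steps))  # -> "42"
```

Similarly, a minimal sketch of the Temporal Semantic Entropy (TSE) quantity used by Temporal Consistency Reinforcement, assuming a simple exact-match notion of semantic equivalence (the paper's actual semantic clustering may differ). A stable trajectory yields low entropy, and the negative of this value would serve as the stability reward.

```python
import math

def temporal_semantic_entropy(intermediate_answers, semantically_equal=None):
    """Entropy of the semantic-cluster distribution across denoising steps.

    Low values indicate semantically stable intermediate predictions.
    `semantically_equal` is a placeholder equivalence test; exact string
    match is an assumption standing in for real semantic clustering.
    """
    if semantically_equal is None:
        semantically_equal = lambda a, b: a.strip() == b.strip()

    # Greedily group predictions into clusters of equivalent answers.
    clusters = []
    for ans in intermediate_answers:
        for cluster in clusters:
            if semantically_equal(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    n = len(intermediate_answers)
    probs = [len(c) / n for c in clusters]
    return sum(-p * math.log(p) for p in probs)

# Stable trajectory -> low TSE (high reward); oscillating -> high TSE.
print(temporal_semantic_entropy(["42", "42", "42", "42"]))  # 0.0
print(temporal_semantic_entropy(["17", "42", "41", "42"]))  # > 0
```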
Related papers
- Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z) - EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients [6.736735746633275]
Diffusion-based large language models (dLLMs) refine token generations through iterative denoising, but answers often stabilize before all steps complete. We propose EDIT, an inference-time criterion that adaptively stops denoising once sufficient reasoning stability relative to training-time reasoning is detected.
arXiv Detail & Related papers (2025-11-29T23:47:47Z) - Seer Self-Consistency: Advance Budget Estimation for Adaptive Test-Time Scaling [55.026048429595384]
Test-time scaling improves the inference performance of Large Language Models (LLMs) but also incurs substantial computational costs. We propose SeerSC, a dynamic self-consistency framework that improves token efficiency while reducing latency.
arXiv Detail & Related papers (2025-11-12T13:57:43Z) - SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization [62.958457694151384]
We introduce preference optimization into precipitation nowcasting for the first time, motivated by the success of reinforcement learning from human feedback in large language models. In the first stage, the framework focuses on reducing FAR, training the model to effectively suppress false alarms.
arXiv Detail & Related papers (2025-10-22T16:11:22Z) - TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs [67.55973229034319]
This paper introduces TempSamp-R1, a new reinforcement fine-tuning framework designed to improve the effectiveness of adapting multimodal large language models (MLLMs) to video temporal grounding tasks. We show that TempSamp-R1 outperforms GRPO-based baselines, establishing new state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2025-09-22T17:30:15Z) - TLCCSP: A Scalable Framework for Enhancing Time Series Forecasting with Time-Lagged Cross-Correlations [14.152868750710203]
Time series forecasting is critical across various domains, such as weather, finance, and real estate forecasting. We propose the Time-Lagged Cross-Correlations-based Sequence Prediction framework (TLCCSP), which integrates time-lagged cross-correlated sequences. Experimental results on weather, finance, and real estate time series datasets demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2025-08-09T15:29:14Z) - Efficient Temporal Tokenization for Mobility Prediction with Large Language Models [7.704947355789259]
RHYTHM is a framework that leverages large language models (LLMs) as trajectory predictors and reasoners. Token representations are enriched with prompt embeddings via a frozen LLM, enhancing the model's ability to capture interdependencies. Evaluation on three real-world datasets demonstrates a 2.4% improvement in accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-07-18T15:31:16Z) - Test-Time Scaling of Diffusion Models via Noise Trajectory Search [7.243632426715941]
We introduce an $\epsilon$-greedy search algorithm that globally explores at extreme timesteps and locally exploits during the intermediate steps where de-mixing occurs. Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned/text-to-image generation.
arXiv Detail & Related papers (2025-05-24T19:13:29Z) - TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation [6.047856576139978]
We propose TimeDART, a novel self-supervised time series pre-training framework. TimeDART unifies two powerful generative paradigms to learn more transferable representations. We conduct extensive experiments on public datasets for time series forecasting and classification.
arXiv Detail & Related papers (2024-10-08T06:08:33Z) - Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
Real-world time series are often recorded over short periods, leaving a large gap between deep models and the limited, noisy data available.
We address the time series forecasting problem with generative modeling, proposing a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z) - Voice2Series: Reprogramming Acoustic Models for Time Series Classification [65.94154001167608]
Voice2Series is a novel end-to-end approach that reprograms acoustic models for time series classification.
We show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%.
arXiv Detail & Related papers (2021-06-17T07:59:15Z) - Conditioned Time-Dilated Convolutions for Sound Event Detection [20.883760606514937]
We present a novel algorithm for conditioning time-dilated convolutions that functions similarly to language modelling.
We employ the freely available TUT-SED Synthetic dataset, and we assess the performance of our method using the average per-frame $F_1$ score and average per-frame error rate.
arXiv Detail & Related papers (2020-07-10T06:05:23Z) - Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition [66.47000813920619]
We propose a non-autoregressive end-to-end speech recognition system called LASO.
Because of its non-autoregressive property, LASO predicts each textual token in the sequence without depending on the other tokens.
We conduct experiments on the publicly available Chinese dataset AISHELL-1.
arXiv Detail & Related papers (2020-05-11T04:45:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.