Diffusion Language Model Inference with Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2512.12168v1
- Date: Sat, 13 Dec 2025 04:30:02 GMT
- Title: Diffusion Language Model Inference with Monte Carlo Tree Search
- Authors: Zheng Huang, Kiran Ramnath, Yueyan Chen, Aosong Feng, Sangmin Woo, Balasubramaniam Srinivasan, Zhichao Xu, Kang Zhou, Shuai Wang, Haibo Ding, Lin Lee Cheong
- Abstract summary: Diffusion language models (DLMs) have emerged as a compelling alternative to autoregressive generation. We introduce MEDAL, a principled search mechanism for DLM inference. Across multiple benchmarks, MEDAL achieves up to 22.0% improvement over existing inference strategies.
- Score: 22.7649405246503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion language models (DLMs) have recently emerged as a compelling alternative to autoregressive generation, offering parallel generation and improved global coherence. During inference, DLMs generate text by iteratively denoising masked sequences in parallel; however, determining which positions to unmask and which tokens to commit forms a large combinatorial search problem. Existing inference methods approximate this search using heuristics, which often yield suboptimal decoding paths; other approaches instead rely on additional training to guide token selection. To provide a principled search mechanism for DLM inference, we introduce MEDAL, a framework that integrates Monte Carlo Tree SEarch initialization for Diffusion LAnguage Model inference. We employ Monte Carlo Tree Search at the initialization stage to explore promising unmasking trajectories, providing a robust starting point for subsequent refinement. This integration is enabled by restricting the search space to high-confidence actions and prioritizing token choices that improve model confidence over remaining masked positions. Across multiple benchmarks, MEDAL achieves up to 22.0% improvement over existing inference strategies, establishing a new paradigm for search-based inference in diffusion language models.
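The abstract describes MCTS over the space of unmasking orders, with the action space pruned to high-confidence positions and rollouts scored by model confidence. The following is a minimal toy sketch of that idea, not the authors' implementation: the diffusion model's per-position token confidences are simulated by a fixed random table (`CONF`), the sequence length and top-k pruning width are illustrative, and the reward is simply the mean confidence along a greedy completion.

```python
import math
import random

SEQ_LEN = 6  # toy sequence length (illustrative)
random.seed(0)
# Stand-in for the DLM's per-position confidences, which the real
# method would obtain from the denoising model at each step.
CONF = {i: random.random() for i in range(SEQ_LEN)}


def actions(state, top_k=3):
    """High-confidence pruning: only the top-k most confident masked positions."""
    masked = [i for i in range(SEQ_LEN) if i not in state]
    return sorted(masked, key=lambda i: CONF[i], reverse=True)[:top_k]


def rollout(state):
    """Greedy completion of the unmasking order; reward = mean confidence."""
    state = set(state)
    total = sum(CONF[i] for i in state)
    while len(state) < SEQ_LEN:
        i = max((j for j in range(SEQ_LEN) if j not in state),
                key=lambda j: CONF[j])
        state.add(i)
        total += CONF[i]
    return total / SEQ_LEN


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = frozenset(state), parent
        self.children, self.visits, self.value = {}, 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)


def mcts(iterations=50):
    root = Node(set())
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB while the node is fully expanded.
        while node.children and len(node.children) == len(actions(node.state)):
            node = max(node.children.values(), key=Node.ucb)
        # Expansion: try one untried high-confidence action.
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            a = untried[0]
            node.children[a] = Node(node.state | {a}, parent=node)
            node = node.children[a]
        # Simulation + backpropagation.
        reward = rollout(node.state)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Most-visited first action = the first position to unmask.
    return max(root.children, key=lambda a: root.children[a].visits)


print(mcts())
```

In the actual method, the returned trajectory would serve only as an initialization for subsequent iterative refinement; here the search simply picks which position to unmask first.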
Related papers
- Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models [24.78455014605002]
Diffusion Language Models generate text by iteratively denoising a masked sequence. Standard decoding follows a greedy rule: unmask the most confident positions. We present SOAR, a training-free decoding algorithm that adapts its behavior to the model's uncertainty.
arXiv Detail & Related papers (2026-02-11T15:41:09Z) - UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching [7.499410407885288]
UnMaskFork (UMF) is a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks.
arXiv Detail & Related papers (2026-02-04T09:13:08Z) - Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z) - Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z) - TSLM: Tree-Structured Language Modeling for Divergent Thinking [32.89058911018328]
We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure. TSLM learns to internalize systematic exploration without redundant recomputation of shared prefixes. Results suggest a new paradigm of inference-time scaling for robust reasoning.
arXiv Detail & Related papers (2026-01-30T08:04:59Z) - WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens [69.97021957331326]
We propose Noisy Query Tokens, which learn a distributed representation space between the VLM and Diffusion Model via end-to-end optimization. We also introduce a VAE branch with linear projection to recover fine-grained image details.
arXiv Detail & Related papers (2025-12-02T09:02:20Z) - Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models [13.433506313486701]
Tree search has emerged as a powerful framework for aligning generative models with task-specific rewards at test time. We propose TReASURe, a tree-search test-time alignment method that addresses these issues. TReASURe achieves state-of-the-art results on perplexity, linguistic acceptability, and control of sentiment and toxicity.
arXiv Detail & Related papers (2025-09-27T06:22:45Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations [22.48125906976824]
We introduce the Cascaded Organized Bi-Represented generAtive retrieval framework, which integrates sparse semantic IDs and dense vectors through a cascading process. Our method alternates between generating these representations by first generating sparse IDs, which serve as conditions to aid in the generation of dense vectors. During inference, COBRA employs a coarse-to-fine strategy, starting with sparse ID generation and refining them into dense vectors via the generative model.
arXiv Detail & Related papers (2025-03-04T10:00:05Z) - I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search [10.718560472954644]
Introspective Monte Carlo Tree Search (I-MCTS) is a novel approach that iteratively expands tree nodes through an introspective process. We integrate a Large Language Model (LLM)-based value model to facilitate direct evaluation of each node's solution. Our approach demonstrates a 6% absolute improvement in performance compared to the strong open-source AutoML agents.
arXiv Detail & Related papers (2025-02-20T16:19:09Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances, which are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z) - Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful tuples of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
arXiv Detail & Related papers (2023-05-08T21:48:17Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [80.43859162884353]
We propose a multilingual language model called masked sentence model (MSM). MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document. To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.