Diffusion Language Model Inference with Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2512.12168v1
- Date: Sat, 13 Dec 2025 04:30:02 GMT
- Title: Diffusion Language Model Inference with Monte Carlo Tree Search
- Authors: Zheng Huang, Kiran Ramnath, Yueyan Chen, Aosong Feng, Sangmin Woo, Balasubramaniam Srinivasan, Zhichao Xu, Kang Zhou, Shuai Wang, Haibo Ding, Lin Lee Cheong
- Abstract summary: Diffusion language models (DLMs) have emerged as a compelling alternative to autoregressive generation. We introduce MEDAL, a principled search mechanism for DLM inference. Across multiple benchmarks, MEDAL achieves up to 22.0% improvement over existing inference strategies.
- Score: 22.7649405246503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion language models (DLMs) have recently emerged as a compelling alternative to autoregressive generation, offering parallel generation and improved global coherence. During inference, DLMs generate text by iteratively denoising masked sequences in parallel; however, determining which positions to unmask and which tokens to commit forms a large combinatorial search problem. Existing inference methods approximate this search using heuristics, which often yield suboptimal decoding paths; other approaches instead rely on additional training to guide token selection. To provide a principled search mechanism for DLM inference, we introduce MEDAL, a framework that integrates Monte Carlo Tree SEarch initialization for Diffusion LAnguage Model inference. We employ Monte Carlo Tree Search at the initialization stage to explore promising unmasking trajectories, providing a robust starting point for subsequent refinement. This integration is enabled by restricting the search space to high-confidence actions and prioritizing token choices that improve model confidence over remaining masked positions. Across multiple benchmarks, MEDAL achieves up to 22.0% improvement over existing inference strategies, establishing a new paradigm for search-based inference in diffusion language models.
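The abstract describes MCTS over the space of unmasking orders, with the action space pruned to high-confidence positions and rollouts scored by model confidence. The following is a minimal toy sketch of that idea, not the authors' implementation: the diffusion model's per-position token confidences are simulated by a fixed random table (`CONF`), the sequence length and top-k pruning width are illustrative, and the reward is simply the mean confidence along a greedy completion.

```python
import math
import random

SEQ_LEN = 6  # toy sequence length (illustrative)
random.seed(0)
# Stand-in for the DLM's per-position confidences, which the real
# method would obtain from the denoising model at each step.
CONF = {i: random.random() for i in range(SEQ_LEN)}


def actions(state, top_k=3):
    """High-confidence pruning: only the top-k most confident masked positions."""
    masked = [i for i in range(SEQ_LEN) if i not in state]
    return sorted(masked, key=lambda i: CONF[i], reverse=True)[:top_k]


def rollout(state):
    """Greedy completion of the unmasking order; reward = mean confidence."""
    state = set(state)
    total = sum(CONF[i] for i in state)
    while len(state) < SEQ_LEN:
        i = max((j for j in range(SEQ_LEN) if j not in state),
                key=lambda j: CONF[j])
        state.add(i)
        total += CONF[i]
    return total / SEQ_LEN


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = frozenset(state), parent
        self.children, self.visits, self.value = {}, 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)


def mcts(iterations=50):
    root = Node(set())
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB while the node is fully expanded.
        while node.children and len(node.children) == len(actions(node.state)):
            node = max(node.children.values(), key=Node.ucb)
        # Expansion: try one untried high-confidence action.
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            a = untried[0]
            node.children[a] = Node(node.state | {a}, parent=node)
            node = node.children[a]
        # Simulation + backpropagation.
        reward = rollout(node.state)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Most-visited first action = the first position to unmask.
    return max(root.children, key=lambda a: root.children[a].visits)


print(mcts())
```

In the actual method, the returned trajectory would serve only as an initialization for subsequent iterative refinement; here the search simply picks which position to unmask first.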
Related papers
- Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models [24.78455014605002]
Diffusion Language Models generate text by iteratively denoising a masked sequence. Standard decoding follows a greedy rule: unmask the most confident positions. We present SOAR, a training-free decoding algorithm that adapts its behavior to the model's uncertainty.
arXiv Detail & Related papers (2026-02-11T15:41:09Z) - UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching [7.499410407885288]
UnMaskFork (UMF) is a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks.
arXiv Detail & Related papers (2026-02-04T09:13:08Z) - Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z) - Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z) - TSLM: Tree-Structured Language Modeling for Divergent Thinking [32.89058911018328]
We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure. TSLM learns to internalize systematic exploration without redundant recomputation of shared prefixes. Results suggest a new paradigm of inference-time scaling for robust reasoning.
arXiv Detail & Related papers (2026-01-30T08:04:59Z) - WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens [69.97021957331326]
We propose Noisy Query Tokens, which learn a distributed representation space between the VLM and Diffusion Model via end-to-end optimization. We also introduce a VAE branch with linear projection to recover fine-grained image details.
arXiv Detail & Related papers (2025-12-02T09:02:20Z) - Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models [13.433506313486701]
Tree search has emerged as a powerful framework for aligning generative models with task-specific rewards at test time. We propose TReASURe, a tree-search test-time alignment method that addresses these issues. TReASURe achieves state-of-the-art results on perplexity, linguistic acceptability, and control of sentiment and toxicity.
arXiv Detail & Related papers (2025-09-27T06:22:45Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations [22.48125906976824]
We introduce the Cascaded Organized Bi-Represented generAtive retrieval framework, which integrates sparse semantic IDs and dense vectors through a cascading process. Our method alternates between generating these representations by first generating sparse IDs, which serve as conditions to aid in the generation of dense vectors. During inference, COBRA employs a coarse-to-fine strategy, starting with sparse ID generation and refining them into dense vectors via the generative model.
arXiv Detail & Related papers (2025-03-04T10:00:05Z) - I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search [10.718560472954644]
Introspective Monte Carlo Tree Search (I-MCTS) is a novel approach that iteratively expands tree nodes through an introspective process. We integrate a Large Language Model (LLM)-based value model to facilitate direct evaluation of each node's solution. Our approach demonstrates a 6% absolute improvement in performance compared to the strong open-source AutoML agents.
arXiv Detail & Related papers (2025-02-20T16:19:09Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances, which are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z) - Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful tuples of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
arXiv Detail & Related papers (2023-05-08T21:48:17Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [80.43859162884353]
We propose a multilingual language model called masked sentence model (MSM). MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document. To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.