UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching
- URL: http://arxiv.org/abs/2602.04344v1
- Date: Wed, 04 Feb 2026 09:13:08 GMT
- Title: UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching
- Authors: Kou Misaki, Takuya Akiba
- Abstract summary: UnMaskFork (UMF) is a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks.
- Score: 7.499410407885288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time scaling strategies have effectively leveraged inference-time compute to enhance the reasoning abilities of Autoregressive Large Language Models. In this work, we demonstrate that Masked Diffusion Language Models (MDLMs) are inherently amenable to advanced search strategies, owing to their iterative and non-autoregressive generation process. To leverage this, we propose UnMaskFork (UMF), a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. In contrast to standard scaling methods relying on stochastic sampling, UMF explores the search space through deterministic partial unmasking actions performed by multiple MDLMs. Our empirical evaluation demonstrates that UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks, while also exhibiting strong scalability on mathematical reasoning tasks.
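The abstract pins down the mechanics: states are partially masked sequences, actions are deterministic partial-unmasking steps proposed by different MDLMs, and MCTS decides which branch to grow. Below is a minimal sketch of that loop, assuming a toy MDLM interface (a `predict` method returning a token and confidence per masked position) and a stand-in verifier reward; these names and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import math
import random

MASK = None  # stand-in for the [MASK] token id

class ToyMDLM:
    """Stand-in for one masked diffusion LM: proposes a token and a
    confidence for every masked position (random numbers here)."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def predict(self, seq):
        return {i: (self.rng.randint(0, 99), self.rng.random())
                for i, t in enumerate(seq) if t is MASK}

def unmask_step(model, seq, k=2):
    """Deterministic action: commit the model's k most confident masks."""
    ranked = sorted(model.predict(seq).items(), key=lambda kv: -kv[1][1])[:k]
    new = list(seq)
    for i, (tok, _conf) in ranked:
        new[i] = tok
    return tuple(new)

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}  # model index -> child Node
        self.visits, self.value = 0, 0.0

def reward(seq):
    # Assumed verifier (e.g. unit tests for code); a toy score here.
    return sum(1 for t in seq if t % 7 == 0) / len(seq)

def rollout(model, seq):
    while any(t is MASK for t in seq):
        seq = unmask_step(model, seq)
    return reward(seq)

def mcts(root_state, models, iters=100, c=1.4):
    root = Node(root_state)
    for _ in range(iters):
        node, path = root, [root]
        # Selection: descend by UCT while the node is fully expanded.
        while node.children and len(node.children) == len(models):
            node = max(node.children.values(),
                       key=lambda ch: ch.value / (ch.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1)
                                       / (ch.visits + 1e-9)))
            path.append(node)
        # Expansion: try the next unused model's deterministic action.
        if any(t is MASK for t in node.state):
            a = len(node.children)
            child = Node(unmask_step(models[a], node.state))
            node.children[a] = child
            path.append(child)
            node = child
        # Simulation + backpropagation.
        r = rollout(models[0], node.state)
        for n in path:
            n.visits, n.value = n.visits + 1, n.value + r
    return max(root.children.values(), key=lambda ch: ch.visits).state

models = [ToyMDLM(seed=s) for s in range(3)]  # diverse models -> distinct branches
print(mcts(tuple([MASK] * 8), models))
```

Because each action is a deterministic function of the model and the state, re-visiting a branch reproduces the same partial sequence, which is what makes tree reuse well defined here.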
Related papers
- Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
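As a rough illustration of the weighting scheme this summary describes, the sketch below runs particles through unmasking steps and uses each particle's accumulated log-confidence over its trajectory as the importance weight for resampling. The `StubMDLM` interface, confidence model, and resampling schedule are assumptions, not the paper's algorithm.

```python
import math
import random

class StubMDLM:
    """Stand-in sampler: fills one random masked slot and reports a
    (fake) confidence for that step."""
    def sample_step(self, seq, rng):
        masked = [i for i, t in enumerate(seq) if t is None]
        seq = list(seq)
        if masked:
            seq[rng.choice(masked)] = rng.randint(0, 99)
        return seq, rng.uniform(0.5, 1.0)

def self_rewarding_smc(model, seq_len, n_particles=8, steps=6,
                       resample_every=3, seed=0):
    rng = random.Random(seed)
    particles = [[None] * seq_len for _ in range(n_particles)]
    log_w = [0.0] * n_particles  # running trajectory-level log-confidence
    for t in range(1, steps + 1):
        for p in range(n_particles):
            seq, conf = model.sample_step(particles[p], rng)
            particles[p] = seq
            log_w[p] += math.log(conf)  # self-rewarding weight update
        if t % resample_every == 0:
            # multinomial resampling proportional to the weights
            m = max(log_w)
            w = [math.exp(lw - m) for lw in log_w]
            idx = rng.choices(range(n_particles), weights=w, k=n_particles)
            particles = [list(particles[i]) for i in idx]
            log_w = [0.0] * n_particles  # weights reset after resampling
    return particles

print(self_rewarding_smc(StubMDLM(), seq_len=6)[0])
```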
arXiv Detail & Related papers (2026-02-02T09:21:45Z)
- Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z)
- Diffusion Language Model Inference with Monte Carlo Tree Search [22.7649405246503]
Diffusion language models (DLMs) have emerged as a compelling alternative to autoregressive generation. We introduce MEDAL, a principled search mechanism for DLM inference. Across multiple benchmarks, MEDAL achieves up to a 22.0% improvement over existing inference strategies.
arXiv Detail & Related papers (2025-12-13T04:30:02Z)
- Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models [13.433506313486701]
Tree search has emerged as a powerful framework for aligning generative models with task-specific rewards at test time. We propose TReASURe, a tree-search test-time alignment method that addresses these issues. TReASURe achieves state-of-the-art results on perplexity, linguistic acceptability, and control of sentiment and toxicity.
arXiv Detail & Related papers (2025-09-27T06:22:45Z)
- Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling [38.27469349005585]
Test-time scaling is a powerful paradigm for enhancing the reasoning capabilities of large language models. However, it is inherently inefficient due to the generation of redundant and repetitive reasoning traces. We introduce the first comprehensive benchmark designed to evaluate speculative decoding methods for accelerating test-time scaling.
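For context on the technique being benchmarked, here is a minimal draft-then-verify sketch of greedy speculative decoding; the toy next-token functions are placeholders, and real systems verify all draft tokens in a single batched target-model pass rather than sequentially.

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_len=24):
    """draft_next/target_next: fn(token_list) -> next token (greedy)."""
    out = list(prompt)
    while len(out) < max_len:
        # 1. The cheap draft model proposes k tokens autoregressively.
        prop = []
        for _ in range(k):
            prop.append(draft_next(out + prop))
        # 2. The target model verifies the proposals position by position.
        accepted = 0
        for i in range(k):
            if target_next(out + prop[:i]) == prop[i]:
                accepted += 1
            else:
                break
        out += prop[:accepted]
        if accepted < k:
            # 3. On the first mismatch, emit the target's own token, so
            #    the output matches plain target decoding exactly.
            out.append(target_next(out))
    return out[:max_len]

# Toy models: the draft guesses the target's pattern except after 7.
target_next = lambda seq: (seq[-1] + 1) % 10
draft_next = lambda seq: (seq[-1] + 1) % 10 if seq[-1] != 7 else 0
print(speculative_decode(draft_next, target_next, [0]))
```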
arXiv Detail & Related papers (2025-08-30T01:54:55Z)
- Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs [78.09559830840595]
We present the first systematic study on quantizing diffusion-based language models. We identify the presence of activation outliers, characterized by abnormally large activation values. We implement state-of-the-art PTQ methods and conduct a comprehensive evaluation.
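As a rough illustration of the problem the summary identifies, the sketch below flags activation channels whose dynamic range dwarfs the median, which is what breaks naive per-tensor int8 quantization; the ratio threshold and layout are assumptions for illustration, not the paper's method.

```python
def find_outlier_channels(acts, ratio=20.0):
    """acts: list of activation rows (one per token) of equal width.
    Flags channels whose peak magnitude exceeds ratio * median peak."""
    n_ch = len(acts[0])
    ch_max = [max(abs(row[c]) for row in acts) for c in range(n_ch)]
    med = sorted(ch_max)[n_ch // 2]
    return [c for c in range(n_ch) if ch_max[c] > ratio * med]

def quantize_int8(x, scale):
    """Round-to-nearest int8 with clipping; scale chosen per channel
    so outlier channels do not blow up everyone else's step size."""
    return max(-128, min(127, round(x / scale)))

acts = [[0.1, 0.2, 95.0], [0.3, 0.1, 120.0]]  # channel 2 is an outlier
print(find_outlier_channels(acts))            # -> [2]
```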
arXiv Detail & Related papers (2025-08-20T17:59:51Z)
- LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations. This structured approach allows Large Language Models (LLMs) to generate and execute SQL queries, enhancing generalization and mitigating biases.
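A minimal sketch of the generate-then-execute pattern this summary describes, with the LLM call stubbed out and a hypothetical one-table schema; only the stdlib sqlite3 module is used:

```python
import sqlite3

def fake_llm_to_sql(question):
    # Stand-in for the model call; a real system would prompt the LLM
    # with the table schema plus the question and parse its SQL output.
    return "SELECT name FROM events ORDER BY year DESC LIMIT 1"

def answer_with_sql(question, rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (name TEXT, year INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    sql = fake_llm_to_sql(question)
    return con.execute(sql).fetchall()  # the answer comes from execution

print(answer_with_sql("What happened most recently?",
                      [("launch", 2021), ("merger", 2024)]))
```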
arXiv Detail & Related papers (2025-06-06T05:14:04Z)
- Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking [17.511240770486452]
Masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. We introduce EB-Sampler, a drop-in replacement for existing samplers that uses an Entropy Bounded unmasking procedure. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance.
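The "Entropy Bounded unmasking" idea admits a compact sketch: per step, commit masked positions in order of ascending predictive entropy until a cumulative entropy budget is spent, so confident steps unmask many tokens at once. The interface and budget value below are illustrative assumptions.

```python
import math

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

def eb_unmask_step(seq, probs, budget=0.5):
    """probs: {position: distribution over vocab} for masked positions.
    Commits positions in ascending-entropy order until the cumulative
    entropy would exceed the budget (always committing at least one)."""
    order = sorted(probs, key=lambda i: entropy(probs[i]))
    spent, new = 0.0, list(seq)
    for i in order:
        h = entropy(probs[i])
        if spent > 0 and spent + h > budget:
            break  # budget spent; defer the remaining masks to later steps
        spent += h
        new[i] = max(range(len(probs[i])), key=probs[i].__getitem__)
    return new

seq = [None, None, 42]
probs = {0: [0.9, 0.05, 0.05], 1: [0.4, 0.3, 0.3]}  # position 0 is confident
print(eb_unmask_step(seq, probs))  # commits position 0, defers position 1
```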
arXiv Detail & Related papers (2025-05-30T17:52:55Z)
- Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the LLM era is generalization. We propose the Model Utilization Index (MUI), a metric enhanced with mechanism interpretability that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
- Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search [9.875355690346794]
We propose Adaptive Branching Monte Carlo Tree Search (AB-MCTS). AB-MCTS generalizes repeated sampling with principled multi-turn exploration and exploitation. We evaluate our method on complex coding and engineering tasks using frontier models.
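A minimal sketch of the adaptive-branching idea: at every node, a phantom "generate" arm competes with the existing children under a UCT-style rule, so each iteration either widens the node with a brand-new sample or deepens into a child to refine it. The GEN-arm score, generator, and verifier below are simplifying assumptions, not the paper's exact rule.

```python
import math
import random

class ABNode:
    def __init__(self, answer=None):
        self.answer, self.children = answer, []
        self.visits, self.value = 0, 0.0

    def ucb(self, parent_visits, c=1.0):
        return (self.value / self.visits
                + c * math.sqrt(math.log(parent_visits) / self.visits))

def ab_step(node, generate, score, c=1.0):
    """One iteration: widen with a fresh sample or deepen into a child."""
    node.visits += 1
    gen_ucb = c * math.sqrt(math.log(node.visits + 1))  # optimistic GEN arm
    best = max(node.children, key=lambda ch: ch.ucb(node.visits, c),
               default=None)
    if best is None or gen_ucb >= best.ucb(node.visits, c):
        child = ABNode(generate(node.answer))  # widen: brand-new answer
        child.visits, child.value = 1, score(child.answer)
        node.children.append(child)
        r = child.value
    else:
        r = ab_step(best, generate, score, c)  # deepen: refine that child
    node.value += r
    return r

rng = random.Random(0)
generate = lambda parent: rng.random()  # stand-in for an LLM (re)generation
score = lambda answer: answer           # stand-in verifier reward
root = ABNode()
for _ in range(30):
    ab_step(root, generate, score)
print(max(root.children, key=lambda ch: ch.value / ch.visits).answer)
```

The key design point is that the branching factor is unbounded: the tree widens wherever fresh samples keep paying off and deepens wherever refinement does.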
arXiv Detail & Related papers (2025-03-06T13:10:40Z)
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models. Recent studies extend SAM to few-shot semantic segmentation (FSS). We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.