Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey
- URL: http://arxiv.org/abs/2510.09988v1
- Date: Sat, 11 Oct 2025 03:29:18 GMT
- Title: Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey
- Authors: Jiaqi Wei, Xiang Zhang, Yuejin Yang, Wenxuan Huang, Juntai Cao, Sheng Xu, Xiang Zhuang, Zhangyang Gao, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Chenyu You, Wanli Ouyang, Siqi Sun
- Abstract summary: Deliberative tree search is a cornerstone of Large Language Model (LLM) research. This paper introduces a unified framework that deconstructs search algorithms into three core components.
- Score: 92.71325249013535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deliberative tree search is a cornerstone of modern Large Language Model (LLM) research, driving the pivot from brute-force scaling toward algorithmic efficiency. This single paradigm unifies two critical frontiers: Test-Time Scaling (TTS), which deploys on-demand computation to solve hard problems, and Self-Improvement, which uses search-generated data to durably enhance model parameters. However, this burgeoning field is fragmented and lacks a common formalism, particularly concerning the ambiguous role of the reward signal: is it a transient heuristic or a durable learning target? This paper resolves this ambiguity by introducing a unified framework that deconstructs search algorithms into three core components: the Search Mechanism, the Reward Formulation, and the Transition Function. We establish a formal distinction between transient Search Guidance for TTS and durable Parametric Reward Modeling for Self-Improvement. Building on this formalism, we introduce a component-centric taxonomy, synthesize the state of the art, and chart a research roadmap toward more systematic progress in creating autonomous, self-improving agents.
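To ground the three-component decomposition, here is a minimal sketch (a hedged illustration, not the survey's code) of a best-first tree search parameterized by a Transition Function that expands states, a Reward Formulation used as transient search guidance, and a Search Mechanism realized as a greedy frontier policy. All names and the toy task are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List
import heapq
import itertools

# Hypothetical interfaces mirroring the survey's three components: a Transition
# Function expands states, a Reward Formulation scores them, and a Search
# Mechanism decides which frontier state to expand next (greedy best-first here).

@dataclass(order=True)
class _Entry:
    priority: float
    tiebreak: int
    state: str = field(compare=False)

def tree_search(
    root: str,
    transition: Callable[[str], List[str]],    # Transition Function
    reward: Callable[[str], float],            # Reward Formulation (transient search guidance)
    is_terminal: Callable[[str], bool],
    budget: int = 32,                          # test-time compute budget (node expansions)
) -> str:
    """Best-first search over reasoning states; returns the best terminal state found."""
    counter = itertools.count()
    frontier = [_Entry(-reward(root), next(counter), root)]   # max-heap via negated reward
    best_state, best_score = root, float("-inf")
    for _ in range(budget):
        if not frontier:
            break
        state = heapq.heappop(frontier).state   # Search Mechanism: expand highest-reward node
        if is_terminal(state):
            if reward(state) > best_score:
                best_state, best_score = state, reward(state)
            continue
        for child in transition(state):
            heapq.heappush(frontier, _Entry(-reward(child), next(counter), child))
    return best_state

# Toy usage: states are digit strings, terminal at length 4; the reward prefers
# longer strings whose digit sum is even.
if __name__ == "__main__":
    expand = lambda s: [s + d for d in "0123"]
    score = lambda s: (sum(map(int, s)) % 2 == 0) + 0.1 * len(s)
    done = lambda s: len(s) == 4
    print(tree_search("", expand, score, done))
```

Under the survey's distinction, the reward callable above plays the role of transient Search Guidance; training a parametric reward model on trajectories collected by such a search would correspond to the Self-Improvement loop.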
Related papers
- Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure [58.89643769707751]
We study latent chain-of-thought as a manipulable causal process in representation space. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing. These results motivate mode-conditional and stability-aware analyses as more reliable tools for interpreting and improving latent reasoning systems.
arXiv Detail & Related papers (2026-02-09T15:25:12Z)
- OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows [9.617220633655716]
We present the Omni-Modality Generation Agent (OMG-Agent).
arXiv Detail & Related papers (2026-02-04T02:25:40Z)
- A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval [11.72564658353791]
Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. The widely adopted dual-tower encoding architecture introduces inherent challenges, primarily representational space misalignment and retrieval index inconsistency. This paper proposes a simple and effective framework named SCI comprising two synergistic modules. We provide theoretical guarantees for our approach, with its effectiveness validated by results across public datasets and real-world e-commerce datasets.
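As context for the dual-tower architecture mentioned above: the query and the document are embedded by independent encoders and scored with an inner product, which is where the representational misalignment the abstract refers to can arise. The sketch below is a generic illustration with a stand-in hashing encoder, not the SCI implementation; all names are hypothetical.

```python
import numpy as np

def encode(texts, dim=8):
    """Stand-in encoder mapping each text to a unit vector.
    A real dual-tower retriever uses two separately trained neural encoders."""
    vecs = []
    for t in texts:
        g = np.random.default_rng(abs(hash(t)) % (2**32))
        v = g.standard_normal(dim)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

def retrieve(query, docs, k=2):
    """Dual-tower scoring: embed query and documents independently, rank by inner product."""
    q = encode([query])[0]
    scores = encode(docs) @ q
    top = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in top]

if __name__ == "__main__":
    print(retrieve("tree search for llm reasoning",
                   ["reward model training", "monte carlo tree search", "image captioning"]))
```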
arXiv Detail & Related papers (2025-12-15T08:11:24Z)
- Memory-Amortized Inference: A Topological Unification of Search, Closure, and Structure [6.0044467881527614]
We propose Memory-Amortized Inference (MAI), a formal framework that unifies learning and memory as phase transitions of a single geometric substrate. We show that cognition operates by converting high-complexity search into low-complexity lookup. This framework offers a rigorous explanation for the emergence of fast-thinking (intuition) from slow-thinking (reasoning).
arXiv Detail & Related papers (2025-11-28T16:28:24Z)
- AI Founding Fathers: A Case Study of GIS Search in Multi-Agent Pipelines [0.0]
Large Language Models (LLMs) show exceptional fluency, but efforts persist to extract stronger reasoning capabilities from them. This paper advances a systematic framework for understanding LLM reasoning and optimization.
arXiv Detail & Related papers (2025-11-12T05:52:55Z)
- Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation. Current reinforcement learning approaches often rely on sparse, outcome-based rewards. We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
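To make the sparse-versus-dense contrast concrete, the snippet below sketches the two reward formulations in their generic form: an outcome-based reward that scores only the final answer, and a step-aware reward that scores each intermediate step. This is an illustrative sketch of the general idea, not the paper's algorithm; verify and score_step are hypothetical placeholders.

```python
from typing import Callable, List

def outcome_rewards(steps: List[str], verify: Callable[[str], bool]) -> List[float]:
    """Sparse, outcome-based reward: every step inherits the final answer's correctness."""
    r = 1.0 if verify(steps[-1]) else 0.0
    return [r] * len(steps)

def step_aware_rewards(steps: List[str], score_step: Callable[[str], float]) -> List[float]:
    """Dense, step-aware reward: each intermediate step is scored on its own."""
    return [score_step(s) for s in steps]

# Example: a 3-step trace where step 2 is flawed; only the dense signal exposes it.
trace = ["define variables", "drop the negative root", "answer: 7"]
print(outcome_rewards(trace, verify=lambda s: s.endswith("7")))
print(step_aware_rewards(trace, score_step=lambda s: 0.2 if "drop" in s else 0.9))
```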
arXiv Detail & Related papers (2025-10-02T00:34:15Z)
- A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning [57.727084580884075]
A2R is an asymmetric two-stage reasoning framework designed to bridge the gap between a model's potential and its actual performance. A2R-Efficient is a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer. Results show A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
arXiv Detail & Related papers (2025-09-26T08:27:03Z)
- DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling [56.45844907505722]
We propose DecoupleSearch, a framework that decouples planning and search processes using dual value models. Our approach constructs a reasoning tree, where each node represents planning and search steps. During inference, Hierarchical Beam Search iteratively refines planning and search candidates with dual value models.
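As a rough illustration of the idea summarized above, the sketch below alternates a planning level and a search level, each pruned by its own value model and a fixed beam width. It is a speculative reading under simple assumptions (scalar value functions over text traces), not the DecoupleSearch implementation; every function name here is hypothetical.

```python
from typing import Callable, List, Tuple

def hierarchical_beam_search(
    question: str,
    propose_plans: Callable[[str, str], List[str]],   # hypothetical plan generator
    execute_plan: Callable[[str, str], List[str]],    # hypothetical search-step generator
    plan_value: Callable[[str, str], float],          # value model for planning candidates
    search_value: Callable[[str, str], float],        # value model for search candidates
    depth: int = 3,
    beam_width: int = 2,
) -> str:
    """Alternate planning and search levels, pruning each with its own value model."""
    beams: List[Tuple[float, str]] = [(0.0, "")]      # (cumulative value, partial trace)
    for _ in range(depth):
        # Planning level: expand every surviving trace with plan candidates, keep the best few.
        plans = [
            (score + plan_value(question, trace + p), trace + p)
            for score, trace in beams
            for p in propose_plans(question, trace)
        ]
        beams = sorted(plans, key=lambda x: x[0], reverse=True)[:beam_width]
        # Search level: expand surviving plans with execution candidates, prune again.
        steps = [
            (score + search_value(question, trace + s), trace + s)
            for score, trace in beams
            for s in execute_plan(question, trace)
        ]
        beams = sorted(steps, key=lambda x: x[0], reverse=True)[:beam_width]
    return beams[0][1] if beams else ""
```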
arXiv Detail & Related papers (2025-09-07T13:45:09Z)
- Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS [19.394761422323853]
We introduce R2-LLMs, a novel and versatile hierarchical retrieval-augmented reasoning framework. R2-LLMs enhances inference-time generalization by integrating dual-level retrieval-based in-context learning. Empirical evaluations on the MATH500, GSM8K, and OlympiadBench-TO datasets show substantial relative improvements.
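One plausible way retrieval can augment MCTS-style test-time search is by shaping the value used during node selection with a prior derived from retrieved exemplars. The snippet below sketches a UCT-style selection rule with such a prior; it is an assumption for illustration, not the R2-LLMs method, and retrieve_prior is a hypothetical placeholder.

```python
import math
from typing import Callable, Dict, List

def uct_select(
    children: List[str],
    visits: Dict[str, int],
    values: Dict[str, float],                  # sum of simulated returns per node
    retrieve_prior: Callable[[str], float],    # hypothetical prior from retrieved exemplars
    parent_visits: int,
    c_explore: float = 1.4,
    prior_weight: float = 0.5,
) -> str:
    """Pick the child maximizing mean value + retrieval-informed prior + exploration bonus."""
    def score(node: str) -> float:
        n = visits.get(node, 0)
        q = values.get(node, 0.0) / max(n, 1)                        # mean simulated value
        u = c_explore * math.sqrt(math.log(parent_visits + 1) / (n + 1))
        return q + prior_weight * retrieve_prior(node) + u
    return max(children, key=score)

# Toy usage with a constant prior; a real prior would come from retrieved exemplars.
print(uct_select(["step A", "step B"], {"step A": 3}, {"step A": 2.0},
                 retrieve_prior=lambda s: 0.1, parent_visits=4))
```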
arXiv Detail & Related papers (2025-07-08T00:41:12Z)
- Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
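For context, layer-specific merging reduces to interpolating two checkpoints with a separate coefficient per parameter tensor. The sketch below shows only that generic operation; how ACM derives its coefficients from activation consensus is not reproduced here, so coeffs is treated as an externally supplied, hypothetical input.

```python
from typing import Dict
import torch

def merge_layerwise(
    base: Dict[str, torch.Tensor],
    donor: Dict[str, torch.Tensor],
    coeffs: Dict[str, float],          # per-tensor merging coefficients in [0, 1]
) -> Dict[str, torch.Tensor]:
    """Interpolate two state dicts, using a separate coefficient for each parameter tensor."""
    merged = {}
    for name, w_base in base.items():
        alpha = coeffs.get(name, 0.5)  # fall back to uniform averaging
        merged[name] = (1.0 - alpha) * w_base + alpha * donor[name]
    return merged
```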
arXiv Detail & Related papers (2025-05-20T07:04:01Z)
- Constrained Auto-Regressive Decoding Constrains Generative Retrieval [71.71161220261655]
Generative retrieval seeks to replace traditional search index data structures with a single large-scale neural network. In this paper, we examine the inherent limitations of constrained auto-regressive generation from two essential perspectives: constraints and beam search.
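The constraint analyzed in the entry above is commonly enforced with a prefix trie over valid identifiers, which restricts each decoding step to tokens that keep the output inside the identifier set. The sketch below shows that standard mechanism in isolation (it is not the paper's code); a decoder would intersect the model's next-token distribution with the allowed set at each step.

```python
from typing import Dict, List, Sequence

def build_trie(identifiers: Sequence[Sequence[int]]) -> Dict:
    """Build a prefix trie over valid token-id sequences (e.g., document identifiers)."""
    root: Dict = {}
    for ident in identifiers:
        node = root
        for tok in ident:
            node = node.setdefault(tok, {})
        node[-1] = {}  # sentinel marking a complete identifier
    return root

def allowed_next_tokens(trie: Dict, prefix: List[int]) -> List[int]:
    """Return the tokens that keep the decoded prefix inside the identifier set."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []
    return [t for t in node if t != -1]

# Toy usage: three identifiers over a tiny vocabulary.
if __name__ == "__main__":
    trie = build_trie([[5, 2, 7], [5, 2, 9], [3, 1]])
    print(allowed_next_tokens(trie, [5, 2]))   # -> [7, 9]
```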
arXiv Detail & Related papers (2025-04-14T06:54:49Z)
- BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving [11.596474985695679]
We release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process. We propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure.
arXiv Detail & Related papers (2024-11-26T13:05:53Z)
- Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
Implementing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z)
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric encapsulates two forms of terms: diagonal and block-diagonal.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances that are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.