Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
- URL: http://arxiv.org/abs/2602.22675v2
- Date: Fri, 27 Feb 2026 07:08:13 GMT
- Title: Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
- Authors: Qianben Chen, Tianrui Qin, King Zhu, Qiexiang Wang, Chengjun Yu, Shu Xu, Jiaqi Wu, Jiayu Zhang, Xinpeng Liu, Xin Gui, Jingyi Cao, Piaohong Wang, Dingfeng Shi, He Zhu, Tiannan Wang, Yuqing Wang, Maojia Song, Tianyu Zheng, Ge Zhang, Jian Yang, Jiaheng Liu, Minghao Liu, Yuchen Eleanor Jiang, Wangchunshu Zhou,
- Abstract summary: \emph{Search More, Think Less} (SMTL) is a framework for long-horizon agentic search that targets both efficiency and generalization. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks.
- Score: 64.61432234404276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose \emph{Search More, Think Less} (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6\%), GAIA (75.7\%), Xbench (82.0\%), and DeepResearch Bench (45.9\%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7\%, while improving accuracy.
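The core efficiency idea of the abstract, replacing sequential reasoning with parallel evidence acquisition under a fixed context budget, can be sketched as below. This is a minimal illustration of the general pattern, not the paper's actual implementation: `run_search` is a hypothetical stand-in for a real search tool, and the character-based budget is an assumed simplification of context management.

```python
"""Sketch: fan out several search queries at once instead of reasoning
through them one by one, then truncate each result so the combined
evidence fits a fixed context budget."""
from concurrent.futures import ThreadPoolExecutor

CONTEXT_BUDGET_CHARS = 2000  # assumed total evidence budget


def run_search(query: str) -> str:
    # Hypothetical search backend; a real agent would call a web
    # search or retrieval tool here and return raw evidence text.
    return f"evidence for: {query}"


def acquire_evidence(queries: list[str], max_workers: int = 8) -> dict[str, str]:
    """Issue all queries in parallel and cap each result so the total
    stays within CONTEXT_BUDGET_CHARS."""
    per_query = CONTEXT_BUDGET_CHARS // max(len(queries), 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_search, queries))
    return {q: r[:per_query] for q, r in zip(queries, results)}
```

The point of the pattern is that wall-clock latency scales with the slowest single query rather than with the number of queries, which is what allows trading reasoning depth for search width.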
Related papers
- HotelQuEST: Balancing Quality and Efficiency in Agentic Search [6.1626572270420334]
Agentic search has emerged as a promising paradigm for adaptive retrieval systems powered by large language models (LLMs). We introduce HotelQuEST, a benchmark comprising 214 hotel search queries, ranging from simple factual requests to complex queries. We find that LLM-based agents achieve higher accuracy than traditional retrievers, but at substantially higher cost due to redundant tool calls and suboptimal routing.
arXiv Detail & Related papers (2026-02-27T11:50:57Z) - W&D: Scaling Parallel Tool Calling for Efficient Deep Research Agents [48.22725588392165]
We propose a framework designed to investigate the behavior and performance of agents when scaling not only depth but also width via parallel tool calling. We demonstrate that scaling width significantly improves performance on deep research benchmarks while reducing the number of turns required to obtain correct answers. Our findings suggest that optimizing the trade-off between width and depth is a critical pathway toward high-efficiency deep research agents.
arXiv Detail & Related papers (2026-02-07T04:49:53Z) - Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve) [0.0]
This paper introduces a novel Deep Researcher architecture designed to generate detailed research reports on complex PhD-level topics. Our system utilizes two key innovations: Sequential Research Plan Refinement via Reflection and a Candidates Crossover algorithm. Our architecture achieved an overall score of 46.21, surpassing leading deep research agents.
arXiv Detail & Related papers (2026-01-28T18:45:39Z) - IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction [107.49922328855025]
IterResearch is a novel iterative deep-research paradigm that reformulates long-horizon research as a Markov Decision Process. It achieves substantial improvements over existing open-source agents, averaging +14.5pp across six benchmarks. It also serves as an effective prompting strategy, improving frontier models by up to 19.2pp over ReAct on long-horizon tasks.
arXiv Detail & Related papers (2025-11-10T17:30:08Z) - MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning [73.27233666920618]
We propose MemSearcher, an agent workflow that iteratively maintains a compact memory and combines it with the current turn. At each turn, MemSearcher fuses the user's question with the memory to generate reasoning traces, perform search actions, and update the memory to retain only information essential for solving the task. We introduce multi-context GRPO, an end-to-end RL framework that jointly optimizes the reasoning, search strategies, and memory management of MemSearcher agents.
arXiv Detail & Related papers (2025-11-04T18:27:39Z) - AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play [45.02121903138421]
AceSearcher trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. Experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact-match improvement of 7.6%.
arXiv Detail & Related papers (2025-09-29T02:14:30Z) - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [74.10138164281618]
BrowseComp-Plus is a benchmark derived from BrowseComp, employing a fixed, carefully curated corpus. This benchmark allows comprehensive evaluation and disentangled analysis of deep research agents and retrieval methods.
arXiv Detail & Related papers (2025-08-08T17:55:11Z) - Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. We build it using a synthetic data pipeline that simulates business processes across product planning, development, and support stages.
arXiv Detail & Related papers (2025-06-29T08:34:59Z) - LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers [24.01783076521377]
Retrieval-Augmented Generation (RAG) is a crucial method for mitigating hallucinations in Large Language Models (LLMs). Existing RAG methods typically employ query rewriting to clarify user intent and manage multi-hop logic, while using hybrid retrieval to expand the search scope. We introduce a high-level searcher that decomposes complex queries into atomic queries, independent of any retriever-specific optimizations. To harness the strengths of sparse retrievers for precise keyword retrieval, we have developed a new sparse searcher that employs Lucene syntax to enhance retrieval accuracy.
arXiv Detail & Related papers (2025-02-25T12:09:16Z)
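The LevelRAG entry above describes two pieces: a high-level searcher that splits a multi-hop question into atomic sub-queries, and a sparse searcher that issues Lucene-syntax keyword queries. A minimal sketch of that two-level pattern follows; the connective-based decomposition heuristic and the stop-word list are simplified assumptions for illustration, not the paper's actual components.

```python
"""Sketch of a two-level searcher: decompose a multi-hop question into
atomic sub-queries, then render each as a Lucene-style required-term
query for a sparse (keyword) retriever."""
import re


def decompose(query: str) -> list[str]:
    # Naive high-level decomposition: split on coordinating
    # connectives. A real system would use an LLM for this step.
    parts = re.split(r"\band\b|\bthen\b|;", query)
    return [p.strip() for p in parts if p.strip()]


def to_lucene(sub_query: str) -> str:
    # Require every non-stopword term via Lucene's +term syntax,
    # so the sparse retriever matches all keywords.
    stop = {"the", "a", "an", "of", "who", "what", "when", "is", "was", "it"}
    terms = [t for t in re.findall(r"\w+", sub_query.lower()) if t not in stop]
    return " ".join(f"+{t}" for t in terms)
```

Each rendered sub-query can then be sent to a Lucene-backed index independently, and the per-hop results merged before answer generation.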
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.