Related papers: BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair

URL: http://arxiv.org/abs/2508.09129v1
Date: Tue, 12 Aug 2025 17:56:25 GMT
Title: BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair
Authors: Xianghe Pang, Shuo Tang, Rui Ye, Yuwen Du, Yaxin Du, Siheng Chen
Abstract summary: Current large language model (LLM)-based agents struggle to balance expansive search with strategic reasoning due to limitations in search breadth and reasoning depth. We propose BrowseMaster, a framework built around a programmatically augmented planner-executor agent pair. Experiments on English and Chinese benchmarks show that BrowseMaster consistently outperforms open-source and proprietary baselines, achieving scores of 30.0 on BrowseComp-en and 46.5 on BrowseComp-zh, which demonstrates its strong capability in complex, reasoning-heavy information-seeking tasks at scale.
Score: 28.052062258597225
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effective information seeking in the vast and ever-growing digital landscape requires balancing expansive search with strategic reasoning. Current large language model (LLM)-based agents struggle to achieve this balance due to limitations in search breadth and reasoning depth, where slow, serial querying restricts coverage of relevant sources and noisy raw inputs disrupt the continuity of multi-step reasoning. To address these challenges, we propose BrowseMaster, a scalable framework built around a programmatically augmented planner-executor agent pair. The planner formulates and adapts search strategies based on task constraints, while the executor conducts efficient, targeted retrieval to supply the planner with concise, relevant evidence. This division of labor preserves coherent, long-horizon reasoning while sustaining broad and systematic exploration, overcoming the trade-off that limits existing agents. Extensive experiments on challenging English and Chinese benchmarks show that BrowseMaster consistently outperforms open-source and proprietary baselines, achieving scores of 30.0 on BrowseComp-en and 46.5 on BrowseComp-zh, which demonstrates its strong capability in complex, reasoning-heavy information-seeking tasks at scale.

Related papers

Training Multi-Turn Search Agent via Contrastive Dynamic Branch Sampling [29.182538022605627]
Branching Relative Policy Optimization (BranPO) is a value-free method that provides step-level contrastive supervision without dense rewards. BranPO truncates trajectories near the tail and resamples alternative continuations to construct contrastive suffixes over shared prefixes. To further boost efficiency and stabilize training, we introduce difficulty-aware branch sampling to adapt branching frequency across tasks, and redundant step masking to suppress uninformative actions.
arXiv Detail & Related papers (2026-02-03T16:43:09Z)
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents [10.197402632091551]
DeepSearchQA is a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks. This dataset is designed to evaluate an agent's ability to execute complex search plans to generate exhaustive answer lists.
arXiv Detail & Related papers (2026-01-28T19:20:47Z)
Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. We propose M-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z)
PRInTS: Reward Modeling for Long-Horizon Information Seeking [74.14496236655911]
We introduce PRInTS, a generative PRM trained with dual capabilities. We show that PRInTS enhances information-seeking abilities of open-source models as well as specialized agents.
arXiv Detail & Related papers (2025-11-24T17:09:43Z)
ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking [59.65564262588308]
Parallel thinking expands exploration breadth, complementing the deep exploration of information-seeking (IS) agents. We propose ParallelMuse, a two-stage paradigm designed for deep IS agents. Experiments across multiple open-source agents and benchmarks demonstrate up to 62% performance improvement.
arXiv Detail & Related papers (2025-10-28T17:51:50Z)
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios [63.67884284105684]
We introduce UltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges. Agents are designed in long-horizon discovery tasks where they must iteratively uncover hidden rules. Our experiments reveal that LLM-agents consistently underperform in these settings, whereas human participants achieve higher scores.
arXiv Detail & Related papers (2025-09-26T02:04:00Z)
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL [60.47878242100153]
We present DeepDive to advance deep search agents. We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. We apply end-to-end multi-turn reinforcement learning to enhance LLMs' long-horizon reasoning with deep search.
arXiv Detail & Related papers (2025-09-12T17:52:35Z)
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent [67.35045977420089]
Web agents such as Deep Research have demonstrated cognitive abilities, capable of solving highly challenging information-seeking problems. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, and knowledge. We introduce WebWatcher, a multimodal agent for Deep Research equipped with enhanced visual-language reasoning capabilities.
arXiv Detail & Related papers (2025-08-07T18:03:50Z)
WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation. WebSailor significantly outperforms all open-source agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z)
TaskCraft: Automated Generation of Agentic Tasks [39.33785092294476]
Agentic tasks require multi-step problem solving with autonomy, tool use, and adaptive reasoning. We introduce TaskCraft, an automated workflow for generating difficulty-scalable, multi-tool, and verifiable agentic tasks. We present a large-scale synthetic dataset of approximately 36,000 tasks with varying difficulty to support future research on agent tuning and evaluation.
arXiv Detail & Related papers (2025-06-11T17:58:14Z)
Knowledge-Aware Iterative Retrieval for Multi-Agent Systems [0.0]
We introduce a novel large language model (LLM)-driven agent framework. It iteratively refines queries and filters contextual evidence by leveraging dynamically evolving knowledge. The proposed system supports both competitive and collaborative sharing of updated context.
arXiv Detail & Related papers (2025-03-17T15:27:02Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization [60.00631098364391]
PromptAgent is an optimization method that crafts expert-level prompts equivalent in quality to those handcrafted by experts. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions. We apply PromptAgent to 12 tasks spanning three practical domains.
arXiv Detail & Related papers (2023-10-25T07:47:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.