Related papers: WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

URL: http://arxiv.org/abs/2602.04634v1
Date: Wed, 04 Feb 2026 15:05:12 GMT
Title: WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
Authors: Zelai Xu, Zhexuan Xu, Ruize Zhang, Chunyang Zhu, Shi Yu, Weilin Liu, Quanlu Zhang, Wenbo Ding, Chao Yu, Yu Wang
Abstract summary: Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. We propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B.
Score: 15.087327596252932
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
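
The lead-agent-subagent pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Subagent` class, the `lead_agent` function, and the placeholder result strings are hypothetical, and the LLM and tool calls are replaced by a stub. It only shows the general idea of fanning subtasks out to parallel workers that each keep an isolated context.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Subagent:
    """Hypothetical subagent that keeps its own isolated context."""
    name: str
    context: list[str] = field(default_factory=list)

    async def run(self, subtask: str) -> str:
        # Stand-in for LLM reasoning plus tool calls (assumption: stubbed out).
        self.context.append(subtask)
        await asyncio.sleep(0)  # yield control, as real I/O would
        return f"{self.name}: findings for '{subtask}'"


async def lead_agent(subtasks: list[str]) -> list[str]:
    """Fan subtasks out to one subagent each and collect the results."""
    agents = [Subagent(name=f"sub{i}") for i in range(len(subtasks))]
    # asyncio.gather runs the coroutines concurrently and returns
    # results in the same order as its arguments.
    return await asyncio.gather(
        *(agent.run(subtask) for agent, subtask in zip(agents, subtasks))
    )


def demo() -> list[str]:
    return asyncio.run(lead_agent(["A", "B", "C"]))
```

In this sketch, width corresponds to the number of subtasks handed to `lead_agent`; each subagent sees only its own subtask, which loosely mirrors the "isolated contexts" mentioned in the abstract.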

Related papers

MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search. We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z)
W&D: Scaling Parallel Tool Calling for Efficient Deep Research Agents [48.22725588392165]
We propose a framework designed to investigate the behavior and performance of agents when scaling not only depth but also width via parallel tool calling. We demonstrate that scaling width significantly improves performance on deep research benchmarks while reducing the number of turns required to obtain correct answers. Our findings suggest that optimizing the trade-off between width and depth is a critical pathway toward high-efficiency deep research agents.
arXiv Detail & Related papers (2026-02-07T04:49:53Z)
WideSeek: Advancing Wide Research via Multi-Agent Scaling [29.02742625120584]
Wide Research is a paradigm essential for synthesizing comprehensive information under complex constraints in parallel. We take a deep dive into Wide Research from two perspectives: Data Pipeline and Agent Optimization. First, we produce WideSeekBench, a benchmark constructed via a rigorous multi-phase data pipeline to ensure diversity across the target information volume. Second, we introduce WideSeek, a dynamic hierarchical multi-agent architecture that can autonomously fork parallel sub-agents based on task requirements.
arXiv Detail & Related papers (2026-02-02T18:32:48Z)
Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection [59.04089915447622]
ForenAgent is an interactive image forgery detection (IFD) framework that enables MLLMs to autonomously generate, execute, and refine Python-based low-level tools around the detection objective. Inspired by human reasoning, we design a dynamic reasoning loop comprising global perception, local focusing, iterative probing, and holistic adjudication. Experiments show that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging IFD tasks.
arXiv Detail & Related papers (2025-12-18T08:38:44Z)
MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling [48.18456242206804]
We present a systematic analysis of how to leverage multimodal signals across both stages of the lifelong modeling framework. We propose MUSE, a simple yet effective multimodal search-based framework. MUSE has been deployed in the Taobao display advertising system, enabling 100K-length user behavior sequence modeling.
arXiv Detail & Related papers (2025-12-08T06:55:13Z)
Training Multi-Image Vision Agents via End2End Reinforcement Learning [51.81337984526068]
We propose IMAgent, an open-source vision agent trained via end-to-end reinforcement learning. By leveraging a multi-agent system, we generate challenging and visually-rich multi-image QA pairs. We develop two specialized tools for visual reflection and confirmation, allowing the model to proactively reallocate its attention to image content.
arXiv Detail & Related papers (2025-12-05T10:02:38Z)
AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
AgentRouter is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals. By leveraging soft supervision and weighted aggregation of agent outputs, AgentRouter learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z)
Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z)
Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration [63.90193684394165]
We introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step.
arXiv Detail & Related papers (2025-05-29T07:24:37Z)
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding [40.52017994491893]
MDocAgent is a novel RAG and multi-agent framework that leverages both text and image. Our system employs five specialized agents: a general agent, a critical agent, a text agent, an image agent and a summarizing agent. Preliminary experiments on five benchmarks demonstrate the effectiveness of MDocAgent, achieving an average improvement of 12.1%.
arXiv Detail & Related papers (2025-03-18T06:57:21Z)
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.