Related papers: Reducing Cognitive Overhead in Tool Use via Multi-Small-Agent Reinforcement Learning

Reducing Cognitive Overhead in Tool Use via Multi-Small-Agent Reinforcement Learning

URL: http://arxiv.org/abs/2508.08882v4
Date: Sat, 11 Oct 2025 08:24:16 GMT
Title: Reducing Cognitive Overhead in Tool Use via Multi-Small-Agent Reinforcement Learning
Authors: Dayu Wang, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li,
Abstract summary: We present MSARL, a framework that explicitly decouples reasoning from tool use.<n>In MSARL, a Reasoning Agent decomposes problems and plans tool invocations, while multiple Tool Agents specialize in specific external tools.<n>On mathematical problem solving with code execution, MSARL significantly improves reasoning stability and final-answer accuracy over single-agent baselines.
Score: 1.974921946982281
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in multi-agent systems highlight the potential of specialized small agents that collaborate via division of labor. Existing tool-integrated reasoning systems, however, often follow a single-agent paradigm in which one large model interleaves long-horizon reasoning with precise tool operations, leading to cognitive-load interference and unstable coordination. We present MSARL, a Multi-Small-Agent Reinforcement Learning framework that explicitly decouples reasoning from tool use. In MSARL, a Reasoning Agent decomposes problems and plans tool invocations, while multiple Tool Agents specialize in specific external tools, each trained via a combination of imitation learning and reinforcement learning with role-specific rewards. On mathematical problem solving with code execution, MSARL significantly improves reasoning stability and final-answer accuracy over single-agent baselines. Moreover, the architecture generalizes to diverse tool-use tasks, demonstrating that cognitive-role decoupling with small agents is a scalable blueprint for multi-agent AI design.

Related papers

Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning [16.12114923351562]
We propose a training-free framework that transforms agents from tool users to tool creators.<n>This approach harvests reasoning experiences and distills them into reusable assets.<n>We also introduce a memory consolidation mechanism to maintain the tool library.
arXiv Detail & Related papers (2026-02-02T11:37:45Z)
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce textbfAdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior.<n>AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z)
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning [55.221850286246]
We introduce MindWatcher, a tool-integrated reasoning agent with interleaved thinking and multimodal chain-of-thought (CoT) reasoning.<n>MindWatcher can autonomously decide whether and how to invoke diverse tools and coordinate their use.<n>A large-scale, high-quality local image retrieval database, covering eight categories including cars, animals, and plants, endows model with robust object recognition.
arXiv Detail & Related papers (2025-12-29T12:16:12Z)
A Flexible Multi-Agent LLM-Human Framework for Fast Human Validated Tool Building [0.8373057326694192]
We introduce CollabToolBuilder, a flexible multiagent LLM framework with expert-in-the-loop (HITL) guidance.<n>It iteratively learns to create tools for a target goal, aligning with human intent and process.<n>The architecture generates and validates tools via four specialized agents.
arXiv Detail & Related papers (2025-12-01T09:19:18Z)
DeepAgent: A General Reasoning Agent with Scalable Toolsets [111.6384541877723]
DeepAgent is an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution.<n>To address the challenges of long-horizon interactions, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories.<n>We develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens.
arXiv Detail & Related papers (2025-10-24T16:24:01Z)
ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning [80.10274552177096]
Large Language Models (LLMs) equipped with external tools have demonstrated enhanced performance on complex reasoning tasks.<n>The widespread adoption of this tool-augmented reasoning is hindered by the scarcity of domain-specific tools.<n>We propose a systematic approach to automatically an unstructured collection of tools into a structured tool library.
arXiv Detail & Related papers (2025-10-09T04:11:16Z)
Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks.<n>Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses.<n>No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z)
InfiAgent: Self-Evolving Pyramid Agent Framework for Infinite Scenarios [28.65914611521654]
InfiAgent is a Pyramid-like DAG-based Multi-Agent Framework that can be applied to textbfinfinite scenarios.<n>InfiAgent achieves 9.9% higher performance compared to ADAS (similar auto-generated agent framework)
arXiv Detail & Related papers (2025-09-26T15:44:09Z)
AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents [25.735754822676277]
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks.<n> reinforcement learning (RL) has been explored to enhance LM's capabilities, such as reasoning and factuality.<n>We built AgentFly, a scalable and Agent-RL framework designed to empower LM agents with a variety of RL algorithms.
arXiv Detail & Related papers (2025-07-20T10:22:36Z)
PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty.<n>It learns to compress reasoning length in accordance with scene complexity and predictive confidence.<n> Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z)
MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning [43.66966457772646]
MA-RAG orchestrates a collaborative set of specialized AI agents to tackle each stage of the RAG pipeline with task-aware reasoning.<n>Our design allows fine-grained control over information flow without any model fine-tuning.<n>This modular and reasoning-driven architecture enables MA-RAG to deliver robust, interpretable results.
arXiv Detail & Related papers (2025-05-26T15:05:18Z)
Towards Adaptive Software Agents for Debugging [0.40964539027092917]
We propose an adaptive agentic design, where the number of agents and their roles are determined dynamically.<n>Our initial evaluation shows that, with the adaptive design, the number of agents that are generated depends on the complexity of the buggy code.<n> Regarding the effectiveness of the fix, we noticed an average improvement of 11% compared to the one-shot prompting.
arXiv Detail & Related papers (2025-04-25T12:48:08Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.<n>MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space.<n>MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools [19.70178343422698]
We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents.<n>Key innovation in our framework is the Mind-Map agent, which constructs a structured knowledge graph to store reasoning context.<n>When deployed on DeepSeek-R1, our method achieves a new state-of-the-art (SOTA) among public models.
arXiv Detail & Related papers (2025-02-07T04:08:46Z)
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [112.04307762405669]
G"odel Agent is a self-evolving framework inspired by the G"odel machine.<n>G"odel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, a llm-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)
Derailer-Rerailer: Adaptive Verification for Efficient and Reliable Language Model Reasoning [11.765298236504155]
Derailer-Rerailer is a novel framework that balances reasoning accuracy and computational efficiency.<n>Our framework achieves significant accuracy improvements (8-11% across various reasoning tasks) while maintaining 2-3 times better efficiency than existing verification methods.
arXiv Detail & Related papers (2024-08-25T21:20:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.