VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
- URL: http://arxiv.org/abs/2509.01055v1
- Date: Mon, 01 Sep 2025 01:45:18 GMT
- Title: VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
- Authors: Dongfu Jiang, Yi Lu, Zhuofeng Li, Zhiheng Lyu, Ping Nie, Haozhe Wang, Alex Su, Hui Chen, Kai Zou, Chao Du, Tianyu Pang, Wenhu Chen,
- Abstract summary: We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles.<n>Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms.<n>The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
- Score: 78.29315418819074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL ensuring compatibility and simplified maintenance, (2) unified tool management via standardized APIs supporting diverse modalities including code execution, search, SQL databases, and vision processing, (3) asynchronous rollout execution achieving near 2$\times$ speedup by eliminating synchronization bottlenecks, and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
Related papers
- SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL [33.692408134748696]
Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with metrically precise spatial reasoning.<n>We introduce Double Interactive Reinforcement Learning (DIRL), a two-phase training framework where VLMs learn to coordinate multiple tools.<n>Our model, SpaceTools, with tool-augmented spatial reasoning ability, achieves state-of-the-art performance on spatial understanding benchmarks.
arXiv Detail & Related papers (2025-12-03T18:50:04Z) - Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use [50.02614257515131]
Large language models (LLMs) have demonstrated strong capabilities in language understanding and reasoning.<n>We propose Tool-R1, a reinforcement learning framework that enables LLMs to perform general, compositional, and multi-step tool use.
arXiv Detail & Related papers (2025-09-16T09:22:21Z) - AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents [25.735754822676277]
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks.<n> reinforcement learning (RL) has been explored to enhance LM's capabilities, such as reasoning and factuality.<n>We built AgentFly, a scalable and Agent-RL framework designed to empower LM agents with a variety of RL algorithms.
arXiv Detail & Related papers (2025-07-20T10:22:36Z) - Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning [63.31585771716123]
Large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL)<n>We introduce Tool-Star, an RL-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasoning.<n>Tool-Star integrates six types of tools and incorporates systematic designs in both data synthesis and training.
arXiv Detail & Related papers (2025-05-22T09:00:19Z) - OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning [57.89304342666846]
We introduce OpenThinkIMG, the first open-source, comprehensive end-to-end framework for tool-augmented LVLMs.<n>We propose a novel reinforcement learning framework V-ToolRL to train LVLMs to learn adaptive policies for invoking external vision tools.<n>V-ToolRL enables LVLMs to autonomously discover optimal tool-usage strategies.
arXiv Detail & Related papers (2025-05-13T14:35:51Z) - Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel modelagnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z) - CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.