Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?
- URL: http://arxiv.org/abs/2510.11184v1
- Date: Mon, 13 Oct 2025 09:19:13 GMT
- Title: Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?
- Authors: Zhengyu Chen, Jinluan Yang, Teng Xiao, Ruochen Zhou, Luan Zhang, Xiangyu Xi, Xiaowei Shi, Wei Wang, Jinggang Wang
- Abstract summary: Generalization of tool-augmented reinforcement learning across diverse domains remains underexplored. We propose a Tool Generalization Reinforcement Learning framework designed to promote domain-agnostic learning and skill migration.
- Score: 18.11059968099671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in reasoning and tool utilization. However, the generalization of tool-augmented reinforcement learning (RL) across diverse domains remains underexplored. In this work, we investigate the cross-domain generalization of an LLM agent equipped with a code interpreter tool, which is exclusively trained on mathematical problem-solving tasks. Despite the restricted training domain, we evaluate the agent's performance across several distinct reasoning domains. The results reveal that RL-based tool usage learned from mathematical tasks can be effectively transferred to complex tasks in other domains, enabling strong task performance and high token efficiency. To facilitate this cross-domain transfer, we propose a Tool Generalization Reinforcement Learning (TGRL) framework designed to promote domain-agnostic learning and skill migration, encompassing: (i) a standardized tool interface that abstracts domain-specific nuances through consistent formatting and explicit termination, fostering transferable invocation patterns; (ii) a dual-component reward system that decomposes rewards to incentivize generalizable behaviors like tool efficiency and reasoning abstraction, ensuring alignment and robustness across domain shifts; and (iii) an XML-based prompt template that separates thinking, tool calls, and responses to encourage modular, domain-invariant planning and coherent multi-turn interactions. Extensive experiments across diverse benchmarks validate our approach, achieving state-of-the-art performance and highlighting the cross-domain potential of Tool RL for LLM reasoning.
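The abstract names three concrete components. As a rough illustration of how (ii) and (iii) might fit together, the sketch below parses an XML-style turn into thinking / tool-call / response fields and scores it with a two-part reward. The tag names, reward terms, and weights are assumptions made for illustration, not the paper's actual design.

```python
import re

# Illustrative tag set for the XML-style prompt template (tag names are
# assumptions; the paper only states that thinking, tool calls, and
# responses are kept in separate fields).
TAGS = ("think", "tool_call", "response")

def parse_turn(transcript: str) -> dict:
    """Split one model turn into its think / tool_call / response parts."""
    parts = {}
    for tag in TAGS:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", transcript, re.DOTALL)
        parts[tag] = m.group(1).strip() if m else None
    return parts

def dual_reward(correct: bool, n_tool_calls: int,
                w_outcome: float = 1.0, w_efficiency: float = 0.1) -> float:
    """Hypothetical dual-component reward: an outcome term plus a penalty
    that discourages redundant tool invocations (weights are made up)."""
    outcome = w_outcome if correct else 0.0
    efficiency = -w_efficiency * max(0, n_tool_calls - 1)
    return outcome + efficiency

turn = ("<think>Factor 91 = 7 * 13.</think>"
        "<tool_call>print(91 % 7 == 0)</tool_call>"
        "<response>91 is divisible by 7.</response>")
parsed = parse_turn(turn)
print(parsed["tool_call"])                        # -> print(91 % 7 == 0)
print(dual_reward(correct=True, n_tool_calls=1))  # -> 1.0
```

Keeping the parse step strict (a missing tag yields None) is one way such a template could enforce the "explicit termination" the abstract mentions: a malformed turn is detectable before any tool runs.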
Related papers
- Reasoning and Tool-use Compete in Agentic RL:From Quantifying Interference to Disentangled Tuning [26.401906729658688]
Agentic Reinforcement Learning (ARL) focuses on training large language models to interleave reasoning with external tool execution to solve complex tasks. Most existing ARL methods train a single shared set of model parameters to support both reasoning and tool-use behaviors, implicitly assuming that joint training leads to improved overall agent performance. We show that these two capabilities often induce misaligned gradient directions, leading to training interference that undermines the effectiveness of joint optimization. We propose Disentangled Action Reasoning Tuning (DART), a simple and efficient framework that explicitly decouples parameter updates for reasoning and tool use via separate low-rank adapters.
arXiv Detail & Related papers (2026-02-01T03:19:22Z)
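The summary above attributes interference to misaligned gradients and proposes decoupled low-rank updates. A minimal NumPy sketch of that idea, with toy sizes and invented adapter names (the paper's actual parameterization may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and low-rank dimension (toy values)

W = rng.normal(size=(d, d))            # frozen base weight
adapters = {                            # one low-rank pair per capability
    "reasoning": (np.zeros((d, r)), rng.normal(size=(r, d)) * 0.01),
    "tool_use":  (np.zeros((d, r)), rng.normal(size=(r, d)) * 0.01),
}

def forward(x, mode):
    """Apply the frozen base plus only the adapter for the active mode."""
    A, B = adapters[mode]
    return x @ (W + A @ B)

def update(mode, grad_A, lr=1e-2):
    """Decoupled update: gradients for one capability never touch the
    other capability's parameters (the core idea attributed to DART)."""
    A, B = adapters[mode]
    adapters[mode] = (A - lr * grad_A, B)

x = rng.normal(size=(1, d))
y_reason = forward(x, "reasoning")
update("tool_use", grad_A=rng.normal(size=(d, r)))
assert np.allclose(y_reason, forward(x, "reasoning"))  # reasoning path unaffected
```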
- A Unified Multi-Task Learning Framework for Generative Auto-Bidding with Validation-Aligned Optimization [51.27959658504722]
Multi-task learning offers a principled framework to train auto-bidding tasks jointly through shared representations. Existing multi-task optimization strategies are primarily guided by training dynamics and often generalize poorly in volatile bidding environments. We present Validation-Aligned Multi-task Optimization (VAMO), which adaptively assigns task weights based on the alignment between per-task training gradients and a held-out validation gradient.
arXiv Detail & Related papers (2025-10-09T03:59:51Z)
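The core mechanism, as described, weights tasks by how well their training gradients align with a held-out validation gradient. A small sketch under the assumption that alignment is measured by cosine similarity and mapped to weights with a softmax; the paper's exact mapping may differ:

```python
import numpy as np

def vamo_weights(task_grads, val_grad, temperature=1.0):
    """Weight each task by the cosine similarity between its training
    gradient and a held-out validation gradient, normalized via softmax.
    The softmax mapping and temperature are illustrative assumptions."""
    sims = np.array([
        g @ val_grad / (np.linalg.norm(g) * np.linalg.norm(val_grad) + 1e-12)
        for g in task_grads
    ])
    logits = sims / temperature
    w = np.exp(logits - logits.max())  # stable softmax
    return w / w.sum()

rng = np.random.default_rng(1)
val_grad = rng.normal(size=16)
# Three tasks whose gradients are increasingly noisy copies of val_grad.
task_grads = [val_grad + rng.normal(scale=s, size=16) for s in (0.1, 1.0, 5.0)]
print(vamo_weights(task_grads, val_grad))  # best-aligned task gets most weight
```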
- Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z)
- MR-UIE: Multi-Perspective Reasoning with Reinforcement Learning for Universal Information Extraction [21.487874020516454]
Large language models (LLMs) demonstrate robust capabilities across diverse research domains. Existing approaches enhance the performance of LLMs through in-context learning and instruction tuning. We propose integrating reinforcement learning (RL) with multi-perspective reasoning for information extraction tasks.
arXiv Detail & Related papers (2025-09-11T01:08:58Z)
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
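The claim that tools plug in via "lightweight Python definitions" suggests a registry pattern. The toy registry below shows the general shape; the names and decorator API are invented here and are not VerlTool's actual interface.

```python
# A toy tool-plugin registry in the spirit of what the abstract describes;
# the real VerlTool API almost certainly differs.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Register a function under a name so a rollout loop can dispatch to it."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return decorator

@tool("python")
def run_python(code: str) -> str:
    # Real systems sandbox execution; eval() here just keeps the sketch short.
    return str(eval(code))

@tool("search")
def search(query: str) -> str:
    return f"stub results for: {query}"

def execute(name: str, arg: str) -> str:
    """Dispatch a tool call observed in a trajectory to its plugin."""
    if name not in TOOLS:
        return f"error: unknown tool '{name}'"
    return TOOLS[name](arg)

print(execute("python", "2 ** 10"))  # -> 1024
```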
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation [65.15955645757705]
We introduce Workforce, a hierarchical multi-agent framework that decouples strategic planning from specialized execution. During inference, Workforce seamlessly adapts to new domains by adding or modifying worker agents. For training, we introduce Optimized Workforce Learning (OWL), which improves generalization across domains.
arXiv Detail & Related papers (2025-05-29T17:51:58Z)
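A minimal sketch of the planner/worker decoupling described above. The roles, the hard-coded plan, and the idea of registering a new worker at inference time are illustrative stand-ins, not the paper's implementation.

```python
# Workers are swappable specialists; the planner only names them.
workers = {
    "coder":    lambda task: f"[coder] wrote code for: {task}",
    "searcher": lambda task: f"[searcher] found sources for: {task}",
}

def plan(goal):
    """Stand-in planner: map a goal to (worker, subtask) pairs."""
    return [("searcher", f"background on {goal}"),
            ("coder", f"script that solves {goal}")]

def run(goal):
    # New domains are handled by registering new workers, not by
    # retraining the planner -- the decoupling the abstract emphasizes.
    return [workers[name](subtask) for name, subtask in plan(goal)]

workers["chemist"] = lambda task: f"[chemist] analyzed: {task}"  # added at inference
print("\n".join(run("reaction yield prediction")))
```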
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning [0.21845291030915975]
ARTIST is a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for large language models. It enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains. Experiments show that ARTIST consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-28T10:42:49Z)
- Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logic reward shaping.
Experiments have been conducted on various types of tasks in the Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z)
- R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models [51.468732121824125]
Large language models have achieved remarkable success on general NLP tasks, but they may fall short for domain-specific problems.
Existing evaluation tools only provide a few baselines and evaluate them on various domains without mining the depth of domain knowledge.
In this paper, we address the challenges of evaluating RALLMs by introducing the R-Eval toolkit, a Python toolkit designed to streamline the evaluation of different RAG workflows.
arXiv Detail & Related papers (2024-06-17T15:59:49Z)
- Learning to Use Tools via Cooperative and Interactive Agents [58.77710337157665]
Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility.
We propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration separately.
Our experiments on three datasets show that LLMs, when equipped with ConAgents, outperform baselines by a substantial margin.
arXiv Detail & Related papers (2024-03-05T15:08:16Z)
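A toy pipeline showing the three specialized roles the summary names (selection, execution, calibration) run in sequence. The stub logic stands in for what would be three LLM agents in the paper.

```python
def select_tool(task: str) -> str:
    """Selection agent: pick a tool name for the task (stubbed heuristic)."""
    return "calculator" if any(c.isdigit() for c in task) else "search"

def execute_tool(tool: str, task: str) -> str:
    """Execution agent: run the chosen tool (stubbed)."""
    if tool == "calculator":
        return str(eval(task))           # sandbox this in a real system
    return f"search results for: {task}"

def calibrate(task: str, result: str) -> str:
    """Calibration agent: sanity-check and reformat the raw tool output."""
    return f"task={task!r} -> answer={result}"

task = "17 * 24"
print(calibrate(task, execute_tool(select_tool(task), task)))  # -> ... 408
```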
- Exploiting Style Transfer-based Task Augmentation for Cross-Domain Few-Shot Learning [4.678020383205135]
In cross-domain few-shot learning, the model trained on source domains struggles to generalize to the target domain.
We propose Task Augmented Meta-Learning (TAML) to conduct style transfer-based task augmentation.
The proposed TAML increases the style diversity of training tasks and helps train a model with better domain generalization ability.
arXiv Detail & Related papers (2023-01-19T07:32:23Z)
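Style transfer-based task augmentation plausibly amounts to re-sampling feature statistics per task. The AdaIN-style sketch below is one way to realize that; whether TAML uses exactly this mechanism is an assumption here.

```python
import numpy as np

def stylize(feats, style_mean, style_std):
    """AdaIN-style statistic swap: keep content, replace feature statistics.
    Treating feature statistics as 'style' is the standard trick; the exact
    TAML formulation may differ."""
    mu, sigma = feats.mean(), feats.std() + 1e-6
    return (feats - mu) / sigma * style_std + style_mean

def augment_task(task_feats, n_styles=3, seed=0):
    """Create extra training tasks by re-stylizing one task's features."""
    rng = np.random.default_rng(seed)
    return [stylize(task_feats, rng.uniform(-1, 1), rng.uniform(0.5, 2.0))
            for _ in range(n_styles)]

feats = np.random.default_rng(2).normal(size=(5, 4))
for aug in augment_task(feats):
    # Each augmented task carries new style statistics but the same content.
    print(round(aug.mean(), 3), round(aug.std(), 3))
```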
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.