Related papers: ToRL: Scaling Tool-Integrated RL

ToRL: Scaling Tool-Integrated RL

URL: http://arxiv.org/abs/2503.23383v1
Date: Sun, 30 Mar 2025 10:16:25 GMT
Title: ToRL: Scaling Tool-Integrated RL
Authors: Xuefeng Li, Haoyang Zou, Pengfei Liu,
Abstract summary: ToRL is a framework for training large language models to autonomously use computational tools.<n>ToRL allows models to explore and discover optimal strategies for tool use.<n>Experiments with Qwen2.5-Math models show significant improvements.
Score: 25.477841726836836
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce ToRL (Tool-Integrated Reinforcement Learning), a framework for training large language models (LLMs) to autonomously use computational tools via reinforcement learning. Unlike supervised fine-tuning, ToRL allows models to explore and discover optimal strategies for tool use. Experiments with Qwen2.5-Math models show significant improvements: ToRL-7B reaches 43.3\% accuracy on AIME~24, surpassing reinforcement learning without tool integration by 14\% and the best existing Tool-Integrated Reasoning (TIR) model by 17\%. Further analysis reveals emergent behaviors such as strategic tool invocation, self-regulation of ineffective code, and dynamic adaptation between computational and analytical reasoning, all arising purely through reward-driven learning.

Related papers

AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning [17.086082843274003]
Large Language Models (LLMs) evolve into powerful Large Reasoning Models (LRMs)<n>Tool-Integrated Reasoning (TIR) further extends their capabilities by incorporating external tools.<n>Inspired by the human ability to adaptively select tools, we introduce AutoTIR, a reinforcement learning framework.
arXiv Detail & Related papers (2025-07-29T14:12:28Z)
Towards Effective Code-Integrated Reasoning [89.47213509714578]
We investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter.<n>Tool-augmented reinforcement learning can still suffer from potential instability in the learning dynamics.<n>We develop enhanced training strategies that balance exploration and stability, progressively building tool-use capabilities while improving reasoning performance.
arXiv Detail & Related papers (2025-05-30T11:30:18Z)
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning [93.30252692375886]
Rule-based reinforcement learning can be used to enhance tool-calling in large language models.<n>Tool-N1-7B/14B clearly outperform GPT-4o on several major benchmarks.
arXiv Detail & Related papers (2025-04-25T02:55:21Z)
OTC: Optimal Tool Calls via Reinforcement Learning [87.28134636548705]
We propose a tool-integrated reward that jointly considers correctness and tool efficiency, promoting high tool productivity. Our approach reduces tool calls by up to 73.1% and improves tool productivity by up to 229.4%, while maintaining comparable answer accuracy.
arXiv Detail & Related papers (2025-04-21T05:40:05Z)
ToolRL: Reward is All Tool Learning Needs [54.16305891389931]
Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. Recent advancements in reinforcement learning (RL) have demonstrated promising reasoning and generalization abilities. We present the first comprehensive study on reward design for tool selection and application tasks within the RL paradigm.
arXiv Detail & Related papers (2025-04-16T21:45:32Z)
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs [27.07998056454784]
ReTool enhances long-form reasoning with tool-integrated learning. Model achieves 67% accuracy with 400 training steps. Remarkably, ReTool-32B attains 72.5% accuracy in extended settings.
arXiv Detail & Related papers (2025-04-15T18:10:22Z)
ToolACE-R: Tool Learning with Adaptive Self-Refinement [84.69651852838794]
Tool learning allows Large Language Models to leverage external tools for solving complex user tasks. We propose ToolACE-R, a novel method that introduces adaptive self-refinement for tool invocations. Our results demonstrate the effectiveness of the proposed method, which is compatible with base models of various sizes.
arXiv Detail & Related papers (2025-04-02T06:38:56Z)
An Empirical Study on Eliciting and Improving R1-like Reasoning Models [90.52239241349504]
scaling RL training has become a central technique for implementing such reasoning models.<n>We demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models.<n>We also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models.
arXiv Detail & Related papers (2025-03-06T15:34:27Z)
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents. We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator. We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z)
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [170.7899683843177]
ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems. ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales. ToRA-Code-34B is the first open-source model that achieves an accuracy exceeding 50% on MATH.
arXiv Detail & Related papers (2023-09-29T17:59:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.