Related papers: One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents

URL: http://arxiv.org/abs/2512.20957v2
Date: Thu, 25 Dec 2025 05:33:05 GMT
Title: One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents
Authors: Zhaoxi Zhang, Yitong Duan, Yanzhi Zhang, Yiming Xu, Jiyan He, Yunfang Wu,
Abstract summary: RepoNavigator is an agent equipped with a single execution-aware tool: jumping to the definition of an invoked symbol. RepoNavigator is trained end-to-end via Reinforcement Learning directly from a pretrained model, without any closed-source distillation.
Score: 16.281864564259827
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Locating the files and functions requiring modification in large open-source software (OSS) repositories is challenging due to their scale and structural complexity. Existing large language model (LLM)-based methods typically treat this as a repository-level retrieval task and rely on multiple auxiliary tools, which overlook code execution logic and complicate model control. We propose RepoNavigator, an LLM agent equipped with a single execution-aware tool: jumping to the definition of an invoked symbol. This unified design reflects the actual flow of code execution while simplifying tool manipulation. RepoNavigator is trained end-to-end via Reinforcement Learning (RL) directly from a pretrained model, without any closed-source distillation. Experiments demonstrate that RL-trained RepoNavigator achieves state-of-the-art performance, with the 7B model outperforming 14B baselines, the 14B model surpassing 32B competitors, and even the 32B model exceeding closed-source models such as Claude-3.7. These results confirm that integrating a single, structurally grounded tool with RL training provides an efficient and scalable solution for repository-level issue localization.
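The abstract describes RepoNavigator's single tool only at a high level: jumping from an invoked symbol to its definition. Below is a minimal sketch of what such a tool could look like for a Python repository, assuming the repository is given as a mapping from file path to source text and that symbols are resolved by bare name (imports, scoping, and overloads are ignored). The names `Definition`, `index_definitions`, `jump_to_definition`, and `REPO` are hypothetical; this is not the paper's implementation.

```python
import ast
from dataclasses import dataclass


@dataclass
class Definition:
    name: str
    path: str
    lineno: int
    source: str  # source text of the definition, i.e. what the agent gets to read


def index_definitions(files: dict[str, str]) -> dict[str, Definition]:
    """Map each top-level function/class name to its definition.

    Simplifying assumption: names are unique across the repository, and
    imports and scoping rules are ignored.
    """
    index: dict[str, Definition] = {}
    for path, source in files.items():
        tree = ast.parse(source, filename=path)
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                segment = ast.get_source_segment(source, node) or ""
                index[node.name] = Definition(node.name, path, node.lineno, segment)
    return index


def jump_to_definition(files: dict[str, str], index: dict[str, Definition],
                       path: str, line: int) -> list[Definition]:
    """Hypothetical single tool: resolve every symbol invoked on `line` of `path`."""
    tree = ast.parse(files[path], filename=path)
    found: list[Definition] = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and node.lineno == line):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id        # plain call: parse(raw)
        elif isinstance(func, ast.Attribute):
            name = func.attr      # method call: matched on the attribute name only
        else:
            continue              # e.g. calling the result of another call
        if name in index:
            found.append(index[name])
    return found


# Toy two-file repository used for the example.
REPO = {
    "app.py": "from util import parse\n\ndef main(raw):\n    return parse(raw)\n",
    "util.py": "def parse(text):\n    return text.split(',')\n",
}
```

Calling `jump_to_definition(REPO, index_definitions(REPO), "app.py", 4)` returns the definition of `parse` in `util.py`, which is the code an agent following the execution path would read next. A real tool would need proper name resolution (for example via a language server) rather than bare-name matching.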

Related papers

Pull Requests as a Training Signal for Repo-Level Code Editing [49.82435173554125]
Clean Pull Request (Clean-PR) is a mid-training paradigm that leverages real-world GitHub pull requests as a training signal for repository-level editing. We introduce a scalable pipeline that converts noisy pull request diffs into Search/Replace edit blocks through reconstruction and validation. On SWE-bench, our model significantly outperforms the instruction-tuned baseline, achieving absolute improvements of 13.6% on SWE-bench Lite and 12.3% on SWE-bench Verified.
arXiv Detail & Related papers (2026-02-07T09:22:25Z)
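As an illustration of converting a pull request's before/after file contents into Search/Replace edit blocks and validating them, here is a sketch built on Python's `difflib`. The widening-context heuristic, the exact-uniqueness check, and all names (`EditBlock`, `diff_to_blocks`, `apply_blocks`, `to_validated_blocks`) are assumptions for illustration, not the Clean-PR pipeline: each changed region gets just enough surrounding lines for its SEARCH text to match exactly once, and a sample is kept only if replaying the blocks reproduces the post-PR file.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditBlock:
    search: str   # exact text expected in the pre-PR file
    replace: str  # text that should take its place


def _occurrences(lines: list[str], needle: list[str]) -> int:
    """Count positions where `needle` appears as a contiguous run of lines."""
    n = len(needle)
    return sum(lines[k:k + n] == needle for k in range(len(lines) - n + 1))


def diff_to_blocks(old: str, new: str) -> list[EditBlock]:
    """Reconstruct one Search/Replace block per changed region of the diff."""
    a = old.splitlines(keepends=True)
    b = new.splitlines(keepends=True)
    blocks: list[EditBlock] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Widen with surrounding old-file lines until SEARCH is non-empty and unique.
        ctx = 0
        while True:
            lo, hi = max(0, i1 - ctx), min(len(a), i2 + ctx)
            search = a[lo:hi]
            if search and _occurrences(a, search) == 1:
                break
            if lo == 0 and hi == len(a):
                break  # already the whole file; validation decides below
            ctx += 1
        # The context lines are taken from the old file and copied around the new lines.
        replace = a[lo:i1] + b[j1:j2] + a[i2:hi]
        blocks.append(EditBlock("".join(search), "".join(replace)))
    return blocks


def apply_blocks(text: str, blocks: list[EditBlock]) -> str:
    """Apply blocks in order, refusing any whose SEARCH text is not matched exactly once."""
    for block in blocks:
        if text.count(block.search) != 1:
            raise ValueError(f"SEARCH text not unique: {block.search!r}")
        text = text.replace(block.search, block.replace, 1)
    return text


def to_validated_blocks(old: str, new: str) -> Optional[list[EditBlock]]:
    """Keep a sample only if replaying its blocks reproduces the post-PR file."""
    blocks = diff_to_blocks(old, new)
    try:
        return blocks if apply_blocks(old, blocks) == new else None
    except ValueError:
        return None
```

Requiring each SEARCH text to be unique makes a block unambiguous to apply, at the cost of longer blocks when the changed lines are repeated elsewhere in the file.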
D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use [17.99381644283042]
Large reasoning models (LRMs) lack the capability of sub-task decomposition in complex tool-use scenarios, leading to Lazy Reasoning. We propose a two-stage training framework that incentivizes LRMs' task decomposition reasoning capability via self-distillation and diversity-aware reinforcement learning. D-CORE achieves robust tool-use improvements across diverse benchmarks and model scales.
arXiv Detail & Related papers (2026-02-02T14:36:15Z)
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls [46.34510189812439]
LoopTool is a fully automated, model-aware data evolution framework. It iteratively refines both the data and the model through three synergistic modules. Experiments show that our 8B model trained with LoopTool significantly surpasses its 32B data generator.
arXiv Detail & Related papers (2025-11-12T09:34:39Z)
Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch [63.40752011615843]
Training tool-augmented language models has emerged as a promising approach to enhancing their capabilities for complex tasks. We propose a dynamic generalization-guided reward design for rule-based reinforcement learning. We show that our models achieve over 7% performance improvement compared to both SFT and RL-with-SFT models.
arXiv Detail & Related papers (2025-11-02T16:33:45Z)
Reinforcement Learning for Machine Learning Engineering Agents [52.03168614623642]
We show that agents backed by weaker models that improve via reinforcement learning can outperform agents backed by much larger but static models. We propose duration-aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions. We also propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early.
arXiv Detail & Related papers (2025-09-01T18:04:10Z)
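The partial-credit idea can be illustrated with a small reward function, assuming (hypothetically) that the program under evaluation has been instrumented to print a fixed marker line after each major stage. The marker names, `partial_credit`, and `run_and_score` are illustrative inventions, not the paper's instrumentation: the reward is the fraction of consecutive stages reached, so a program that loads data and then crashes scores higher than one that fails immediately.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical markers the instrumented program prints, in order, as it makes progress.
STAGES = ("LOADED_DATA", "TRAINED_MODEL", "WROTE_SUBMISSION")


def partial_credit(stdout: str, stages: Sequence[str] = STAGES) -> float:
    """Fraction of consecutive stages reached before the first missing marker."""
    printed = {line.strip() for line in stdout.splitlines()}
    reached = 0
    for stage in stages:
        if stage not in printed:
            break
        reached += 1
    return reached / len(stages)


def run_and_score(program: str, timeout: float = 60.0) -> float:
    """Run an instrumented Python program and turn its printed markers into a reward."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # simplification: a timed-out run earns no credit
    # A crash after some stages still earns credit for the markers it printed.
    return partial_credit(proc.stdout)
```

For example, a program that prints `LOADED_DATA` and then raises an exception scores 1/3 rather than 0, which gives the learner a denser signal than a pass/fail reward.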
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses the limitations of existing approaches to agentic reinforcement learning with tool use (ARLT) through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn reinforcement learning with verifiable rewards (RLVR) paradigms. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
RLFactory: A Plug-and-Play Reinforcement Learning Post-Training Framework for LLM Multi-Turn Tool-Use [50.52940111891476]
Large language models excel at basic reasoning but struggle with tasks that require interaction with external tools. We present RLFactory, a plug-and-play reinforcement learning framework for multi-round tool use.
arXiv Detail & Related papers (2025-08-31T16:47:31Z)
On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs). It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks. Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
Integrating Distributed Architectures in Highly Modular RL Libraries [4.297070083645049]
Most popular reinforcement learning libraries advocate for highly modular agent composability. We propose a versatile approach that allows the definition of RL agents at different scales through independent reusable components.
arXiv Detail & Related papers (2020-07-06T10:22:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.