PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use
- URL: http://arxiv.org/abs/2601.20439v1
- Date: Wed, 28 Jan 2026 09:49:43 GMT
- Title: PEARL: Plan Exploration and Adaptive Reinforcement Learning for Multihop Tool Use
- Authors: Qihao Wang, Mingzhe Lu, Jiayue Wu, Yue Hu, Yanbing Liu,
- Abstract summary: Large Language Models show great potential with external tools, but face significant challenges in complex, multi-turn tool invocation.<n>We present PEARL, a novel framework to enhance LLM planning and execution for sophisticated tool use.<n> Experiments on the ToolHop and T-Eval benchmarks show PEARL significantly outperforms existing methods, achieving a new state-of-the-art success rate of textbf56.5% on ToolHop while maintaining a low invocation error rate.
- Score: 10.58323565531902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models show great potential with external tools, but face significant challenges in complex, multi-turn tool invocation. They often exhibit weak planning, tool hallucination, erroneous parameter generation, and struggle with robust interaction. To tackle these issues, we present PEARL, a novel framework to enhance LLM planning and execution for sophisticated tool use. PEARL adopts a two-stage approach: an offline phase where the agent explores tools to learn valid usage patterns and failure conditions, and an online reinforcement learning phase. In the online phase, a dedicated Planner is trained via group Relative Policy Optimization (GRPO) with a carefully designed reward function that provides distinct signals for planning quality. Experiments on the ToolHop and T-Eval benchmarks show PEARL significantly outperforms existing methods, achieving a new state-of-the-art success rate of \textbf{56.5\%} on ToolHop while maintaining a low invocation error rate. Our work marks a key advance in addressing the complex planning challenges of tool use, contributing to the development of more robust and reliable LLM-based agents.
Related papers
- AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce textbfAdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior.<n>AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z) - Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction [31.689383152872534]
ToolMaster is a framework that shifts tool use from imitating golden tool-calling trajectories to actively learning tool usage through interaction with the environment.<n>To optimize LLMs for tool planning and invocation, ToolMaster adopts a trial-and-execution paradigm.<n> Experimental results demonstrate that ToolMaster significantly outperforms existing baselines in terms of generalization and robustness across unseen or unfamiliar tools.
arXiv Detail & Related papers (2026-01-19T06:46:33Z) - ML-Tool-Bench: Tool-Augmented Planning for ML Tasks [23.54937738755734]
We introduce a benchmark for evaluating tool-augmented machine learning agents.<n>Our benchmark goes beyond traditional tool-use evaluation by incorporating an in-memory named object management.<n>Our approach improves over ReAct by 16.52 percentile positions, taking the median across all Kaggle challenges.
arXiv Detail & Related papers (2025-11-29T23:59:40Z) - Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates [56.73907811047611]
Large language models (LLMs) have demonstrated strong reasoning and tool-use capabilities.<n>LLMs often fail in real-world tool-interactions due to incorrect parameterization, poor tool selection, or misinterpretation of user intent.<n>We introduce a curriculum-inspired framework that leverages structured reasoning templates to guide LLMs through more deliberate step-by-step instructions for generating function callings.
arXiv Detail & Related papers (2025-09-22T17:55:14Z) - Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning [63.31585771716123]
Large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL)<n>We introduce Tool-Star, an RL-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasoning.<n>Tool-Star integrates six types of tools and incorporates systematic designs in both data synthesis and training.
arXiv Detail & Related papers (2025-05-22T09:00:19Z) - FamilyTool: A Multi-hop Personalized Tool Use Benchmark [93.80355496575281]
FamilyTool is a benchmark grounded in a family-based knowledge graph (KG) that simulates personalized, multi-hop tool use scenarios.<n> Experiments reveal significant performance gaps in state-of-the-art Large Language Models (LLMs)<n>FamilyTool serves as a critical resource for evaluating and advancing LLM agents' reasoning, adaptability, and scalability in complex, dynamic environments.
arXiv Detail & Related papers (2025-04-09T10:42:36Z) - Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.<n>MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space.<n>MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.