Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
- URL: http://arxiv.org/abs/2602.02050v1
- Date: Mon, 02 Feb 2026 12:52:14 GMT
- Title: Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
- Authors: Zeping Li, Hongru Wang, Yiwen Zhao, Guanhua Chen, Yixia Li, Keyang Chen, Yixin Cao, Guangnan Ye, Hongfeng Chai, Mengdi Wang, Zhenfei Yin
- Abstract summary: Tool-using agents based on Large Language Models (LLMs) excel in tasks such as mathematical reasoning and multi-hop question answering. In long trajectories, agents often trigger excessive and low-quality tool calls, increasing latency and degrading inference performance. We propose using entropy reduction as a supervisory signal and design two reward strategies to address the differing needs of optimizing tool-use behavior.
- Score: 54.18201810286764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tool-using agents based on Large Language Models (LLMs) excel in tasks such as mathematical reasoning and multi-hop question answering. However, in long trajectories, agents often trigger excessive and low-quality tool calls, increasing latency and degrading inference performance, which makes managing tool-use behavior challenging. In this work, we conduct entropy-based pilot experiments and observe a strong positive correlation between entropy reduction and high-quality tool calls. Building on this finding, we propose using entropy reduction as a supervisory signal and design two reward strategies to address the differing needs of optimizing tool-use behavior. Sparse outcome rewards provide coarse, trajectory-level guidance to improve efficiency, while dense process rewards offer fine-grained supervision to enhance performance. Experiments across diverse domains show that both reward designs improve tool-use behavior: the former reduces tool calls by 72.07% compared to the average of baselines, while the latter improves performance by 22.27%. These results position entropy reduction as a key mechanism for enhancing tool-use behavior, enabling agents to be more adaptive in real-world applications.
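To make the two reward designs concrete, here is a minimal sketch of how an entropy-reduction signal could be shaped into a dense per-call process reward and a sparse trajectory-level outcome reward. The per-token entropy array, window size, and call-cost weight are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def entropy_drop(entropies, call_idx, window=16):
    """Mean token entropy just before vs. just after a tool call.
    Assumes the call does not sit at the very start of the trajectory."""
    before = np.mean(entropies[max(0, call_idx - window):call_idx])
    after = np.mean(entropies[call_idx + 1:call_idx + 1 + window])
    return before - after  # positive => the call reduced uncertainty

def dense_process_rewards(entropies, call_indices):
    """Dense variant: one reward per tool call, proportional to the
    entropy reduction that call produced."""
    return [entropy_drop(entropies, i) for i in call_indices]

def sparse_outcome_reward(correct, call_indices, entropies, lam=0.05):
    """Sparse variant: a single trajectory-level score combining answer
    correctness with total entropy reduction, minus a per-call cost
    that discourages excessive tool use (lam is illustrative)."""
    total_drop = sum(entropy_drop(entropies, i) for i in call_indices)
    return float(correct) + total_drop - lam * len(call_indices)
```

The split mirrors the abstract's division of labor: the sparse score gives coarse trajectory-level guidance for efficiency, while the dense per-call signal supervises each individual tool decision.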
Related papers
- TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents [43.376952807616256]
Recent advances in autonomous LLM agents demonstrate their ability to improve performance through iterative interaction with the environment. We propose Test-time Improvement Diagnostic Evaluation (TIDE), an agent-agnostic and environment-agnostic framework that decomposes test-time improvement (TTI) into three comprehensive and interconnected dimensions.
arXiv Detail & Related papers (2026-02-02T15:00:47Z)
- Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning [26.401906729658688]
Agentic Reinforcement Learning (ARL) focuses on training large language models to interleave reasoning with external tool execution to solve complex tasks. Most existing ARL methods train a single set of shared model parameters to support both reasoning and tool-use behaviors, implicitly assuming that joint training leads to improved overall agent performance. We show that these two capabilities often induce misaligned gradient directions, leading to training interference that undermines the effectiveness of joint optimization. We propose Disentangled Action Reasoning Tuning (DART), a simple and efficient framework that explicitly decouples parameter updates for reasoning and tool-use via separate low-rank adapters (a minimal sketch follows this entry).
arXiv Detail & Related papers (2026-02-01T03:19:22Z)
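DART's named mechanism, separate low-rank updates for reasoning and for tool use, can be pictured with the minimal PyTorch sketch below. The class name, rank, and mode-routing flag are assumptions for illustration; the paper's actual adapter placement and update rule are not specified in the summary.

```python
import torch.nn as nn

class TwoAdapterLinear(nn.Module):
    """A frozen base linear layer plus two low-rank (LoRA-style) adapter
    branches: one for reasoning segments, one for tool-use segments.
    Illustrative sketch only, not DART's actual implementation."""

    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad_(False)  # shared weights stay frozen
        self.adapters = nn.ModuleDict({
            mode: nn.Sequential(
                nn.Linear(d_in, rank, bias=False),
                nn.Linear(rank, d_out, bias=False),
            )
            for mode in ("reasoning", "tool_use")
        })

    def forward(self, x, mode):
        # Only the selected adapter runs, so gradients from reasoning
        # and tool-use segments land in disjoint low-rank parameters.
        return self.base(x) + self.adapters[mode](x)
```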
- Optimizing Agentic Workflows using Meta-tools [3.3298825663516403]
Agentic AI enables LLMs to dynamically reason, plan, and interact with tools to solve complex tasks. This work introduces Agentic Workflow Optimization (AWO), a framework that identifies and optimizes redundant tool execution patterns (illustrated after this entry). AWO reduces the number of LLM calls by up to 11.9% while also increasing the task success rate by up to 4.2 percentage points.
arXiv Detail & Related papers (2026-01-29T17:43:08Z)
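One way to picture AWO's redundant-pattern optimization: scan a logged workflow trace for a tool that is invoked many times, then collapse those calls into a single batched meta-tool step so that one LLM decision replaces many. The trace format and meta-tool naming below are assumptions for illustration, not AWO's actual procedure.

```python
from collections import Counter

def most_repeated_tool(trace, min_repeats=3):
    """Find the tool invoked most often in a logged workflow trace.
    `trace` is assumed to be a list of (tool_name, args) steps."""
    if not trace:
        return None
    tool, n = Counter(t for t, _ in trace).most_common(1)[0]
    return tool if n >= min_repeats else None

def collapse_into_meta_tool(trace, tool):
    """Replace every call to `tool` with one batched meta-tool step, so
    a single LLM decision covers what previously took one call each."""
    batched_args = [args for t, args in trace if t == tool]
    rest = [(t, args) for t, args in trace if t != tool]
    return [(f"batch_{tool}", batched_args)] + rest
```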
- AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success (sketched after this entry); and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z)
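Tool-GRPO's specifics are not given in the summary above, but a group-relative advantage in the GRPO family is easy to sketch. The extra weighting of tool-selection tokens is an assumed mechanism, added only to illustrate "prioritizing tool selection based on end-task success".

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-family advantage: normalize each rollout's end-task reward
    against its own sampling group (mean/std across the group)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-6)

def token_advantages(rollout_adv, is_tool_token, tool_weight=2.0):
    """Broadcast a rollout's advantage to its tokens, upweighting the
    tokens that select and sequence tools (assumed mechanism)."""
    w = np.where(np.asarray(is_tool_token), tool_weight, 1.0)
    return rollout_adv * w
```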
- Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning [29.280386584974455]
Recent advances in large language models (LLMs) have popularized test-time scaling, where models generate additional reasoning tokens before producing final answers. These approaches have demonstrated significant performance improvements on benchmarks involving mathematical reasoning. We propose Tool-Augmented Policy Optimization (TAPO), a novel reinforcement learning framework that integrates multi-hop reasoning with adaptive tool-calling capabilities.
arXiv Detail & Related papers (2025-10-08T14:04:27Z)
- Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning [68.89572566071575]
Tool-Integrated Reasoning (TIR) enables large language models (LLMs) to improve their internal reasoning ability by integrating external tools. We propose Tool-Light, a framework designed to encourage LLMs to perform TIR efficiently and accurately (one possible preference-learning setup is sketched after this entry). Experimental results on 10 datasets demonstrate the effectiveness of Tool-Light.
arXiv Detail & Related papers (2025-09-27T12:53:37Z)
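The summary names self-evolved preference learning without detail; one plausible setup pairs a model's own efficient, correct trajectories against inefficient ones and optimizes a standard DPO objective. The pairing rule and dict fields below are assumptions, not Tool-Light's actual procedure.

```python
import torch
import torch.nn.functional as F

def build_preference_pair(trajectories):
    """Pair the correct trajectory with the fewest tool calls (preferred)
    against the correct one with the most (dispreferred). Assumed rule."""
    correct = sorted((t for t in trajectories if t["correct"]),
                     key=lambda t: t["num_tool_calls"])
    return correct[0], correct[-1]

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective on one pair; inputs are sequence log-probs
    under the policy and under a frozen reference model."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * torch.as_tensor(margin))
```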
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- Acting Less is Reasoning More! Teaching Model to Act Efficiently [87.28134636548705]
Tool-integrated reasoning augments large language models with the ability to invoke external tools to solve tasks. Current approaches typically optimize only for final correctness without considering the efficiency or necessity of external tool use. We propose a framework that encourages models to produce accurate answers with minimal tool calls (a reward of this shape is sketched after this entry). Our approach reduces tool calls by up to 68.3% and improves tool productivity by up to 215.4%, while maintaining comparable answer accuracy.
arXiv Detail & Related papers (2025-04-21T05:40:05Z)
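A reward of the shape the abstract describes, correctness minus a cost per tool call, is simple to sketch; the penalty weight and the reading of "tool productivity" below are illustrative assumptions.

```python
def efficiency_reward(correct, num_tool_calls, penalty=0.1):
    """Correctness reward minus a per-call cost, so the policy is paid
    for answering with as few external calls as possible (the penalty
    weight is illustrative)."""
    return float(correct) - penalty * num_tool_calls

def tool_productivity(num_solved, num_tool_calls):
    """One plausible reading of 'tool productivity': tasks solved per
    tool call issued."""
    return num_solved / max(num_tool_calls, 1)
```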
- On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI (systems that autonomously plan and act) is becoming widespread, yet its task success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly injects feedback extracted from different forms of critique (the loop is sketched after this entry).
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
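A minimal sample-evaluate-feedback loop in the spirit of IAD might look as follows; `agent` and `critic` are hypothetical callables, and the round budget is an assumption.

```python
def iterative_agent_decoding(agent, critic, task, rounds=3):
    """Minimal sample -> evaluate -> feedback loop in the spirit of IAD.
    `agent` maps (task, feedback) to a candidate; `critic` returns a
    numeric score and a textual critique. Both are hypothetical."""
    feedback, best, best_score = "", None, float("-inf")
    for _ in range(rounds):
        candidate = agent(task, feedback)
        score, critique = critic(task, candidate)
        if score > best_score:
            best, best_score = candidate, score
        feedback = critique  # injected into the next decoding round
    return best
```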
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)