DEPO: Dual-Efficiency Preference Optimization for LLM Agents
- URL: http://arxiv.org/abs/2511.15392v1
- Date: Wed, 19 Nov 2025 12:38:43 GMT
- Title: DEPO: Dual-Efficiency Preference Optimization for LLM Agents
- Authors: Sirui Chen, Mengshi Zhao, Lei Xu, Yuying Zhao, Beier Zhu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu,
- Abstract summary: We propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps.<n>Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance.
- Score: 75.6723341304463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of longer chain of thought (CoT), hampering interaction efficiency in real-world scenarios. Nevertheless, there still lacks systematic definition of LLM agent efficiency, hindering targeted improvements. To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps to complete a task. Building on this definition, we propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance. DEPO also generalizes to three out-of-domain math benchmarks and retains its efficiency gains when trained on only 25% of the data. Our project page is at https://opencausalab.github.io/DEPO.
Related papers
- WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking [60.35109192765302]
Information seeking is a core capability that enables autonomous reasoning and decision-making.<n>We propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories.<n>Our method consistently achieves improvements in both effectiveness and efficiency over strong baselines.
arXiv Detail & Related papers (2025-10-28T17:51:42Z) - Efficient Agents: Building Effective Agents While Reducing Cost [48.65558640786415]
Large Language Model (LLM)-driven agents have enabled sophisticated systems to tackle complex, multi-step tasks.<n>This work presents the first systematic study of the efficiency-effectiveness trade-off in modern agent systems.
arXiv Detail & Related papers (2025-07-24T17:56:51Z) - Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization [46.33639431414019]
Large Language Models generate functionally correct solutions but often fall short in code efficiency.<n>We introduce a novel test-time iterative optimization framework to address this.
arXiv Detail & Related papers (2025-05-29T12:14:29Z) - SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models [17.483183039447564]
This paper introduces Sparse Expert Activation Pruning (SEAP), a training-free pruning method that selectively retains task-relevant parameters to reduce inference overhead.<n> Experimental results demonstrate that SEAP significantly reduces computational overhead while maintaining competitive accuracy.
arXiv Detail & Related papers (2025-03-10T17:59:03Z) - Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving.<n>Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.<n>We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z) - Rational Metareasoning for Large Language Models [17.479428400594028]
Being prompted to engage in reasoning has emerged as a core technique for using large language models (LLMs)<n>This work introduces a novel approach based on computational models of metareasoning used in cognitive science.<n>We develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning.
arXiv Detail & Related papers (2024-10-07T23:48:52Z) - Federated Learning of Large Language Models with Parameter-Efficient
Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.