Fugu-MT Paper Translation (Abstract): DEPO: Dual-Efficiency Preference Optimization for LLM Agents

Paper overview: DEPO: Dual-Efficiency Preference Optimization for LLM Agents

arxiv url: http://arxiv.org/abs/2511.15392v1
Date: Wed, 19 Nov 2025 12:38:43 GMT
Status: Translation complete
System update date: 2025-11-20 15:51:28.802551
Title: DEPO: Dual-Efficiency Preference Optimization for LLM Agents
Title (reference translation): DEPO: Dual-Efficiency Preference Optimization for LLM Agents
Authors: Sirui Chen, Mengshi Zhao, Lei Xu, Yuying Zhao, Beier Zhu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu
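Abstract summary: This paper proposes DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while improving performance by up to 29.3%.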
Reference score (independently computed attention score): 75.6723341304463
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of longer chains of thought (CoT), hampering interaction efficiency in real-world scenarios. Moreover, a systematic definition of LLM agent efficiency is still lacking, which hinders targeted improvements. To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps to complete a task. Building on this definition, we propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance. DEPO also generalizes to three out-of-domain math benchmarks and retains its efficiency gains when trained on only 25% of the data. Our project page is at https://opencausalab.github.io/DEPO.
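Abstract (reference translation): Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of a longer chain of thought (CoT), which hampers interaction efficiency in real-world scenarios. A systematic definition of LLM agent efficiency is also still lacking, which hinders targeted improvements. To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps needed to complete a task. Building on this definition, we propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while improving performance by up to 29.3%. DEPO also generalizes to three out-of-domain math benchmarks and retains its efficiency gains when trained on only 25% of the data. The project page is at https://opencausalab.github.io/DEPO.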
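
The two efficiency notions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trajectory` structure, the function names, and the toy token counts below are all hypothetical, chosen only to show how tokens per step, steps per task, and a relative reduction could be computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One agent episode (hypothetical structure, not from the paper)."""
    tokens_per_step: List[int]  # tokens generated at each step
    success: bool               # whether the task was completed


def step_level_tokens(traj: Trajectory) -> float:
    """Average tokens per step; lower means better step-level efficiency."""
    if not traj.tokens_per_step:
        return 0.0
    return sum(traj.tokens_per_step) / len(traj.tokens_per_step)


def trajectory_level_steps(traj: Trajectory) -> int:
    """Number of steps taken; lower means better trajectory-level efficiency."""
    return len(traj.tokens_per_step)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy numbers for illustration only; they are not results from the paper.
baseline = Trajectory(tokens_per_step=[120, 100, 80, 100], success=True)
efficient = Trajectory(tokens_per_step=[50, 40, 30], success=True)

token_cut = relative_reduction(step_level_tokens(baseline),
                               step_level_tokens(efficient))
step_cut = relative_reduction(trajectory_level_steps(baseline),
                              trajectory_level_steps(efficient))
print(f"tokens per step reduced by {token_cut:.1f}%")
print(f"steps reduced by {step_cut:.1f}%")
```

Figures such as "up to 60.9%" in the abstract are relative reductions of this general kind, although the exact aggregation used in the paper (for example, averaging over tasks or counting only successful episodes) is not stated here.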

Related papers