Fugu-MT Paper Translation (Abstract): Don't Just Fine-tune the Agent, Tune the Environment

Paper summary: Don't Just Fine-tune the Agent, Tune the Environment

arxiv url: http://arxiv.org/abs/2510.10197v1
Date: Sat, 11 Oct 2025 12:35:15 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.838762
Title: Don't Just Fine-tune the Agent, Tune the Environment
Authors: Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin
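Abstract summary: Supervised fine-tuning on synthetic data leads to overfitting. Standard reinforcement learning struggles with a critical cold-start problem and training instability. This work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration.
Reference score (independently computed attention score): 25.7349297100143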
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce Environment Tuning, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. Environment Tuning orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from the Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents.
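The abstract names two of its ingredients, fine-grained progress rewards and actionable environment feedback, without defining them. The Python sketch below is one possible reading and not the authors' implementation. The function names, the per-turn pass/fail representation, the reward definition (fraction of successful turns), and the hint format are all assumptions made for illustration.

```python
from typing import Sequence


def binary_reward(turns_passed: Sequence[bool]) -> float:
    """Sparse reward: 1.0 only if every turn of the episode succeeded."""
    return 1.0 if turns_passed and all(turns_passed) else 0.0


def progress_reward(turns_passed: Sequence[bool]) -> float:
    """Dense reward: fraction of turns that succeeded (assumed definition)."""
    if not turns_passed:
        return 0.0
    return sum(turns_passed) / len(turns_passed)


def augment_feedback(raw_error: str, missing_params: Sequence[str]) -> str:
    """Append an actionable hint to a terse tool error (assumed format)."""
    if not missing_params:
        return raw_error
    hint = ", ".join(missing_params)
    return f"{raw_error} Hint: supply the missing parameter(s): {hint}."


if __name__ == "__main__":
    # Hypothetical four-turn episode in which the third turn failed.
    episode = [True, True, False, True]
    print(binary_reward(episode))
    print(progress_reward(episode))
    print(augment_feedback("Invalid call.", ["flight_id"]))
```

Under these assumptions, an episode with a single failed turn still earns partial credit from the dense reward, while the sparse reward gives nothing. That difference is one plausible way a progress signal could ease the cold-start problem the abstract mentions.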
