Fugu-MT Paper Translation (Summary): From Failure to Mastery: Generating Hard Samples for Tool-use Agents

Paper summary: From Failure to Mastery: Generating Hard Samples for Tool-use Agents

arxiv url: http://arxiv.org/abs/2601.01498v1
Date: Sun, 04 Jan 2026 11:56:33 GMT
Status: Translation complete
Last updated (in system): 2026-01-06 16:25:22.448173
Title: From Failure to Mastery: Generating Hard Samples for Tool-use Agents
Title (reference translation): From Failure to Mastery - Generating Hard Samples for Tool-use Agents
Authors: Bingguang Hao, Zengzhuang Xu, Yuntao Wen, Xinyi Xu, Yang Liu, Tong Zhao, Maolin Wang, Long Chen, Dong Wang, Yicheng Chen, Cunyin Peng, Xiangyu Zhao, Chenyi Zhuang, Ji Zhang,
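Abstract summary: HardGen is an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. The advanced tools and hard queries enable the generation of verifiable, complex Chain-of-Thought (CoT). Our code, models, and dataset will be open-sourced to facilitate future research.
Reference score (in-house attention score): 40.331752086107265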
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random sampling and shallow generation, often yield simple and homogeneous trajectories that fail to capture complex, implicit logical dependencies. To bridge this gap, we introduce HardGen, an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. Firstly, HardGen establishes a dynamic API Graph built upon agent failure cases, from which it samples to synthesize hard traces. Secondly, these traces serve as conditional priors to guide the instantiation of modular, abstract advanced tools, which are subsequently leveraged to formulate hard queries. Finally, the advanced tools and hard queries enable the generation of verifiable, complex Chain-of-Thought (CoT), with closed-loop evaluation feedback steering the continuous refinement of the process. Extensive evaluations demonstrate that a 4B-parameter model trained with our curated dataset achieves superior performance compared to several leading open-source and closed-source competitors (e.g., GPT-5.2, Gemini-3-Pro and Claude-Opus-4.5). Our code, models, and dataset will be open-sourced to facilitate future research.
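Abstract (reference translation): Advancing LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods mainly follow a paradigm of random sampling and shallow generation, and they often produce simple, homogeneous trajectories that cannot capture complex, implicit logical dependencies. To bridge this gap, we introduce HardGen, an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. First, HardGen builds a dynamic API graph on top of agent failure cases and samples from it to synthesize hard traces. Second, these traces act as conditional priors that guide the instantiation of modular, abstract advanced tools, which are then used to formulate hard queries. Finally, the advanced tools and hard queries enable the generation of verifiable, complex Chain-of-Thought (CoT). In extensive evaluations, a 4B-parameter model trained on our curated dataset achieves superior performance compared to leading open-source and closed-source competitors (e.g., GPT-5.2, Gemini-3-Pro, Claude-Opus-4.5). Our code, models, and dataset will be open-sourced to facilitate future research.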
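The abstract does not say how the dynamic API Graph is represented or how hard traces are sampled from it. Purely as an illustration, the sketch below assumes a weighted directed graph of API dependencies (an edge means one API's output can feed another's input), edge weights that grow whenever an agent fails on a trace using that edge, and a failure-weighted random walk to draw candidate hard traces. The class `APIGraph`, its methods, the `bonus` parameter, and the API names are all hypothetical and are not taken from HardGen's code.

```python
import random
from collections import defaultdict


class APIGraph:
    """Hypothetical dynamic API graph (assumption, not HardGen's design):
    nodes are API names; a directed edge src -> dst means the output of
    src can feed an argument of dst."""

    def __init__(self, base_weight: float = 1.0) -> None:
        self.base_weight = base_weight
        # edges[src][dst] -> sampling weight, raised by observed failures
        self.edges: dict[str, dict[str, float]] = defaultdict(dict)

    def add_dependency(self, src: str, dst: str) -> None:
        # Register an edge once; keep any weight it already accumulated.
        self.edges[src].setdefault(dst, self.base_weight)

    def record_failure(self, trace: list[str], bonus: float = 1.0) -> None:
        """Up-weight every known edge along a trace on which an agent failed."""
        for src, dst in zip(trace, trace[1:]):
            if dst in self.edges.get(src, {}):
                self.edges[src][dst] += bonus

    def sample_trace(self, start: str, max_len: int, rng: random.Random) -> list[str]:
        """Weighted random walk without revisiting nodes; edges tied to
        more failures are proportionally more likely to be followed."""
        trace = [start]
        while len(trace) < max_len:
            options = {
                dst: w
                for dst, w in self.edges.get(trace[-1], {}).items()
                if dst not in trace
            }
            if not options:
                break  # dead end: no unvisited successor
            names = list(options)
            weights = [options[n] for n in names]
            trace.append(rng.choices(names, weights=weights, k=1)[0])
        return trace


# Usage with made-up API names.
g = APIGraph()
for src, dst in [
    ("search_flights", "book_flight"),
    ("search_flights", "get_weather"),
    ("book_flight", "send_email"),
    ("get_weather", "send_email"),
]:
    g.add_dependency(src, dst)

# Pretend an agent failed on this chain; its edges now weigh 1 + 4 = 5.
g.record_failure(["search_flights", "book_flight", "send_email"], bonus=4.0)
trace = g.sample_trace("search_flights", max_len=3, rng=random.Random(0))
print(trace)
```

Weighting by failure counts is one simple way to bias sampling toward the dependency chains agents struggle with; the paper's actual sampling criterion, and how traces then condition advanced-tool and query generation, may differ.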


Related papers