Fugu-MT 論文翻訳(概要): Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

論文の概要: Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

arxiv url: http://arxiv.org/abs/2510.24702v1
Date: Tue, 28 Oct 2025 17:53:13 GMT
ステータス: 翻訳完了
システム内更新日: 2025-10-29 15:35:37.329547
Title: Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Title（参考訳）: Agent Data Protocol: LLMエージェントの多種多様な効率的な微調整のためのデータセットの統合
Authors: Yueqi Song, Ketan Ramaneti, Zaid Sheikh, Ziru Chen, Boyu Gou, Tianbao Xie, Yiheng Xu, Danyang Zhang, Apurva Gandhi, Fan Yang, Joseph Liu, Tianyue Ou, Zhihao Yuan, Frank Xu, Shuyan Zhou, Xingyao Wang, Xiang Yue, Tao Yu, Huan Sun, Yu Su, Graham Neubig,
Abstract要約: 本稿では,エージェントデータセット間の"インターリングア"として機能する軽量表現言語であるエージェントデータプロトコル(ADP)を紹介する。 ADPはAPI/ツールの使用、ブラウジング、コーディング、ソフトウェアエンジニアリング、一般的なエージェントなど、さまざまなタスクを捉えるのに十分な表現力を持っている。すべてのコードとデータが公開され、ADPが標準化され、スケーラブルで再現可能なエージェントトレーニングの障壁を低くすることを期待している。
参考スコア（独自算出の注目度）: 85.02904078131682
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To this end, we introduce the agent data protocol (ADP), a light-weight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. The design of ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic workflows, while remaining simple to parse and train on without engineering at a per-dataset level. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format, and converted the standardized ADP data into training-ready formats for multiple agent frameworks. We performed SFT on these data, and demonstrated an average performance gain of ~20% over corresponding base models, and delivers state-of-the-art or near-SOTA performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP could help lower the barrier to standardized, scalable, and reproducible agent training.
Abstract（参考訳）: AIエージェントの大規模微調整に関する公開研究結果は、エージェントトレーニングデータの収集がユニークな課題を呈しているため、比較的稀である。この研究では、ボトルネックは基礎となるデータソースの欠如ではなく、多種多様なデータが異質なフォーマット、ツール、インターフェースにまたがって断片化されている、と論じる。この目的のために,エージェントデータプロトコル (ADP) を導入し,多様な形式のエージェントデータセットと下流でのエージェントトレーニングパイプラインの"インターリングア"として機能する軽量表現言語を提案する。 ADPの設計は、API/ツールの使用、ブラウジング、コーディング、ソフトウェアエンジニアリング、一般的なエージェントワークフローなど、さまざまなタスクをデータ単位のレベルで解析やトレーニングを行うのに十分な表現力を持っている。実験では、13の既存のエージェントトレーニングデータセットをADPフォーマットに集約し、標準化されたADPデータを複数のエージェントフレームワークのためのトレーニング可能なフォーマットに変換する。我々はこれらのデータ上でSFTを行い、対応するベースモデルに対して平均20%の性能向上を示し、ドメイン固有のチューニングなしで標準的なコーディング、ブラウジング、ツールの使用、研究ベンチマークに対して最先端または近SOTAのパフォーマンスを提供する。すべてのコードとデータが公開され、ADPが標準化され、スケーラブルで再現可能なエージェントトレーニングの障壁を低くすることを期待している。

論文の概要: Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

関連論文リスト