Fugu-MT Paper Translation (Abstract): Skill Is Not Document: A Query-Conditional Benchmark and Two-Stage Retriever for LLM Agent Skill Routing

Paper overview: Skill Is Not Document: A Query-Conditional Benchmark and Two-Stage Retriever for LLM Agent Skill Routing

arxiv url: http://arxiv.org/abs/2606.03565v1
Date: Tue, 02 Jun 2026 12:30:46 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.994785
Title: Skill Is Not Document: A Query-Conditional Benchmark and Two-Stage Retriever for LLM Agent Skill Routing
Authors: Zifei Wang, Wei Wen, Qiang Ji, Ruizhi Qiao, Xing Sun
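Abstract summary: R3-Skill is a benchmark for realistic agent skill routing. The authors build a two-stage retrieval system that uses skill compatibility as an explicit training signal. The dataset, training code, and model weights are released as open source for agent skill routing.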
Reference score (attention score computed independently by this site): 40.648572239231804
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM agents complete complex tasks by composing multiple skills, and skill retrieval is a front-end stage for agents. Skill retrieval differs fundamentally from traditional document retrieval at the supervision level: top-K joint correctness depends not only on the semantic relevance of each individual query-skill pair, but also on whether the skills retrieved together can collaborate to fulfill the task under the given query. Such "skill compatibility" cannot be derived from independent relevance alone. Yet existing LLM-based data synthesis pipelines can produce a direct supervision signal for "which skills should not be jointly retrieved under this query" -- namely the LLM's own rejection decisions -- and this signal is routinely discarded as low-quality data. To address this gap, we propose Reject-as-Resource Retriever (R3) and construct R3-Skill, a bilingual (Chinese-English) skill retrieval benchmark targeting realistic agent skill routing. R3-Skill spans four language directions, features query phrasings close to real user requests, and is verified through multi-expert cross-checking. On R3-Skill, we build a two-stage retrieval system (R3-Embedding + R3-Reranker) with skill compatibility as an explicit training signal. Gradient analysis shows that the "push-away" signal is diluted by bilateral balancing in the bi-encoder but acts as lossless graded ranking supervision in the cross-encoder -- motivating its placement at the cross-encoder stage, as confirmed by ablations on two datasets. The R3-Embedding + R3-Reranker pipeline attains Hit@1 = 0.7714, NDCG@10 = 0.8327 and Set-Compat = 0.3525 on R3-Skill. The dataset, training code and model weights are released as open source for agent skill routing.
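The abstract reports Hit@1 and NDCG@10 without defining them. As a rough illustration, the sketch below computes the textbook forms of Hit@K and NDCG@K (linear gain, log2 discount) for a single query. It assumes these standard definitions, which may differ in detail from the paper's evaluation code; the skill names and relevance labels are hypothetical. Set-Compat is not sketched because the abstract does not define it.

```python
import math


def hit_at_k(ranked, relevant, k):
    """Return 1.0 if any of the top-k ranked skills is relevant, else 0.0."""
    return 1.0 if any(skill in relevant for skill in ranked[:k]) else 0.0


def ndcg_at_k(ranked, gains, k):
    """NDCG@k with linear gain and log2 position discount.

    `gains` maps skill id -> graded relevance; missing skills count as 0.
    """
    dcg = sum(
        gains.get(skill, 0.0) / math.log2(pos + 2)
        for pos, skill in enumerate(ranked[:k])
    )
    # Ideal ordering: the highest gains placed first.
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical retrieval result for one query (skill names are made up).
ranked_skills = ["pdf_reader", "web_search", "summarizer"]
relevance = {"web_search": 1.0, "summarizer": 1.0}

hit1 = hit_at_k(ranked_skills, set(relevance), k=1)
hit3 = hit_at_k(ranked_skills, set(relevance), k=3)
ndcg3 = ndcg_at_k(ranked_skills, relevance, k=3)
print(hit1, hit3, round(ndcg3, 4))
```

Benchmark-level figures such as those quoted in the abstract would be averages of these per-query values over the test queries. Note that both metrics score each skill independently, which is why the abstract treats joint compatibility of the retrieved set as a separate concern.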
