Fugu-MT Paper Translation (Abstract): Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Paper overview: Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

arxiv url: http://arxiv.org/abs/2606.03980v1
Date: Tue, 02 Jun 2026 17:56:57 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:05.242627
Title: Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
Authors: Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang
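Abstract summary: This paper proposes Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface for orchestrating heterogeneous resources. The findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence.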
Reference score (independently computed attention score): 36.002795736704
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, where a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.
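The idea of a single interface over heterogeneous evidence can be illustrated with a small sketch. This is not the authors' implementation (their code is at the repository linked above) and it does not use an LLM agent. The names Sample, rule_verifier, reference_match, checklist_coverage, evaluate, and best_of_n are all hypothetical, the three scoring heuristics are deliberately trivial stand-ins, and the plain average used for aggregation is an assumption. It only shows the general shape: each evidence source reports a score or declines, the evaluator keeps the sources that apply to the input, and the result is reused for best-of-N selection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class Sample:
    """One response to score. All fields are illustrative assumptions."""
    prompt: str
    response: str
    reference: Optional[str] = None   # ground-truth reference, if available
    checklist: Tuple[str, ...] = ()   # phrases standing in for a procedural checklist


# An evidence source returns a score in [0, 1], or None when it does not apply.
Evidence = Callable[[Sample], Optional[float]]


def rule_verifier(s: Sample) -> Optional[float]:
    # Stand-in rule-based check: the response must not be empty.
    return 1.0 if s.response.strip() else 0.0


def reference_match(s: Sample) -> Optional[float]:
    # Only applicable when a ground-truth reference exists.
    if s.reference is None:
        return None
    return 1.0 if s.response.strip() == s.reference.strip() else 0.0


def checklist_coverage(s: Sample) -> Optional[float]:
    # Only applicable when a checklist exists; fraction of items mentioned.
    if not s.checklist:
        return None
    hits = sum(item.lower() in s.response.lower() for item in s.checklist)
    return hits / len(s.checklist)


SOURCES: Dict[str, Evidence] = {
    "rule": rule_verifier,
    "reference": reference_match,
    "checklist": checklist_coverage,
}


def evaluate(sample: Sample,
             sources: Dict[str, Evidence] = SOURCES) -> Tuple[float, Dict[str, float]]:
    """Collect evidence from every applicable source and average it.

    Returns the reward plus the per-source scores, so the result can be inspected.
    """
    evidence: Dict[str, float] = {}
    for name, source in sources.items():
        score = source(sample)
        if score is not None:          # skip sources that do not apply to this input
            evidence[name] = score
    if not evidence:
        return 0.0, {}
    return sum(evidence.values()) / len(evidence), evidence


def best_of_n(prompt: str, candidates: Sequence[str], **fields) -> str:
    """Pick the candidate with the highest reward (first one wins ties)."""
    return max(candidates, key=lambda r: evaluate(Sample(prompt, r, **fields))[0])
```

For example, evaluate(Sample("2+2?", "4", reference="4")) draws on the rule and reference sources only, because no checklist is present, and best_of_n("2+2?", ["5", "4", ""], reference="4") selects "4".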

Related papers