Fugu-MT Paper Translation (Abstract): RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

Paper overview: RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

arxiv url: http://arxiv.org/abs/2605.11874v1
Date: Tue, 12 May 2026 09:52:21 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:56.775318
Title: RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems
Title (reference translation): RecRM-Bench: A Benchmark for Multidimensional Reward Modeling in Agentic Recommender Systems
Authors: Wenwen Zeng, Jinhui Zhang, Hao Chen, Zhaoyu Hu, Yongqi Liang, Jiajun Chai, Dengcan Liu, Zhenfeng Liu, Shurui Yan, Minglong Xue, Xiaohan Wang, Wei Lin, Guojun Yin
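Abstract summary: This paper introduces RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four core evaluation dimensions: instruction following, factual consistency, query-item relevance, and fine-grained user behavior prediction. The paper also proposes a systematic framework for constructing multi-dimensional reward models and integrating a hybrid reward function.
Reference score (independently computed attention score): 40.152754832576996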
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for the optimization of these agents in recommendation tasks. However, current methodologies remain limited by a reliance on single-dimensional, outcome-based rewards that focus exclusively on final user interactions, overlooking critical intermediate capabilities such as instruction following and complex intent understanding. Despite the necessity of designing multi-dimensional rewards, the field lacks a standardized benchmark to facilitate this development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four core evaluation dimensions: instruction following, factual consistency, query-item relevance, and fine-grained user behavior prediction. By supporting comprehensive assessment from syntactic compliance to complex intent grounding and preference modeling, RecRM-Bench provides a foundational dataset for training sophisticated reward models. Furthermore, we propose a systematic framework for the construction of multi-dimensional reward models and the integration of a hybrid reward function, establishing a robust foundation for developing reliable and highly capable agentic recommender systems. The complete RecRM-Bench dataset is publicly available at https://huggingface.co/datasets/wwzeng/RecRM-Bench.
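Abstract (reference translation): The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching into deeply personalized, interactive recommendation. Reinforcement Learning (RL) provides an essential framework for optimizing these agents on recommendation tasks. However, current methods are limited by their reliance on single-dimensional, outcome-based rewards that focus solely on final user interactions and overlook critical intermediate capabilities such as instruction following and complex intent understanding. Despite the need to design multi-dimensional rewards, the field lacks a standardized benchmark to support this development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four core evaluation dimensions: instruction following, factual consistency, query-item relevance, and fine-grained user behavior prediction. By supporting comprehensive assessment from syntactic compliance to complex intent grounding and preference modeling, RecRM-Bench provides a foundational dataset for training sophisticated reward models. Furthermore, we propose a systematic framework for constructing multi-dimensional reward models and integrating a hybrid reward function. The complete RecRM-Bench dataset is publicly available at https://huggingface.co/datasets/wwzeng/RecRM-Bench.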

Related papers