Fugu-MT Paper Translation (Abstract): AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents

Paper overview: AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents

arxiv url: http://arxiv.org/abs/2603.21613v1
Date: Mon, 23 Mar 2026 06:18:56 GMT
Status: Translation complete
Last updated (in system): 2026-03-24 19:11:39.521866
Title: AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents
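Title (reference translation): AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents
Authors: Tianyi Li, Zixuan Wang, Guidong Lei, Xiaodong Li, Hui Li
Abstract summary: This paper introduces AgenticRec, a ranking-oriented agentic recommendation framework. To support evidence-grounded reasoning, it designs a suite of recommendation-specific tools integrated into a ReAct loop. It also introduces Progressive Preference Refinement to resolve fine-grained preference ambiguities.
Reference score (attention score computed in-house): 26.289918893920984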
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recommender agents built on Large Language Models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and are unable to capture fine-grained preferences. To address this, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under sparse implicit feedback. Our approach makes three key contributions. First, we design a suite of recommendation-specific tools integrated into a ReAct loop to support evidence-grounded reasoning. Second, we propose theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) to maximize ranking utility, ensuring accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes the convex upper bound of pairwise ranking errors. Experiments on benchmarks confirm that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.
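Abstract (reference translation): Recommender agents built on large language models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and cannot capture fine-grained preferences. To address this, we propose AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranked-list generation) under sparse implicit feedback. Our approach makes three key contributions. First, to support evidence-grounded reasoning, we design a suite of recommendation-specific tools integrated into a ReAct loop. Second, we propose theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) to maximize ranking utility, ensuring accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes the convex upper bound of pairwise ranking errors. Experiments on benchmarks show that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.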
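
The abstract says PPR mines hard negatives from ranking violations and minimizes a convex upper bound of pairwise ranking errors, but it does not name the surrogate. A minimal sketch, assuming that a "ranking violation" means any item ranked above the held-out positive and that the surrogate is the base-2 logistic (BPR/RankNet-style) pairwise loss log2(1 + exp(-(s_pos - s_neg))), which is one common convex choice and not necessarily the paper's: whenever s_neg >= s_pos this quantity is at least 1, so it upper-bounds the 0-1 pairwise error. The bidirectional preference alignment step is not modeled, and all function names and scores are hypothetical.

```python
import math


def mine_hard_negatives(ranked_items: list[str], positive: str) -> list[str]:
    """Treat every item ranked above the positive as a ranking violation.

    If the positive is absent from the list, all listed items count as
    violations (an assumption made for this sketch).
    """
    if positive not in ranked_items:
        return list(ranked_items)
    return ranked_items[: ranked_items.index(positive)]


def zero_one_pairwise_error(s_pos: float, s_neg: float) -> float:
    """1.0 if the negative scores at least as high as the positive, else 0.0."""
    return 1.0 if s_neg >= s_pos else 0.0


def logistic_surrogate(s_pos: float, s_neg: float) -> float:
    """log2(1 + exp(-(s_pos - s_neg))): convex in the margin and >= the 0-1 error."""
    m = s_pos - s_neg
    # Numerically stable evaluation of log(1 + e^(-m)) for large |m|.
    nat = math.log1p(math.exp(-m)) if m >= 0 else -m + math.log1p(math.exp(m))
    return nat / math.log(2)


def pairwise_loss(scores: dict[str, float], ranked_items: list[str], positive: str) -> float:
    """Mean surrogate loss of the positive against its mined hard negatives."""
    negatives = mine_hard_negatives(ranked_items, positive)
    if not negatives:
        return 0.0  # no violations, e.g. the positive is already ranked first
    total = sum(logistic_surrogate(scores[positive], scores[n]) for n in negatives)
    return total / len(negatives)


# Hypothetical policy scores for four candidates; "i7" is the held-out positive.
scores = {"i2": 2.0, "i9": 1.5, "i7": 1.0, "i4": 0.2}
ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
negatives = mine_hard_negatives(ranked, "i7")
loss = pairwise_loss(scores, ranked, "i7")
print(ranked, negatives, round(loss, 3))
```

When the ranking is sorted by the same scores, every mined pair has s_neg >= s_pos, so each term is at least 1 and the loss is above 1; driving it toward 0 requires lifting the positive above each hard negative by a margin.
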

Related papers