Fugu-MT paper translation (abstract): Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems

Paper abstract: Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems

arxiv url: http://arxiv.org/abs/2604.10029v1
Date: Sat, 11 Apr 2026 04:52:38 GMT
Status: translation complete
Last updated in system: 2026-04-14 20:13:15.799395
Title: Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems
Authors: Zongwei Wang, Min Gao, Hongzhi Yin, Junliang Yu, Tong Chen, Shazia Sadiq, Tianrui Li
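Abstract summary: Large language model-empowered agentic recommender systems (ARS) reformulate recommendation as a multi-turn interaction between a recommender agent and a user agent. Existing ARS are mainly optimized in a Reflexion-style paradigm, where past interaction trajectories are stored as textual memory. The paper proposes CoARS, a self-distilled reinforcement learning framework for co-evolving agentic recommender systems.
Reference score (independently computed attention score): 43.20339689871105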
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model-empowered agentic recommender systems (ARS) reformulate recommendation as a multi-turn interaction between a recommender agent and a user agent, enabling iterative preference elicitation and refinement beyond conventional one-shot prediction. However, existing ARS are mainly optimized in a Reflexion-style paradigm, where past interaction trajectories are stored as textual memory and retrieved as prompt context for later reasoning. Although this design allows agents to recall prior feedback and observations, the accumulated experience remains external to model parameters, leaving agents reliant on generic reasoning rather than progressively acquiring recommendation-specific decision-making ability through learning. Reinforcement learning (RL) therefore provides a natural way to internalize such interaction experience into parameters. Yet existing RL methods for ARS still suffer from two key limitations. First, they fail to capture the interactive nature of ARS, in which the recommender agent and the user agent continuously influence each other and can naturally generate endogenous supervision through interaction feedback. Second, they reduce a rich multi-turn interaction process to final outcomes, overlooking the dense supervision embedded throughout the trajectory. To this end, we propose CoARS, a self-distilled reinforcement learning framework for co-evolving agentic recommender systems. CoARS introduces two complementary learning schemes: interaction reward, which derives coupled task-level supervision for the recommender agent and the user agent from the same interaction trajectory, and self-distilled credit assignment, which converts historical trajectories into token-level credit signals under teacher-student conditioning. Experiments on multiple datasets show that CoARS outperforms representative ARS baselines in recommendation performance and user alignment.
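The abstract does not specify how token-level credit signals are computed. As a rough illustration only, one common way to realize credit assignment under teacher-student conditioning is to score each generated token by how much more (or less) likely a teacher pass, conditioned on extra context such as a historical trajectory, finds it than the student pass does. The sketch below assumes that reading; the function name `token_credit` and the probability values are hypothetical and are not taken from the paper.

```python
import math


def token_credit(student_logprobs, teacher_logprobs):
    """Return per-token credit as teacher log-prob minus student log-prob.

    ASSUMPTION: this log-ratio form is an illustrative choice, not the
    formulation confirmed by the CoARS abstract.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("sequences must have the same length")
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


# Hypothetical per-token probabilities for a three-token response.
student = [math.log(0.5), math.log(0.2), math.log(0.4)]
teacher = [math.log(0.5), math.log(0.8), math.log(0.1)]

credits = token_credit(student, teacher)
for position, credit in enumerate(credits):
    print(f"token {position}: credit {credit:+.3f}")
```

Under this assumption, a positive credit marks a token the teacher pass favors more than the student pass, a negative credit marks one it favors less, and zero means the two agree.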
