Fugu-MT Paper Translation (Summary): MiniRec: Data-Efficient Reinforcement Learning for LLM-based Recommendation

Paper summary: MiniRec: Data-Efficient Reinforcement Learning for LLM-based Recommendation

arxiv url: http://arxiv.org/abs/2602.04278v1
Date: Wed, 04 Feb 2026 07:15:49 GMT
Status: Translation complete
Last updated in system: 2026-02-05 19:45:11.418244
Title: MiniRec: Data-Efficient Reinforcement Learning for LLM-based Recommendation
Title (reference translation): MiniRec: Data-Efficient Reinforcement Learning for LLM-based Recommendation
Authors: Lin Wang, Yang Zhang, Jingfan Chen, Xiaoyan Zhao, Fengbin Zhu, Qing Li, Tat-Seng Chua
Abstract summary: MiniRec is a data selection framework tailored for RL-based large language model (LLM) recommendation. It evaluates sample learnability using a key RL signal: rewards.
Reference score (attention score computed by the site): 50.417769112326546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The integration of reinforcement learning (RL) into large language models (LLMs) has opened new opportunities for recommender systems by eliciting reasoning and improving user preference modeling. However, RL-based LLM recommendation faces significant efficiency challenges, making full-data training costly. Existing data selection methods define sample value based on learnability or representativeness, yet their loss- or gradient-driven or dataset coverage-driven criteria often misalign with RL learning dynamics, resulting in suboptimal performance. To address this, we propose MiniRec, a data selection framework tailored for RL-based LLM recommendation. MiniRec evaluates sample learnability using key RL signals -- rewards -- pruning samples that are too easy (too high reward) or too difficult (consistently low reward). It assesses representativeness by aligning sample gradients with the approximated "ideal" global RL optimization trajectory, selecting samples that mainly drive model updates, and it also enforces diversity to reduce redundancy. Combined with a curriculum learning strategy from easy to hard samples, MiniRec significantly reduces training cost while largely preserving performance. Extensive experiments demonstrate MiniRec's effectiveness, highlighting the importance of reward-aligned, trajectory-informed data selection in RL-based LLM recommendation.
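Abstract (reference translation): The integration of reinforcement learning (RL) into large language models (LLMs) has opened new opportunities for recommender systems by incorporating reasoning and improving user preference modeling. However, RL-based LLM recommendation faces major efficiency challenges, and training on the full dataset is costly. Existing data selection methods define sample value based on learnability or representativeness, but their loss-driven, gradient-driven, or dataset-coverage-driven criteria are often misaligned with RL learning dynamics, resulting in suboptimal performance. We therefore propose MiniRec, a data selection framework tailored for RL-based LLM recommendation. MiniRec evaluates sample learnability using a key RL signal, namely rewards, and prunes samples that are too easy (too high reward) or too difficult (consistently low reward). It assesses representativeness by aligning sample gradients with the approximated "ideal" global RL optimization trajectory, selecting the samples that mainly drive model updates, and it also enforces diversity to reduce redundancy. Combined with an easy-to-hard curriculum learning strategy, MiniRec substantially reduces training cost while largely preserving performance. Extensive experiments demonstrate MiniRec's effectiveness and highlight the importance of reward-aligned, trajectory-informed data selection for RL-based LLM recommendation.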
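
The abstract describes reward-based learnability pruning. Samples the policy almost always solves (too high reward) are dropped, as are samples it never solves (consistently low reward). The remaining samples are ordered from easy to hard. The following minimal Python sketch illustrates that idea and assumes each sample has already been scored with several sampled rollouts whose rewards lie in [0, 1]. The function name `filter_by_reward`, the thresholds `easy_threshold` and `hard_threshold`, and the use of mean reward as the difficulty measure are hypothetical choices for illustration. They are not MiniRec's actual criteria, which the abstract does not specify.

```python
from statistics import mean
from typing import Dict, List, Tuple


def filter_by_reward(
    rollout_rewards: Dict[str, List[float]],
    easy_threshold: float = 0.9,  # hypothetical: mean reward at/above this counts as "too easy"
    hard_threshold: float = 0.1,  # hypothetical: best rollout at/below this counts as "too difficult"
) -> List[Tuple[str, float]]:
    """Keep samples whose rollout rewards are neither saturated nor uniformly low.

    Returns (sample_id, mean_reward) pairs sorted by descending mean reward,
    i.e. an easy-to-hard curriculum order (an assumed difficulty measure).
    """
    kept: List[Tuple[str, float]] = []
    for sample_id, rewards in rollout_rewards.items():
        avg = mean(rewards)
        if avg >= easy_threshold:
            continue  # too easy: the policy already earns near-maximal reward
        if max(rewards) <= hard_threshold:
            continue  # too difficult: no rollout earns a meaningful reward
        kept.append((sample_id, avg))
    # Curriculum: higher mean reward (easier) first, lower (harder) later.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    rewards = {
        "u1": [1.0, 1.0, 1.0, 1.0],  # always solved -> pruned as too easy
        "u2": [0.0, 0.0, 0.0, 0.0],  # never solved -> pruned as too difficult
        "u3": [1.0, 0.0, 1.0, 1.0],
        "u4": [0.0, 1.0, 0.0, 0.0],
    }
    print(filter_by_reward(rewards))  # [('u3', 0.75), ('u4', 0.25)]
```

Consider GRPO-style training, where advantages are normalized within each group of rollouts. There, a sample whose rollouts all receive the same reward (all correct or all wrong) gets zero advantage and therefore contributes no reward-driven gradient. This is one plausible motivation for pruning both extremes. However, the abstract does not state which RL algorithm MiniRec uses.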

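For representativeness, the abstract says MiniRec aligns sample gradients with an approximated "ideal" global RL optimization trajectory and enforces diversity to reduce redundancy. The sketch below shows one way to realize this, under assumptions stated plainly here:

- each sample is represented by a (possibly randomly projected) per-sample update vector;
- the trajectory is approximated by a single direction vector, for example the parameter change over a short warm-up run;
- alignment is measured by cosine similarity;
- diversity is enforced by a greedy rule that skips candidates too similar to samples already selected.

The names `select_representative`, `trajectory_direction`, `budget` and `redundancy_threshold` are hypothetical. The abstract does not describe how the paper actually approximates the trajectory or enforces diversity.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def select_representative(
    sample_updates: Dict[str, List[float]],
    trajectory_direction: List[float],
    budget: int,
    redundancy_threshold: float = 0.95,  # hypothetical cutoff for "near-duplicate"
) -> List[str]:
    """Greedily pick up to `budget` samples aligned with the assumed trajectory.

    Assumption: per-sample vectors and the trajectory direction share the same
    sign convention (both are update directions, not raw loss gradients).
    """
    # Rank samples by alignment with the approximated global update direction;
    # sorted() is stable, so ties keep dictionary insertion order.
    ranked = sorted(
        sample_updates,
        key=lambda sid: cosine(sample_updates[sid], trajectory_direction),
        reverse=True,
    )
    selected: List[str] = []
    for sid in ranked:
        if len(selected) >= budget:
            break
        # Diversity: skip candidates nearly parallel to an already chosen sample.
        if any(
            cosine(sample_updates[sid], sample_updates[s]) > redundancy_threshold
            for s in selected
        ):
            continue
        selected.append(sid)
    return selected


if __name__ == "__main__":
    updates = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
    # "b" duplicates "a", so the greedy diversity rule skips it.
    print(select_representative(updates, [1.0, 0.5], budget=2))  # ['a', 'c']
```

A greedy threshold keeps the sketch short. If the sign conventions of the two vector types disagree, the ranking is inverted, so this is the first thing to check when adapting it.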

Related papers