Fugu-MT Paper Translation (Summary): ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation

Paper summary: ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation

arxiv url: http://arxiv.org/abs/2604.22169v1
Date: Fri, 24 Apr 2026 02:44:36 GMT
Status: Translation complete
Last updated in system: 2026-04-27 15:36:26.316139
Title: ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation
Title (reference translation): ReCast: Recasting learning signals for reinforcement learning in generative recommendation
Authors: Peiyan Zhang, Hanmo Liu, Chengxuan Tong, Yuxia Wu, Wei Guo, Yong Liu,
Abstract summary: This paper proposes ReCast. It shows that ReCast restores minimal learnability for all-zero groups. ReCast also mitigates the persistent all-zero / single-hit regime.
Reference score (independently computed attention score): 18.825912740441858
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generic group-based RL assumes that sampled rollout groups are already usable learning signals. We show that this assumption breaks down in sparse-hit generative recommendation, where many sampled groups never become learnable at all. We propose ReCast, a repair-then-contrast learning-signal framework that first restores minimal learnability for all-zero groups and then replaces full-group reward normalization with a boundary-focused contrastive update on the strongest positive and the hardest negative. ReCast leaves the outer RL framework unchanged, modifies only within-group signal construction, and partially decouples rollout search width from actor-side update width. Across multiple generative recommendation tasks, ReCast consistently outperforms OpenOneRec-RL, achieving up to 36.6% relative improvement in Pass@1. Its matched-budget advantage is substantially larger: ReCast reaches the baseline's target performance with only 4.1% of the rollout budget, and this advantage widens with model scale. The same design also yields direct system-level gains, reducing actor-side update time by 16.60x, lowering peak allocated memory by 16.5%, and improving actor MFU by 14.2%. Mechanism analysis shows that ReCast mitigates the persistent all-zero / single-hit regime, restores learnability when natural positives are scarce, and converts otherwise wasted rollout budget into more stable policy updates. These results suggest that, for generative recommendation, the decisive RL problem is not only how to assign rewards, but how to construct learnable optimization events from sparse, structured supervision.
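Abstract (reference translation): Generic group-based RL assumes that sampled rollout groups are already usable learning signals. We show that this assumption breaks down in sparse-hit generative recommendation, where many sampled groups never become learnable at all. ReCast is a repair-then-contrast learning-signal framework. It first restores minimal learnability for all-zero groups. It then replaces full-group reward normalization with a boundary-focused contrastive update on the strongest positive and the hardest negative. ReCast leaves the outer RL framework unchanged, modifies only within-group signal construction, and partially decouples rollout search width from actor-side update width. Across multiple generative recommendation tasks, ReCast consistently outperforms OpenOneRec-RL, achieving up to a 36.6% relative improvement in Pass@1. ReCast reaches the baseline's target performance with only 4.1% of the rollout budget, and this advantage widens with model scale. The same design also yields direct system-level gains. It reduces actor-side update time by 16.60x, lowers peak allocated memory by 16.5%, and improves actor MFU by 14.2%. Mechanism analysis shows that ReCast mitigates the persistent all-zero / single-hit regime. It also restores learnability when natural positives are scarce and converts otherwise wasted rollout budget into more stable policy updates. These results suggest that, for generative recommendation, the decisive RL problem is not only how to assign rewards. It is also how to construct learnable optimization events from sparse, structured supervision.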
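The abstract's premise is that all-zero groups never become learnable. That premise follows directly from full-group reward normalization. The Python sketch below is not the authors' code. It assumes the baseline normalizes rewards within each group as (r - mean) / std, GRPO-style. Under that rule, every rollout in an all-zero group receives zero advantage. The sketch then illustrates one assumed way to pick a boundary pair for a contrastive update.

The abstract does not describe how the repair step restores learnability, so repair is omitted here. Two pieces are hypothetical stand-ins, not ReCast's actual method:

- The selection rules in `select_contrast_pair`. The strongest positive is the highest-reward rollout, and the hardest negative is the zero-reward rollout with the highest policy log-probability.
- The logistic pairwise loss in `pairwise_logistic_loss`.

```python
import math
from typing import List, Optional, Tuple


def group_normalized_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Full-group normalization, assumed GRPO-style: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # If every reward is 0, every numerator is 0, so every advantage is 0:
    # the group contributes no gradient at all.
    return [(r - mean) / (std + eps) for r in rewards]


def select_contrast_pair(
    rewards: List[float], logprobs: List[float]
) -> Optional[Tuple[int, int]]:
    """HYPOTHETICAL boundary pair: (strongest positive, hardest negative).

    Assumed rules, not taken from the paper: the strongest positive is the
    rollout with the highest reward > 0; the hardest negative is the
    zero-reward rollout with the highest policy log-probability.
    """
    positives = [i for i, r in enumerate(rewards) if r > 0]
    negatives = [i for i, r in enumerate(rewards) if r <= 0]
    if not positives or not negatives:
        return None  # no positive/negative boundary in this group
    best_pos = max(positives, key=lambda i: rewards[i])
    hardest_neg = max(negatives, key=lambda i: logprobs[i])
    return best_pos, hardest_neg


def pairwise_logistic_loss(lp_pos: float, lp_neg: float, beta: float = 1.0) -> float:
    """HYPOTHETICAL stand-in objective: -log(sigmoid(beta * (lp_pos - lp_neg)))."""
    x = beta * (lp_pos - lp_neg)
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # All-zero group: normalization yields no signal, and no pair exists.
    print(group_normalized_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]
    print(select_contrast_pair([0.0, 0.0, 0.0, 0.0], [-1.0, -2.0, -0.5, -3.0]))  # None

    # Single-hit group: contrast the one positive with the most likely negative.
    rewards = [1.0, 0.0, 0.0, 0.0]
    logprobs = [-3.2, -1.5, -0.7, -2.0]
    pair = select_contrast_pair(rewards, logprobs)
    print(pair)  # (0, 2)
    pos, neg = pair
    print(pairwise_logistic_loss(logprobs[pos], logprobs[neg]))
```

In this sketch, only two rollouts per group enter the update, so the actor-side batch stays fixed as the group size grows. This is one plausible reading of "partially decouples rollout search width from actor-side update width." It does not describe the paper's implementation.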

List of related papers