Fugu-MT Paper Translation (Abstract): Towards Generalizable and Efficient Large-Scale Generative Recommenders

Paper overview: Towards Generalizable and Efficient Large-Scale Generative Recommenders

arxiv url: http://arxiv.org/abs/2605.23312v1
Date: Fri, 22 May 2026 07:31:00 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.239279
Title: Towards Generalizable and Efficient Large-Scale Generative Recommenders
Title (reference translation): Toward General-Purpose and Efficient Large-Scale Generative Recommenders
Authors: Qiuling Xu, Ko-Jen Hsiao, Moumita Bhattacharya,
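Abstract summary: Generative recommendation models model user behavior as sequences of events and provide a shared backbone for multiple recommendation tasks. This paper describes the authors' experience scaling a generative recommender from 2M to 1B backbone parameters. Overall, the results support treating model scale as one component of a production transfer problem, alongside task headroom, decoding cost, serving-latency alignment, and item generalization.
Reference score (independently computed attention score): 5.085303286789844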
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative recommendation models can model user behavior as sequences of events and provide a shared backbone for multiple recommendation tasks. In production, however, pre-training gains do not automatically translate into downstream application improvements: task headroom, repeated-training cost, serving latency, and item freshness all affect transfer. We describe our experience scaling a generative recommender from 2M to 1B backbone parameters, excluding embedding and decoding layers, in a production-scale title recommendation setting. Across multiple downstream tasks, we observe task-dependent scaling behavior: some tasks approach an empirical ceiling within the observed scale range, while others continue to benefit from additional capacity. This motivates using offset scaling-law fits as a diagnostic for where additional model scale may be more or less useful. We then study production constraints that arise when applying the model in practice. Frequent retraining over trillions of behavior tokens makes training and decoding efficiency important; cached serving can make the immediate next-token target stale; and newly launched titles may need to be scored from semantic metadata before collaborative ID embeddings are reliable. We address these issues with multi-token prediction for serving-latency alignment, sampled softmax and a projected decoding head for efficient repeated training, and semantic item towers with collaborative-embedding masking for cold-start adaptation. In a one-week production-shadow evaluation over 1M users, the 1B-backbone model achieves higher MRR than the 2M-backbone baseline across all reported tasks. Overall, the results support treating model scale as one component of a production transfer problem, alongside task headroom, decoding cost, serving-latency alignment, and item generalization.
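Abstract (reference translation): Generative recommendation models model user behavior as sequences of events and provide a shared backbone for multiple recommendation tasks. In production, however, pre-training gains do not automatically translate into downstream application improvements. This paper describes the authors' experience scaling a generative recommender from 2M to 1B backbone parameters. Some tasks approach an empirical ceiling within the observed scale range, while others continue to benefit from additional capacity. This motivates using offset scaling-law fits as a diagnostic for where additional model scale may be more or less useful. The paper then studies production constraints that arise when applying the model in practice. Frequent retraining over trillions of behavior tokens makes training and decoding efficiency important; cached serving can make the immediate next-token target stale; and newly launched titles may need to be scored from semantic metadata before collaborative ID embeddings are reliable. These issues are addressed with multi-token prediction for serving-latency alignment, sampled softmax and a projected decoding head for efficient repeated training, and semantic item towers with collaborative-embedding masking for cold-start adaptation. In a one-week production-shadow evaluation over 1M users, the 1B-backbone model achieves higher MRR than the 2M-backbone baseline across all reported tasks. Overall, the results support treating model scale as one component of a production transfer problem, alongside task headroom, decoding cost, serving-latency alignment, and item generalization.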
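
Illustrative note: the abstract mentions offset scaling-law fits as a diagnostic for task headroom but does not give the functional form. The sketch below assumes one common form, loss ≈ offset + a · N^(−b), where N is the backbone parameter count and the offset plays the role of an empirical floor. The function name `fit_offset_power_law`, the grid-scan procedure, and all data values are hypothetical and are not taken from the paper.

```python
import math


def fit_offset_power_law(sizes, losses, num_offsets=400):
    """Fit loss ~= offset + a * size**(-b) by scanning candidate offsets.

    Assumed form (not confirmed by the abstract). For a fixed offset the model
    is linear in log space: log(loss - offset) = log(a) - b * log(size),
    so each candidate is scored with ordinary least squares.
    """
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    min_loss = min(losses)
    best = None
    for i in range(num_offsets):
        # Candidate offsets lie in [0, min_loss), so loss - offset stays positive.
        offset = min_loss * i / num_offsets
        ys = [math.log(loss - offset) for loss in losses]
        mean_y = sum(ys) / count
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, offset, math.exp(intercept), -slope)
    _, offset, a, b = best
    return offset, a, b


# Synthetic, made-up data generated from the assumed form (NOT paper results).
sizes = [2e6, 1e7, 5e7, 2e8, 1e9]
losses = [1.5 + 40.0 * n ** -0.3 for n in sizes]

offset, a, b = fit_offset_power_law(sizes, losses)
# Remaining headroom at the largest size: distance between its loss and the floor.
headroom_at_1b = losses[-1] - offset
print(f"offset={offset:.3f} a={a:.1f} b={b:.3f} headroom@1B={headroom_at_1b:.3f}")
```

Under this assumed form, a task whose fitted headroom at the largest model is close to zero would be near its empirical floor, while a larger gap would suggest that additional scale may still help. A nonlinear least-squares routine could replace the grid scan; the scan is used here only to keep the sketch dependency-free.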

Related papers