Fugu-MT Paper Translation (Summary): Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models

Paper summary: Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models

arxiv url: http://arxiv.org/abs/2604.15650v1
Date: Fri, 17 Apr 2026 02:47:31 GMT
Status: Translation complete
Last updated in system: 2026-04-20 22:00:19.716611
Title: Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
Authors: Shuli Wang, Junwei Yin, Changhao Li, Senjie Kou, Chi Wang, Yinqiu Huang, Yinhua Zhu, Haitao Wang, Xingxing Wang,
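Abstract summary: We propose SIF (Sample Is Feature). The Sample Tokenizer quantizes each historical Raw Sample into a Token Sample via hierarchical group-adaptive quantization. The SIF-Mixer performs deep feature interaction over homogeneous sample representations through token-level and sample-level mixing, fully unleashing the model's representational capacity.
Reference score (independently computed attention score): 14.332200648147863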
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling industrial recommender models has followed two parallel paradigms: sample information scaling, which enriches the information content of each training sample through deeper and longer behavior sequences, and model capacity scaling, which unifies sequence modeling and feature interaction within a single Transformer backbone. However, both paradigms still face two structural limitations. First, sample information scaling methods encode only a subset of each historical interaction into the sequence token, leaving most of the original sample context unexploited and precluding the modeling of sample-level, time-varying features. Second, model capacity scaling methods are inherently constrained by the structural heterogeneity between sequential and non-sequential features, which prevents the model from fully realizing its representational capacity. To address these issues, we propose SIF (Sample Is Feature), which encodes each historical Raw Sample directly into the sequence token, maximally preserving sample information while also resolving the heterogeneity between sequential and non-sequential features. SIF consists of two key components. The Sample Tokenizer quantizes each historical Raw Sample into a Token Sample via hierarchical group-adaptive quantization (HGAQ), so that full sample-level context can be incorporated into the sequence efficiently. The SIF-Mixer then performs deep feature interaction over the homogeneous sample representations via token-level and sample-level mixing, fully unleashing the model's representational capacity. Extensive experiments on a large-scale industrial dataset validate SIF's effectiveness, and we have successfully deployed SIF on the Meituan food delivery platform.
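The abstract describes the Sample Tokenizer and the SIF-Mixer only at a high level, so what follows is a minimal, hypothetical Python sketch under stated assumptions, not the authors' HGAQ or SIF-Mixer. It assumes each historical Raw Sample arrives as named feature groups of dense vectors. The tokenizer is approximated with plain per-group residual quantization against given codebooks; letting each group use a different number of levels is only a stand-in for "group-adaptive". The mixer is approximated with MLP-Mixer-style linear mixing (with a residual connection), first across tokens within each sample, then across samples at each token position, and it assumes token embeddings have already been looked up. All function names (`residual_quantize`, `tokenize_sample`, `token_mix`, `sample_mix`) and the toy codebooks are illustrative assumptions.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Tensor3 = List[List[Vector]]  # [num_samples][num_tokens][dim]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vec: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Multi-level residual quantization: at each level pick the codeword
    nearest to the current residual, record its index, then subtract it."""
    residual = list(vec)
    codes: List[int] = []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda i: _sq_dist(residual, book[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def tokenize_sample(raw_sample: Dict[str, Vector],
                    group_codebooks: Dict[str, List[List[Vector]]]) -> List[int]:
    """Hypothetical Sample Tokenizer stand-in: quantize each feature group with
    its own codebook stack (depth may differ per group) and concatenate codes."""
    tokens: List[int] = []
    for group in sorted(raw_sample):  # fixed group order gives a stable token layout
        tokens.extend(residual_quantize(raw_sample[group], group_codebooks[group]))
    return tokens


def token_mix(x: Tensor3, w: List[List[float]]) -> Tensor3:
    """Token-level mixing with residual: within each sample, token t
    becomes token t + sum_u w[t][u] * token u."""
    return [[[xs[t][d] + sum(w[t][u] * xs[u][d] for u in range(len(xs)))
              for d in range(len(xs[t]))]
             for t in range(len(xs))]
            for xs in x]


def sample_mix(x: Tensor3, w: List[List[float]]) -> Tensor3:
    """Sample-level mixing with residual: at each token position, sample s
    becomes sample s + sum_r w[s][r] * sample r."""
    n = len(x)
    return [[[x[s][t][d] + sum(w[s][r] * x[r][t][d] for r in range(n))
              for d in range(len(x[s][t]))]
             for t in range(len(x[s]))]
            for s in range(n)]


if __name__ == "__main__":
    # Toy codebooks (hypothetical): "user" uses 2 levels, "context" uses 1.
    books = {
        "user": [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.5, 0.0]]],
        "context": [[[0.0, 1.0], [1.0, 0.0]]],
    }
    print(tokenize_sample({"user": [1.1, 0.9], "context": [0.2, 0.8]}, books))  # -> [0, 1, 0]

    x = [[[1.0], [2.0]], [[3.0], [4.0]]]  # 2 samples x 2 tokens x dim 1
    y = sample_mix(token_mix(x, [[0.0, 0.5], [0.5, 0.0]]), [[0.0, 1.0], [0.0, 0.0]])
    print(y)  # -> [[[7.0], [8.0]], [[5.0], [5.5]]]
```

Because every tokenized sample in this sketch has the same layout, historical samples can be stacked into one [num_samples][num_tokens][dim] structure and mixed along either axis, which is one way to read the "homogeneous sample representations" the abstract mentions. In a real model the codebooks would presumably be learned and the mixing weights would be trainable layers with nonlinearities.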

Related papers