Fugu-MT Paper Translation (Abstract): FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model

Paper overview: FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model

arxiv url: http://arxiv.org/abs/2508.10615v1
Date: Thu, 14 Aug 2025 13:12:29 GMT
Status: Translation complete
Last updated in system: 2025-08-15 22:24:48.321456
Title: FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model
Title (reference translation): FuXi-β: Aiming for a Lightweight and Fast Large-Scale Generative Recommendation Model
Authors: Yufei Ye, Wei Guo, Hao Wang, Hong Zhu, Yuyang Ye, Yong Liu, Huifeng Guo, Ruiming Tang, Defu Lian, Enhong Chen
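Abstract summary: This paper proposes a new framework for Transformer-like recommendation models. FuXi-$\beta$ outperforms previous state-of-the-art models and achieves significant acceleration. Our code is available in a public repository.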
Reference score (independently computed attention score): 87.38823851271758
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling laws for autoregressive generative recommenders reveal potential for larger, more versatile systems but mean greater latency and training costs. To accelerate training and inference, we investigated the recent generative recommendation models HSTU and FuXi-$\alpha$, identifying two efficiency bottlenecks: the indexing operations in relative temporal attention bias and the computation of the query-key attention map. Additionally, we observed that relative attention bias in self-attention mechanisms can also serve as attention maps. Previous works like Synthesizer have shown that alternative forms of attention maps can achieve similar performance, naturally raising the question of whether some attention maps are redundant. Through empirical experiments, we discovered that using the query-key attention map might degrade the model's performance in recommendation tasks. To address these bottlenecks, we propose a new framework applicable to Transformer-like recommendation models. On one hand, we introduce Functional Relative Attention Bias, which avoids the time-consuming operations of the original relative attention bias, thereby accelerating the process. On the other hand, we remove the query-key attention map from the original self-attention layer and design a new Attention-Free Token Mixer module. Furthermore, by applying this framework to FuXi-$\alpha$, we introduce a new model, FuXi-$\beta$. Experiments across multiple datasets demonstrate that FuXi-$\beta$ outperforms previous state-of-the-art models and achieves significant acceleration compared to FuXi-$\alpha$, while also adhering to the scaling law. Notably, FuXi-$\beta$ shows an improvement of 27% to 47% in the NDCG@10 metric on large-scale industrial datasets compared to FuXi-$\alpha$. Our code is available in a public repository: https://github.com/USTC-StarTeam/FuXi-beta
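Abstract (reference translation): Scaling laws for autoregressive generative recommenders point to the potential of larger, more versatile systems, but they also imply greater latency and training costs. To accelerate training and inference, we examined HSTU and FuXi-$\alpha$ and identified two efficiency bottlenecks: the indexing operations in the relative temporal attention bias and the computation of the query-key attention map. We also found that the relative attention bias in self-attention mechanisms can itself serve as an attention map. Earlier work such as Synthesizer has shown that alternative attention maps can achieve similar performance, which naturally raises the question of whether some attention maps are redundant. Through experiments, we found that using the query-key attention map may degrade model performance in recommendation tasks. To address these bottlenecks, we propose a new framework applicable to Transformer-like recommendation models. On one hand, Functional Relative Attention Bias avoids the time-consuming operations of the original relative attention bias and thereby speeds up the process. On the other hand, we remove the query-key attention map from the original self-attention layer and design a new Attention-Free Token Mixer module. Furthermore, by applying this framework to FuXi-$\alpha$, we introduce a new model, FuXi-$\beta$. Experiments across multiple datasets show that FuXi-$\beta$ outperforms previous state-of-the-art models and is significantly faster than FuXi-$\alpha$. Notably, FuXi-$\beta$ shows an improvement of 27% to 47% in NDCG@10 on large-scale industrial datasets. Our code is available in a public repository.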
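
The abstract reports its headline result in NDCG@10. For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG@k is commonly computed, assuming binary relevance (an item is either relevant or not). The function names `dcg_at_k` and `ndcg_at_k` are hypothetical and are not taken from the paper or its repository; the paper's exact evaluation protocol may differ.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions.

    The item at 0-based position i is discounted by log2(i + 2),
    so the top-ranked item has a discount of log2(2) = 1.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k with binary relevance (assumption for this sketch).

    ranked_items: item ids ordered from most to least recommended.
    relevant_items: set of ground-truth item ids.
    """
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # The ideal ranking places every relevant item at the top.
    ideal_gains = [1.0] * min(len(relevant_items), k)
    ideal = dcg_at_k(ideal_gains, k)
    if ideal == 0.0:
        return 0.0
    return dcg_at_k(gains, k) / ideal


# Single ground-truth item ranked second: score is 1 / log2(3).
score = ndcg_at_k(["b", "a", "c"], {"a"}, k=10)
```

With a single ground-truth item, as is typical in next-item prediction, the score reduces to 1 / log2(rank + 1) when the item appears within the top k and 0 otherwise.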

Related papers