Fugu-MT paper translation (abstract): TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders

Paper abstract: TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders

arxiv url: http://arxiv.org/abs/2602.06563v2
Date: Tue, 10 Feb 2026 02:17:46 GMT
Status: translation complete
Last updated in system: 2026-02-11 15:31:42.922323
Title: TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders
Authors: Yuchen Jiang, Jie Zhu, Xintian Han, Hui Lu, Kunmin Bai, Mingyu Yang, Shikang Wu, Ruihao Zhang, Wenlin Zhao, Shipeng Bai, Sijin Zhou, Huizhi Yang, Tianyi Liu, Wenda Liu, Ziyan Gong, Haoran Ding, Zheng Chai, Deping Xie, Zhe Chen, Yuchao Zheng, Peng Xu
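Abstract summary: TokenMixer-Large is a systematically evolved architecture designed for extreme-scale recommendation. By introducing a mixing-and-reverting operation, inter-layer residuals, and an auxiliary loss, it ensures stable gradient propagation. TokenMixer-Large successfully scales its parameters to 7 billion and 15 billion on online traffic and in offline experiments, respectively.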
Reference score (independently computed attention score): 28.610671210049247
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While scaling laws for recommendation models have gained significant traction, existing architectures such as Wukong, HiFormer, and DHEN often struggle with sub-optimal designs and hardware under-utilization, limiting their practical scalability. Our previous TokenMixer architecture (introduced in the RankMixer paper) addressed effectiveness and efficiency by replacing self-attention with a lightweight token-mixing operator; however, it faced critical bottlenecks in deeper configurations, including sub-optimal residual paths, vanishing gradients, incomplete MoE sparsification, and constrained scalability. In this paper, we propose TokenMixer-Large, a systematically evolved architecture designed for extreme-scale recommendation. By introducing a mixing-and-reverting operation, inter-layer residuals, and an auxiliary loss, we ensure stable gradient propagation even as model depth increases. Furthermore, we incorporate a Sparse Per-token MoE to enable efficient parameter expansion. TokenMixer-Large successfully scales its parameters to 7 billion on online traffic and 15 billion in offline experiments. Currently deployed in multiple scenarios at ByteDance, TokenMixer-Large has achieved significant offline and online performance gains, delivering an increase of +1.66% in orders and +2.98% in per-capita preview payment GMV for e-commerce, improving ADSS by +2.0% in advertising, and achieving +1.4% revenue growth for live streaming.
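The abstract names a "mixing-and-reverting operation" but does not define it. The sketch below is one possible reading, not the authors' implementation: it assumes a parameter-free mixing step that splits each token into chunks and regroups chunk h of every token into a new token, a per-token transform applied to the mixed tokens, and a reverting step that restores the original layout so the residual is added to aligned tokens. The names `mix_tokens`, `revert_tokens`, `mix_revert_block`, and `per_token_fn` are hypothetical.

```python
def mix_tokens(tokens, num_heads):
    """Regroup chunks: output token h concatenates chunk h of every input token.

    Assumed form of a parameter-free token-mixing step (hypothetical).
    """
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("token dimension must be divisible by num_heads")
    chunk = dim // num_heads
    mixed = []
    for h in range(num_heads):
        row = []
        for tok in tokens:
            row.extend(tok[h * chunk:(h + 1) * chunk])
        mixed.append(row)
    return mixed


def revert_tokens(mixed, num_tokens):
    """Undo mix_tokens: regrouping with num_tokens chunks restores the layout."""
    return mix_tokens(mixed, num_tokens)


def mix_revert_block(tokens, num_heads, per_token_fn):
    """Mix, transform each mixed token, revert, then add the residual.

    per_token_fn is a stand-in for a per-token network and must keep
    the token length unchanged.
    """
    mixed = mix_tokens(tokens, num_heads)
    transformed = [per_token_fn(tok) for tok in mixed]
    restored = revert_tokens(transformed, len(tokens))
    # Residual connection between tokens that share the same layout.
    return [[x + r for x, r in zip(tok, res)]
            for tok, res in zip(tokens, restored)]


if __name__ == "__main__":
    tokens = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(mix_tokens(tokens, num_heads=2))
    print(mix_revert_block(tokens, 2, lambda tok: [2 * v for v in tok]))
```

Because the revert step is the exact inverse of the mix step, the residual in this sketch always connects a token to its own transformed version, which is one plausible way to keep residual paths clean as depth grows.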
