Fugu-MT Paper Translation (Abstract): Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

Paper summary: Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

arxiv url: http://arxiv.org/abs/2602.10016v1
Date: Tue, 10 Feb 2026 17:37:55 GMT
Status: Translation complete
Last updated in system: 2026-02-11 20:17:43.715037
Title: Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design
Title (reference translation): Kunlun: Establishing Scaling Laws for Large-Scale Recommendation Systems through Unified Architecture Design
Authors: Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen, Yue Dong, Yi Yang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Darren Liu, Jade Nie, Chunzhi Yang, Jiyan Yang, Huayu Li
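Abstract summary: We introduce Kunlun, a scalable architecture that improves model efficiency and resource allocation. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.
Reference score (independently computed attention score): 39.56881153682311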
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from inefficient modules with low Model FLOPs Utilization (MFU) and suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations feature Computation Skip (CompSkip) and Event-level Personalization. These advances increase MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency over state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.
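Abstract (reference translation): In massive-scale recommendation systems, deriving predictable scaling laws that govern the relationship between model performance and computational investment is essential for designing and allocating resources. Such laws are established for large language models, but they remain a challenge for recommendation systems, especially those that process both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling; it stems from inefficient modules with low Model FLOPs Utilization (MFU) and from suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations include Computation Skip (CompSkip) and Event-level Personalization. These advances raise MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency relative to state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.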
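
The abstract reports MFU rising from 17% to 37%. As a rough illustration of what that metric measures, here is a minimal sketch that uses the common definition of MFU (achieved model FLOP/s divided by the hardware's peak FLOP/s). The function names and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper may measure MFU differently.

```python
def mfu(model_flops_per_step: float, step_time_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilization: achieved model FLOP/s over hardware peak FLOP/s.

    All three inputs are assumptions supplied by the caller; nothing here
    is specific to Kunlun or to any particular GPU.
    """
    if step_time_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("step_time_s and peak_flops_per_s must be positive")
    achieved_flops_per_s = model_flops_per_step / step_time_s
    return achieved_flops_per_s / peak_flops_per_s


def relative_gain(mfu_before: float, mfu_after: float) -> float:
    """Ratio of useful throughput on the same hardware after an MFU change."""
    if mfu_before <= 0:
        raise ValueError("mfu_before must be positive")
    return mfu_after / mfu_before


# Hypothetical workload: 1.7e14 model FLOPs per step, 1 s per step,
# on a device with an assumed peak of 1e15 FLOP/s.
print(mfu(1.7e14, 1.0, 1e15))

# The 17% and 37% figures come from the abstract; the ratio is plain arithmetic.
print(relative_gain(0.17, 0.37))
```

Under this definition, moving from 17% to 37% MFU means the same hardware performs a little over twice as many useful model FLOPs per second, all else being equal.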

Related papers