Fugu-MT Paper Translation (Abstract): Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction

Paper summary: Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction

arxiv url: http://arxiv.org/abs/2601.17836v1
Date: Sun, 25 Jan 2026 13:39:26 GMT
Status: Translation complete
Last updated in system: 2026-01-27 15:23:08.427498
Title: Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction
Title (reference translation): The Potential of Sparse Attention on Long-term Behaviors for CTR Prediction
Authors: Weijiang Lai, Beihong Jin, Di Zhang, Siru Chen, Jiongyan Zhang, Yuhang Gou, Jian Dong, Xingxing Wang
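Abstract summary: This work proposes SparseCTR, an efficient and effective model designed specifically for users' long-term behaviors. Building on personalized chunks of the behavior sequence, it introduces a three-branch sparse self-attention mechanism that jointly identifies users' global interests, interest transitions, and short-term interests. Experiments show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods.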
Reference score (independently computed attention score): 17.78352301235849
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, the success of large language models (LLMs) has driven the exploration of scaling laws in recommender systems. However, models that demonstrate scaling laws are actually challenging to deploy in industrial settings for modeling long sequences of user behaviors, due to the high computational complexity of the standard self-attention mechanism. Despite various sparse self-attention mechanisms proposed in other fields, they are not fully suited for recommendation scenarios. This is because user behaviors exhibit personalization and temporal characteristics: different users have distinct behavior patterns, and these patterns change over time, with data from these users differing significantly from data in other fields in terms of distribution. To address these challenges, we propose SparseCTR, an efficient and effective model specifically designed for long-term behaviors of users. To be precise, we first segment behavior sequences into chunks in a personalized manner to avoid separating continuous behaviors and enable parallel processing of sequences. Based on these chunks, we propose a three-branch sparse self-attention mechanism to jointly identify users' global interests, interest transitions, and short-term interests. Furthermore, we design a composite relative temporal encoding via learnable, head-specific bias coefficients, better capturing sequential and periodic relationships among user behaviors. Extensive experimental results show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods. More importantly, it exhibits an obvious scaling law phenomenon, maintaining performance improvements across three orders of magnitude in FLOPs. In online A/B testing, SparseCTR increased CTR by 1.72% and CPM by 1.41%. Our source code is available at https://github.com/laiweijiang/SparseCTR.
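Abstract (reference translation): In recent years, the success of large language models (LLMs) has driven the exploration of scaling laws in recommender systems. However, models that demonstrate scaling laws are in practice difficult to deploy in industrial settings for modeling long sequences of user behaviors, because the standard self-attention mechanism is computationally expensive. Various sparse self-attention mechanisms have been proposed in other fields, but they are not fully suited to recommendation scenarios. This is because user behaviors exhibit personalization and temporal characteristics: different users have different behavior patterns, and these patterns change over time. To address these challenges, we propose SparseCTR, an efficient and effective model designed specifically for users' long-term behaviors. Specifically, we first split behavior sequences into chunks in a personalized manner, which avoids separating continuous behaviors and enables parallel processing of sequences. Based on these chunks, we propose a three-branch sparse self-attention mechanism that jointly identifies users' global interests, interest transitions, and short-term interests. Furthermore, we design a composite relative temporal encoding with learnable, head-specific bias coefficients to better capture sequential and periodic relationships among user behaviors. Experimental results show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods. More importantly, it exhibits a clear scaling law phenomenon, maintaining performance improvements across three orders of magnitude in FLOPs. In online A/B testing, SparseCTR increased CTR by 1.72% and CPM by 1.41%. The source code is available at https://github.com/laiweijiang/SparseCTR.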
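
To make the idea of chunked sparse attention more concrete, here is a minimal NumPy sketch. It is not the SparseCTR implementation: the abstract does not specify how chunks are formed or how the three branches are built, so everything below is an assumption made for illustration. The function names (`chunk_by_time_gap`, `sparse_causal_mask`, `masked_attention`), the time-gap chunking rule, the use of chunk-start positions as stand-ins for global tokens, and the `window` parameter are all hypothetical.

```python
import numpy as np


def chunk_by_time_gap(timestamps, gap_threshold):
    """Assign a chunk id to every behavior.

    Hypothetical rule: a new chunk starts whenever the gap to the previous
    behavior exceeds gap_threshold. The paper's personalized segmentation
    may work differently.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    gaps = np.diff(timestamps)
    # Each large gap opens a new chunk; the cumulative count is the chunk id.
    return np.concatenate([[0], np.cumsum(gaps > gap_threshold)]).astype(int)


def sparse_causal_mask(chunk_ids, window):
    """Boolean (n, n) mask; entry [i, j] is True if query i may attend to key j.

    A key j <= i is visible when at least one condition holds:
      - j lies in the same chunk as i,
      - j is among the last `window` positions up to and including i,
      - j is the first position of some chunk (assumed global stand-in).
    """
    chunk_ids = np.asarray(chunk_ids)
    n = len(chunk_ids)
    idx = np.arange(n)
    causal = idx[None, :] <= idx[:, None]
    same_chunk = chunk_ids[None, :] == chunk_ids[:, None]
    local = (idx[:, None] - idx[None, :]) < window
    is_chunk_start = np.concatenate([[True], chunk_ids[1:] != chunk_ids[:-1]])
    global_keys = np.broadcast_to(is_chunk_start[None, :], (n, n))
    return causal & (same_chunk | local | global_keys)


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted to `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps its diagonal entry, so the row maximum is finite.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

For example, calling `chunk_by_time_gap([0, 10, 20, 4000, 4010, 9000, 9005, 9010], 600)` groups the eight behaviors into three chunks, and `sparse_causal_mask(ids, window=2)` then permits fewer query-key pairs than a dense causal mask. This sketch still materializes the full score matrix and only illustrates the masking pattern; an efficient implementation would skip the masked pairs entirely.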

