Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction
- URL: http://arxiv.org/abs/2601.17836v1
- Date: Sun, 25 Jan 2026 13:39:26 GMT
- Title: Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction
- Authors: Weijiang Lai, Beihong Jin, Di Zhang, Siru Chen, Jiongyan Zhang, Yuhang Gou, Jian Dong, Xingxing Wang,
- Abstract summary: We propose SparseCTR, an efficient and effective model specifically designed for long-term behaviors of users. Based on these chunks, we propose a three-branch sparse self-attention mechanism to jointly identify users' global interests. We show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods.
- Score: 17.78352301235849
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the success of large language models (LLMs) has driven the exploration of scaling laws in recommender systems. However, models that demonstrate scaling laws are actually challenging to deploy in industrial settings for modeling long sequences of user behaviors, due to the high computational complexity of the standard self-attention mechanism. Despite various sparse self-attention mechanisms proposed in other fields, they are not fully suited for recommendation scenarios. This is because user behaviors exhibit personalization and temporal characteristics: different users have distinct behavior patterns, and these patterns change over time, with data from these users differing significantly from data in other fields in terms of distribution. To address these challenges, we propose SparseCTR, an efficient and effective model specifically designed for long-term behaviors of users. To be precise, we first segment behavior sequences into chunks in a personalized manner to avoid separating continuous behaviors and enable parallel processing of sequences. Based on these chunks, we propose a three-branch sparse self-attention mechanism to jointly identify users' global interests, interest transitions, and short-term interests. Furthermore, we design a composite relative temporal encoding via learnable, head-specific bias coefficients, better capturing sequential and periodic relationships among user behaviors. Extensive experimental results show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods. More importantly, it exhibits an obvious scaling law phenomenon, maintaining performance improvements across three orders of magnitude in FLOPs. In online A/B testing, SparseCTR increased CTR by 1.72% and CPM by 1.41%. Our source code is available at https://github.com/laiweijiang/SparseCTR.
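The abstract names the ingredients (personalized chunking, then three sparse attention branches) but does not say how chunks are cut or which keys each branch selects. The sketch below is therefore only an illustration of the general idea, not SparseCTR's actual mechanism: it assumes a hypothetical time-gap rule for chunking (`chunk_by_time_gap`, `max_gap`) and three hypothetical key-selection rules (`three_branch_mask`, `window`), and it builds a boolean mask rather than running attention.

```python
import numpy as np

def chunk_by_time_gap(timestamps, max_gap):
    """Assign a chunk id to each behavior.

    ASSUMPTION: a new chunk starts whenever the gap to the previous behavior
    exceeds max_gap. The paper's personalized segmentation rule may differ.
    """
    ts = np.asarray(timestamps, dtype=float)
    starts_new_chunk = np.diff(ts, prepend=ts[0]) > max_gap
    return np.cumsum(starts_new_chunk)

def three_branch_mask(chunk_ids, window):
    """Boolean causal mask (rows = queries, columns = keys).

    The mask is the union of three ILLUSTRATIVE branches:
      1. global: the last behavior of each chunk serves as a summary key
      2. transition: all keys in the chunk just before the query's chunk
      3. local: the `window` most recent behaviors, including the query itself
    """
    chunk_ids = np.asarray(chunk_ids)
    n = len(chunk_ids)
    idx = np.arange(n)
    causal = idx[:, None] >= idx[None, :]

    is_chunk_end = np.append(chunk_ids[1:] != chunk_ids[:-1], True)
    global_branch = np.broadcast_to(is_chunk_end[None, :], (n, n))
    transition_branch = chunk_ids[None, :] == (chunk_ids[:, None] - 1)
    local_branch = (idx[:, None] - idx[None, :]) < window

    return causal & (global_branch | transition_branch | local_branch)

# Toy behavior sequence: three bursts of activity separated by long gaps.
timestamps = [0, 1, 2, 50, 51, 100, 101, 102]
chunk_ids = chunk_by_time_gap(timestamps, max_gap=10)
mask = three_branch_mask(chunk_ids, window=2)

n = len(timestamps)
print("chunk ids:", chunk_ids.tolist())
print("attended pairs (sparse):", int(mask.sum()))
print("attended pairs (full causal):", n * (n + 1) // 2)
```

On a sequence this short the saving is small; the point of such a mask is that the number of attended keys per query grows with the number of chunks and the window size rather than with the full sequence length.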
Related papers
- GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) break user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperform state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z) - GenCI: Generative Modeling of User Interest Shift via Cohort-based Intent Learning for CTR Prediction [84.0125708499372]
We propose a generative user intent framework to model user preferences for click-through rate (CTR) prediction. The framework first employs a generative model, trained with a next-item prediction objective, to proactively produce candidate interest cohorts. A hierarchical candidate-aware network then injects this rich contextual signal into the ranking stage, refining them with cross-attention to align with both user history and the target item.
arXiv Detail & Related papers (2026-01-26T08:15:04Z) - BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations [29.069570226262073]
Transformer structures have been widely used in sequential recommender systems (SRS). BlossomRec models both long-term and short-term user interests through attention to achieve stable performance across sequences of varying lengths.
arXiv Detail & Related papers (2025-12-15T14:23:57Z) - ENCODE: Breaking the Trade-Off Between Performance and Efficiency in Long-Term User Behavior Modeling [12.963611514800656]
We propose an efficient two-stage long-term sequence modeling approach, named EfficieNt Clustering based twO-stage interest moDEling (ENCODE). In the offline extraction stage, ENCODE clusters the entire behavior sequence and extracts accurate interests. In the online inference stage, ENCODE takes the off-the-shelf user interests to predict the associations with target items.
arXiv Detail & Related papers (2025-08-19T06:58:21Z) - Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms. Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction. We propose Multi-granularity Interest Retrieval and Refinement Network (MIRRN).
arXiv Detail & Related papers (2024-11-22T15:29:05Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation [25.400756652696895]
We propose a Personalized Behavior-Aware Transformer framework (PBAT) for the Multi-Behavior Sequential Recommendation (MBSR) problem. PBAT develops a personalized behavior pattern generator in the representation layer, which extracts dynamic and discriminative behavior patterns for sequential learning. We conduct experiments on three benchmark datasets and the results demonstrate the effectiveness and interpretability of our framework.
arXiv Detail & Related papers (2024-02-22T12:03:21Z) - TBIN: Modeling Long Textual Behavior Data for CTR Prediction [15.056265935931377]
Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations. Inspired by the recent thriving of language models (LMs), a surge of works improve prediction by organizing user behavior data in a textual format. While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs. In this paper, we propose a Textual Behavior-based Interest Chunking N
arXiv Detail & Related papers (2023-08-09T03:48:41Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user clicks on an item, has become increasingly significant in recommender systems. Recent deep learning models with the ability to automatically extract user interest from behaviors have achieved great success. We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential Recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data. Old and new issues remain, including data-sparsity and noisy data. We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.