BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
- URL: http://arxiv.org/abs/2512.13368v1
- Date: Mon, 15 Dec 2025 14:23:57 GMT
- Title: BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
- Authors: Mengyang Ma, Xiaopeng Li, Wanyu Wang, Zhaocheng Du, Jingtong Gao, Pengyue Jia, Yuyang Ye, Yiqi Wang, Yunpeng Weng, Weihong Luo, Xiao Han, Xiangyu Zhao
- Abstract summary: Transformer structures have been widely used in sequential recommender systems (SRS). BlossomRec models both long-term and short-term user interests through attention to achieve stable performance across sequences of varying lengths.
- Score: 29.069570226262073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer structures have been widely used in sequential recommender systems (SRS). However, as user interaction histories grow, so do computational time and memory requirements, mainly due to the standard attention mechanism. Although many methods employ efficient attention or SSM-based models, these approaches struggle to model long sequences effectively and may exhibit unstable performance on short sequences. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommendation systems into long-term and short-term interests, compute them using two distinct sparse attention patterns, and combine the results through a learnable gated output. Theoretically, this design significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of BlossomRec's efficiency and effectiveness. The code is available at https://github.com/ronineume/BlossomRec.
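The abstract describes two sparse attention branches, one for short-term and one for long-term interests, fused by a learnable gate. The paper's exact block patterns are not given here, so the following is a minimal NumPy sketch under assumed patterns: a causal sliding-window branch standing in for short-term interest, a strided branch standing in for long-term interest, and a per-position sigmoid gate. Function and parameter names (`sparse_gated_attention`, `window`, `stride`, `gate_w`) are hypothetical, not the authors' API.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_gated_attention(q, k, v, window=4, stride=4, gate_w=None):
    """Two-branch sparse attention sketch (shapes: q, k, v are (L, d)).

    - short-term branch: each query attends only to a causal local window;
    - long-term branch: each query attends to strided (every `stride`-th) positions;
    - the two outputs are fused by a per-position sigmoid gate.
    """
    L, d = q.shape
    scores = q @ k.T / np.sqrt(d)                 # (L, L) raw scores, masked below
    pos = np.arange(L)
    causal = pos[None, :] <= pos[:, None]         # never attend to the future
    local = (pos[:, None] - pos[None, :]) < window
    strided = (pos[None, :] % stride == 0)        # coarse long-range positions
    neg = -1e9                                    # effectively -inf after softmax
    short_out = softmax(np.where(causal & local, scores, neg)) @ v
    long_out = softmax(np.where(causal & strided, scores, neg)) @ v
    if gate_w is None:
        gate_w = np.zeros(d)                      # untrained gate -> even 0.5 mix
    g = 1.0 / (1.0 + np.exp(-(q @ gate_w)))      # (L,) sigmoid gate per query
    return g[:, None] * long_out + (1.0 - g)[:, None] * short_out
```

Compared with full causal attention, each query here touches roughly `window + L/stride` keys instead of `L`, which is the kind of reduction in attended interactions the abstract claims; with `window=L` and `stride=1` both branches collapse to full causal attention.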
Related papers
- GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) break user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z) - Recurrent Preference Memory for Efficient Long-Sequence Generative Recommendation [27.325586037888]
We introduce Rec2PM, a framework that compresses long user interaction histories into compact Preference Memory tokens. Experiments show that Rec2PM significantly reduces inference latency and memory footprint while achieving superior accuracy compared to full-sequence models.
arXiv Detail & Related papers (2026-02-12T05:51:52Z) - Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction [17.78352301235849]
We propose SparseCTR, an efficient and effective model specifically designed for long-term behaviors of users. Based on these chunks, we propose a three-branch sparse self-attention mechanism to jointly identify users' global interests. We show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods.
arXiv Detail & Related papers (2026-01-25T13:39:26Z) - InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation [56.694702609077495]
Long-sequence processing is a critical capability for modern large language models. InfLLM-V2 is a trainable sparse attention framework that seamlessly adapts models from short to long sequences. In experiments, InfLLM-V2 is 4x faster than dense attention while retaining 98.1% and 99.7% of the performance.
arXiv Detail & Related papers (2025-09-29T12:08:33Z) - Gated Rotary-Enhanced Linear Attention for Long-term Sequential Recommendation [14.581838243440922]
We propose a long-term sequential Recommendation model with Gated Rotary Enhanced Linear Attention (RecGRELA). Specifically, we propose a Rotary-Enhanced Linear Attention (RELA) module to efficiently model long-range dependency. We also introduce a SiLU-based Gated mechanism for RELA to help the model tell whether a user behavior reflects a short-term, local interest or a real change in their long-term tastes.
arXiv Detail & Related papers (2025-06-16T09:56:10Z) - Breaking the Context Bottleneck on Long Time Series Forecasting [10.715175460720403]
Long-term time-series forecasting is essential for planning and decision-making in economics, energy, and transportation. Recent advancements have enhanced the efficiency of these models, but the challenge of effectively leveraging longer sequences persists. We propose the Logsparse Decomposable Multiscaling (LDM) framework for the efficient and effective processing of long sequences.
arXiv Detail & Related papers (2024-12-21T10:29:34Z) - ELASTIC: Efficient Linear Attention for Sequential Interest Compression [5.689306819772134]
State-of-the-art sequential recommendation models heavily rely on transformer's attention mechanism. We propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression. We conduct extensive experiments on various public datasets and compare it with several strong sequential recommenders.
arXiv Detail & Related papers (2024-08-18T06:41:46Z) - Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z) - LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
Self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to complement the lack of long-range dependency issues.
arXiv Detail & Related papers (2024-04-17T08:26:34Z) - Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model is able to capture long-range dependencies and latent high-level features can be distilled by our model.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z) - Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN)
It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users.
Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
arXiv Detail & Related papers (2021-02-18T11:08:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.