Gated Rotary-Enhanced Linear Attention for Long-term Sequential Recommendation
- URL: http://arxiv.org/abs/2506.13315v2
- Date: Sun, 02 Nov 2025 13:29:58 GMT
- Title: Gated Rotary-Enhanced Linear Attention for Long-term Sequential Recommendation
- Authors: Juntao Hu, Wei Zhou, Huayi Shen, Xiao Du, Jie Liao, Min Gao, Jun Zeng, Junhao Wen
- Abstract summary: We propose a long-term sequential Recommendation model with Gated Rotary-Enhanced Linear Attention (RecGRELA). Specifically, we propose a Rotary-Enhanced Linear Attention (RELA) module to efficiently model long-range dependencies. We also introduce a SiLU-based gating mechanism for RELA to help the model tell whether a user behavior reflects a short-term, local interest or a real change in long-term tastes.
- Score: 14.581838243440922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Sequential Recommendation Systems (SRSs), Transformer models have demonstrated remarkable performance but face computational and memory cost challenges, especially when modeling long-term user behavior sequences. Due to its quadratic complexity, the dot-product attention mechanism in Transformers becomes expensive for processing long sequences. By approximating the dot-product attention with elaborate mapping functions, linear attention provides a more efficient option with linear complexity. However, existing linear attention methods face three limitations: 1) they often use learnable position encodings, which incur extra computational costs in long-term sequence scenarios; 2) they may not sufficiently account for users' fine-grained local preferences (short-lived bursts of interest); and 3) they attempt to capture temporary activities but often confuse them with stable, long-term interests, which can result in unclear or less effective recommendations. To remedy these drawbacks, we propose a long-term sequential Recommendation model with Gated Rotary-Enhanced Linear Attention (RecGRELA). Specifically, we first propose a Rotary-Enhanced Linear Attention (RELA) module that efficiently models long-range dependencies within a user's historical interactions using rotary position encodings. We then introduce a local short operation to incorporate the local preferences of interactions and provide the theoretical insight behind it. We further introduce a SiLU-based gating mechanism for RELA (GRELA) that helps the model distinguish whether a user behavior reflects a short-term, local interest or a genuine shift in long-term tastes. Experimental results on four public benchmark datasets show that RecGRELA achieves state-of-the-art performance compared with existing SRSs based on Recurrent Neural Networks, Transformers, and Mamba, while maintaining low memory overhead.
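The abstract names the building blocks of RecGRELA (a kernelized linear attention whose queries and keys carry rotary position encodings, plus a SiLU-based output gate) but not their exact formulation. The PyTorch sketch below shows one common way to combine these pieces; the module name, the elu(x)+1 feature map, and the single-head layout are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x):
    """Rotary position encoding (rotate-half form) on the last dim of a (B, N, D) tensor."""
    b, n, d = x.shape
    half = d // 2
    freqs = 1.0 / (10000 ** (torch.arange(half, device=x.device) / half))
    angles = torch.arange(n, device=x.device)[:, None] * freqs[None, :]   # (N, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class GatedRotaryLinearAttention(nn.Module):
    """Causal linear attention with RoPE'd queries/keys and a SiLU output gate.

    A sketch in the spirit of the abstract, not the paper's exact RecGRELA layer.
    """

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.gate_proj = nn.Linear(dim, dim)    # SiLU-based gate branch
        self.out_proj = nn.Linear(dim, dim)
        self.eps = eps

    def forward(self, x):                       # x: (B, N, D) item embeddings
        q = F.elu(apply_rope(self.q_proj(x))) + 1.0   # non-negative feature map
        k = F.elu(apply_rope(self.k_proj(x))) + 1.0
        v = self.v_proj(x)

        # Causal linear attention via prefix sums of k_i v_i^T: no N x N score matrix.
        kv = torch.einsum('bnd,bne->bnde', k, v).cumsum(dim=1)   # running sum of outer products
        num = torch.einsum('bnd,bnde->bne', q, kv)
        den = torch.einsum('bnd,bnd->bn', q, k.cumsum(dim=1)).unsqueeze(-1)
        attn_out = num / (den + self.eps)

        gate = F.silu(self.gate_proj(x))        # decides how much of the attention update passes through
        return self.out_proj(gate * attn_out)


# Usage: one layer over a batch of 2 users, 1024 interactions, 64-dim embeddings.
layer = GatedRotaryLinearAttention(dim=64)
out = layer(torch.randn(2, 1024, 64))           # -> (2, 1024, 64)
```

The prefix-sum form keeps the cost linear in sequence length, which is the point of using linear attention for long histories; the elementwise SiLU gate is one simple way to let the model suppress or pass each position's update.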
Related papers
- HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation [5.1321456889159425]
HyTRec is a model featuring a Hybrid Attention architecture that decouples long-term stable preferences from short-term intent spikes. Our approach restores precise retrieval capabilities within industrial-scale contexts involving ten thousand interactions. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed and outperforms strong baselines.
arXiv Detail & Related papers (2026-02-20T15:11:40Z) - GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) break user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z) - Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction [17.78352301235849]
We propose SparseCTR, an efficient and effective model specifically designed for users' long-term behaviors. Long behavior sequences are first divided into chunks, and on top of these chunks a three-branch sparse self-attention mechanism jointly identifies users' global interests. We show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods.
arXiv Detail & Related papers (2026-01-25T13:39:26Z) - BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations [29.069570226262073]
Transformer structures have been widely used in sequential recommender systems (SRS). BlossomRec models both long-term and short-term user interests through attention to achieve stable performance across sequences of varying lengths.
arXiv Detail & Related papers (2025-12-15T14:23:57Z) - When Relevance Meets Novelty: Dual-Stable Periodic Optimization for Exploratory Recommendation [6.663356205396985]
Large language models (LLMs) demonstrate potential with their diverse content generation capabilities. Existing LLM-enhanced dual-model frameworks face two major limitations. First, they overlook long-term preferences driven by group identity, leading to biased interest modeling. Second, they suffer from static optimization flaws, as a one-time alignment process fails to leverage incremental user data for closed-loop optimization.
arXiv Detail & Related papers (2025-08-01T09:10:56Z) - LinRec: Linear Attention Mechanism for Long-term Sequential Recommender Systems [36.470868461685896]
We propose LinRec, a novel L2-Normalized Linear Attention mechanism for Transformer-based Sequential Recommender Systems.
We show that LinRec possesses linear complexity while preserving the key properties of attention mechanisms.
Experiments on two public benchmark datasets demonstrate that combining LinRec with Transformer models achieves comparable or even superior performance.
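The summary does not spell out where the L2 normalization enters the computation. A minimal, hedged reading is sketched below: queries and keys go through a non-negative feature map, are L2-normalized row-wise, and are multiplied in the (K^T V)-first order so the cost is linear in sequence length; the exact normalization axes and feature map in LinRec may differ.

```python
import torch
import torch.nn.functional as F


def l2_linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention with row-wise L2-normalized queries/keys.

    q, k, v: (B, N, D). Cost is O(N * D^2) instead of O(N^2 * D).
    One plausible reading of "L2-Normalized Linear Attention", not LinRec's exact formulation.
    """
    q = F.normalize(F.elu(q) + 1.0, p=2, dim=-1)    # non-negative, then unit L2 norm per row
    k = F.normalize(F.elu(k) + 1.0, p=2, dim=-1)

    context = torch.einsum('bnd,bne->bde', k, v)    # K^T V summarized once: (B, D, D)
    num = torch.einsum('bnd,bde->bne', q, context)  # each query reads the fixed-size summary
    den = torch.einsum('bnd,bd->bn', q, k.sum(dim=1)).unsqueeze(-1)
    return num / (den + eps)


# Usage: 2 users, 2000-interaction histories, 64-dim hidden size.
q = torch.randn(2, 2000, 64)
k = torch.randn(2, 2000, 64)
v = torch.randn(2, 2000, 64)
print(l2_linear_attention(q, k, v).shape)           # torch.Size([2, 2000, 64])
```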
arXiv Detail & Related papers (2024-11-03T11:56:00Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
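As a toy illustration of the decoupling described above, the sketch below scores a user's history against a target item with one embedding table and pools the user representation from a second, independently learned table; the target-attention form and all names are illustrative, not the DARE paper's exact architecture.

```python
import torch
import torch.nn as nn


class DecoupledEmbeddingAttention(nn.Module):
    """Separate embedding tables for attention scoring vs. output representation (sketch only)."""

    def __init__(self, num_items, dim):
        super().__init__()
        self.attn_emb = nn.Embedding(num_items, dim)   # used only to score relevance
        self.repr_emb = nn.Embedding(num_items, dim)   # used only to build the user representation

    def forward(self, history, target):
        # history: (B, N) item ids, target: (B,) candidate item id
        scores = torch.einsum('bnd,bd->bn', self.attn_emb(history), self.attn_emb(target))
        weights = torch.softmax(scores, dim=-1)                              # (B, N)
        return torch.einsum('bn,bnd->bd', weights, self.repr_emb(history))   # pooled user vector


model = DecoupledEmbeddingAttention(num_items=1000, dim=32)
user_vec = model(torch.randint(0, 1000, (4, 50)), torch.randint(0, 1000, (4,)))
print(user_vec.shape)   # torch.Size([4, 32])
```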
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - MaTrRec: Uniting Mamba and Transformer for Sequential Recommendation [6.74321828540424]
Sequential recommendation systems aim to provide personalized recommendations by analyzing dynamic preferences and dependencies within user behavior sequences.
Inspired by Mamba, a representative State Space Model (SSM), we find that Mamba's recommendation effectiveness is limited on short interaction sequences.
We propose a new model, MaTrRec, which combines the strengths of Mamba and Transformer.
arXiv Detail & Related papers (2024-07-27T12:07:46Z) - Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z) - Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Existing approaches such as Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
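To make the projection route concrete, the sketch below compresses keys and values along the sequence dimension with a random sketching matrix before the softmax, in the spirit of Linformer-style projection; it is a generic illustration, not the Skeinformer algorithm, and the Gaussian sketch is an assumption.

```python
import torch


def sketched_attention(q, k, v, sketch_dim=64):
    """Approximate softmax attention by projecting K and V down to `sketch_dim` rows.

    q, k, v: (B, N, D). Cost drops from O(N^2 * D) to O(N * sketch_dim * D).
    Generic projection-style sketch, not the Skeinformer algorithm itself.
    """
    b, n, d = k.shape
    s = torch.randn(n, sketch_dim, device=k.device) / sketch_dim ** 0.5  # random sketching matrix
    k_s = torch.einsum('bnd,nm->bmd', k, s)         # compressed keys:   (B, m, D)
    v_s = torch.einsum('bnd,nm->bmd', v, s)         # compressed values: (B, m, D)
    scores = q @ k_s.transpose(-2, -1) / d ** 0.5   # (B, N, m) instead of (B, N, N)
    return torch.softmax(scores, dim=-1) @ v_s


# Usage: sequence length 4096 compressed to 64 sketched positions.
q = torch.randn(1, 4096, 32)
k = torch.randn(1, 4096, 32)
v = torch.randn(1, 4096, 32)
print(sketched_attention(q, k, v).shape)            # torch.Size([1, 4096, 32])
```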
arXiv Detail & Related papers (2021-12-10T06:58:05Z) - Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN).
It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users.
Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
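A hedged sketch of the segment-and-memorize idea described above: the long history is split into sub-sequences and each one is folded into a fixed set of memory slots by attention pooling; the residual update rule and all names are illustrative, not DMAN's actual memory network.

```python
import torch


def update_memory(memory, segment):
    """Fold one behavior segment into a fixed set of memory slots via attention pooling.

    memory: (M, D) memory blocks; segment: (L, D) item embeddings of one sub-sequence.
    Illustrative update rule, not DMAN's.
    """
    attn = torch.softmax(memory @ segment.t() / memory.shape[-1] ** 0.5, dim=-1)  # (M, L)
    return memory + attn @ segment            # residual write of the pooled segment


# Usage: a 4096-interaction history split into 256-item sub-sequences, 8 memory blocks.
history = torch.randn(4096, 64)
memory = torch.zeros(8, 64)
for segment in history.split(256, dim=0):
    memory = update_memory(memory, segment)
print(memory.shape)                            # torch.Size([8, 64])
```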
arXiv Detail & Related papers (2021-02-18T11:08:54Z) - Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling [70.64257515361972]
We argue that focusing on tail users could bring more benefits and address the long-tail issue.
Specifically, we propose a gradient alignment technique and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail.
arXiv Detail & Related papers (2020-10-22T03:12:02Z) - Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)