FuXi-Linear: Unleashing the Power of Linear Attention in Long-term Time-aware Sequential Recommendation
- URL: http://arxiv.org/abs/2602.23671v1
- Date: Fri, 27 Feb 2026 04:38:28 GMT
- Title: FuXi-Linear: Unleashing the Power of Linear Attention in Long-term Time-aware Sequential Recommendation
- Authors: Yufei Ye, Wei Guo, Hao Wang, Luankang Zhang, Heng Chang, Hong Zhu, Yuyang Ye, Yong Liu, Defu Lian, Enhong Chen,
- Abstract summary: FuXi-Linear is a linear-complexity model designed for efficient long-sequence recommendation. Our approach introduces two key components: (1) a Temporal Retention Channel that independently computes periodic attention weights using temporal data, preventing crosstalk between temporal and semantic signals; and (2) a Linear Positional Channel that integrates positional information through learnable kernels within linear complexity.
- Score: 86.55349738440087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern recommendation systems primarily rely on attention mechanisms with quadratic complexity, which limits their ability to handle long user sequences and slows down inference. While linear attention is a promising alternative, existing research faces three critical challenges: (1) temporal signals are often overlooked or integrated via naive coupling that causes mutual interference between temporal and semantic signals while neglecting behavioral periodicity; (2) insufficient positional information provided by existing linear frameworks; and (3) a primary focus on short sequences and shallow architectures. To address these issues, we propose FuXi-Linear, a linear-complexity model designed for efficient long-sequence recommendation. Our approach introduces two key components: (1) a Temporal Retention Channel that independently computes periodic attention weights using temporal data, preventing crosstalk between temporal and semantic signals; (2) a Linear Positional Channel that integrates positional information through learnable kernels within linear complexity. Moreover, we demonstrate that FuXi-Linear exhibits a robust power-law scaling property at a thousand-length scale, a characteristic largely unexplored in prior linear recommendation studies. Extensive experiments on sequences of several thousand tokens demonstrate that FuXi-Linear outperforms state-of-the-art models in recommendation quality, while achieving up to 10$\times$ speedup in the prefill stage and up to 21$\times$ speedup in the decode stage compared to competitive baselines. Our code has been released in a public repository https://github.com/USTC-StarTeam/fuxi-linear.
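The linear-complexity mechanism the abstract alludes to can be sketched as a causal recurrence over a prefix statistic: instead of materializing an n-by-n attention matrix, a running state of key-value outer products is updated once per step. The sketch below is a generic illustration under stated assumptions, not the authors' actual formulation; in particular, the scalar decay `gamma` is an illustrative stand-in for the paper's Temporal Retention Channel.

```python
import numpy as np

def linear_attention(Q, K, V, gamma=0.9):
    """Causal linear attention with exponential decay.

    Maintains a running state S = sum_{s<=t} gamma^(t-s) * k_s v_s^T,
    so each step costs O(d * d_v) instead of O(n) -- O(n) total in
    sequence length rather than O(n^2). Illustrative sketch only.
    """
    n, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))          # prefix sufficient statistic
    out = np.zeros((n, d_v))
    for t in range(n):
        S = gamma * S + np.outer(K[t], V[t])   # decay old evidence, add new
        out[t] = Q[t] @ S                       # query the compressed state
    return out
```

For decoding, only `S` needs to be carried between steps, which is why linear attention yields the large decode-stage speedups the abstract reports.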
Related papers
- HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation [5.1321456889159425]
HyTRec is a model featuring a Hybrid Attention architecture that decouples long-term stable preferences from short-term intent spikes. Our approach restores precise retrieval capabilities in industrial-scale contexts involving ten thousand interactions. Empirical results on industrial-scale datasets confirm that our model maintains linear inference speed and outperforms strong baselines.
arXiv Detail & Related papers (2026-02-20T15:11:40Z) - GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) break user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperform state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z) - Higher-order Linear Attention [59.92962330635185]
The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher-order interactions via compact prefix sufficient statistics.
arXiv Detail & Related papers (2025-10-31T07:54:37Z) - FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z) - Deep Time Warping for Multiple Time Series Alignment [0.0]
Time Series Alignment is a critical task in signal processing with numerous real-world applications. This paper introduces a novel approach for Multiple Time Series Alignment leveraging Deep Learning techniques.
arXiv Detail & Related papers (2025-02-22T18:55:51Z) - Fast Second-Order Online Kernel Learning through Incremental Matrix Sketching and Decomposition [44.61147231796296]
Online Kernel Learning (OKL) has attracted considerable research interest due to its promising predictive performance in streaming environments. Existing second-order OKL approaches suffer from at least quadratic time complexity with respect to the pre-set budget. We propose FORKS, a fast incremental matrix sketching and decomposition approach tailored for second-order OKL.
arXiv Detail & Related papers (2024-10-15T02:07:48Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z) - Sketching as a Tool for Understanding and Accelerating Self-attention
for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
We analyze Linformer and Informer, which reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
arXiv Detail & Related papers (2021-12-10T06:58:05Z)
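The low-dimensional-projection route mentioned in the Skeinformer entry above can be illustrated with a minimal Linformer-style sketch: the keys and values are projected along the sequence axis down to a fixed length k, so the attention matrix is n-by-k rather than n-by-n. This is a generic sketch under a random-Gaussian-projection assumption, not the specific construction of any of the papers listed.

```python
import numpy as np

def lowrank_attention(Q, K, V, k=16, rng=None):
    """Sketch of projection-based efficient attention (Linformer-style).

    Projects K and V along the sequence dimension from length n to k,
    reducing the attention matrix from (n, n) to (n, k). The shared
    random projection E is an illustrative choice; learned projections
    are used in practice.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = Q.shape
    E = rng.standard_normal((k, n)) / np.sqrt(k)   # sequence-axis projection
    Kp, Vp = E @ K, E @ V                          # shapes (k, d), (k, d_v)
    A = (Q @ Kp.T) / np.sqrt(d)                    # (n, k) scores
    A = np.exp(A - A.max(axis=1, keepdims=True))   # stable softmax over k
    A /= A.sum(axis=1, keepdims=True)
    return A @ Vp                                  # (n, d_v) output
```

Because k is fixed, both time and memory scale linearly in n, which is the shared premise of the linear-attention papers in this list.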
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.