GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder
- URL: http://arxiv.org/abs/2602.13631v1
- Date: Sat, 14 Feb 2026 06:42:56 GMT
- Title: GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder
- Authors: Yu Zhou, Chengcheng Guo, Kuo Cai, Ji Liu, Qiang Luo, Ruiming Tang, Han Li, Kun Gai, Guorui Zhou
- Abstract summary: We propose a novel and unified framework designed to capture users' lifelong interaction sequences from long-term history. GEMs (Generative rEcommendation with a Multi-stream decoder) partitions user behaviors into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy.
- Score: 54.64137490632567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While generative recommendations (GR) possess strong sequential reasoning capabilities, they face significant challenges when processing extremely long user behavior sequences: the high computational cost forces practical sequence lengths to be limited, preventing models from capturing users' lifelong interests; meanwhile, the inherent "recency bias" of attention mechanisms further weakens learning from long-term history. To overcome this bottleneck, we propose GEMs (Generative rEcommendation with a Multi-stream decoder), a novel and unified framework designed to break the long-sequence barrier by capturing users' lifelong interaction sequences through a multi-stream perspective. Specifically, GEMs partitions user behaviors into three temporal streams (Recent, Mid-term, and Lifecycle) and employs tailored inference schemes for each: a one-stage real-time extractor for immediate dynamics, a lightweight indexer for cross attention to balance accuracy and cost for mid-term sequences, and a two-stage offline-online compression module for lifelong modeling. These streams are integrated via a parameter-free fusion strategy to enable holistic interest representation. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy. Notably, GEMs is the first lifelong GR framework successfully deployed in a high-concurrency industrial environment, achieving superior inference efficiency while processing user sequences of over 100,000 interactions.
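The abstract describes the architecture only at a high level. As a rough illustration, the Python sketch below shows the multi-stream idea under stated assumptions: each stream (recent, mid-term, lifecycle) is assumed to have already been reduced to a fixed-size user vector by its respective module, and the parameter-free fusion is assumed to be simple mean pooling. All names, shapes, and the pooling choice are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of GEMs-style multi-stream fusion (not the authors' code).
# Assumes each temporal stream has already been encoded into a fixed-size vector:
#   - recent:    produced online by a one-stage real-time extractor
#   - mid_term:  retrieved via a lightweight indexer + cross attention
#   - lifecycle: produced by a two-stage offline->online compression module
import numpy as np

def fuse_streams(recent: np.ndarray,
                 mid_term: np.ndarray,
                 lifecycle: np.ndarray) -> np.ndarray:
    """Parameter-free fusion: stack the per-stream user vectors and mean-pool.

    All inputs are (d,) vectors; the output is a single (d,) holistic
    interest representation with no learned fusion weights.
    """
    streams = np.stack([recent, mid_term, lifecycle], axis=0)  # (3, d)
    return streams.mean(axis=0)                                # (d,)

# Toy usage with random embeddings of dimension 64.
rng = np.random.default_rng(0)
user_repr = fuse_streams(rng.normal(size=64),
                         rng.normal(size=64),
                         rng.normal(size=64))
print(user_repr.shape)  # (64,)
```

The design point this sketch highlights is that the fusion step itself introduces no parameters; all learning happens inside the per-stream modules.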
Related papers
- FuXi-Linear: Unleashing the Power of Linear Attention in Long-term Time-aware Sequential Recommendation [86.55349738440087]
FuXi-Linear is a linear-complexity model designed for efficient long-sequence recommendation. Our approach introduces two key components: (1) a Temporal Retention Channel that independently computes periodic attention weights from temporal data, preventing crosstalk between temporal and semantic signals; and (2) a Linear Positional Channel that integrates positional information through learnable kernels within linear complexity. (A rough sketch of the general linear-attention idea appears after this list.)
arXiv Detail & Related papers (2026-02-27T04:38:28Z) - Climber-Pilot: A Non-Myopic Generative Recommendation Model Towards Better Instruction-Following [19.550149895505683]
We present Climber-Pilot, a unified generative retrieval framework. We introduce Time-Aware Multi-Item Prediction (TAMIP), a novel training paradigm designed to mitigate inherent myopia in generative retrieval. We also propose Condition-Guided Sparse Attention (CGSA), which incorporates business constraints directly into the generative process via sparse attention.
arXiv Detail & Related papers (2026-02-14T03:46:06Z) - OneLive: Dynamically Unified Generative Framework for Live-Streaming Recommendation [49.95897358060393]
We propose OneLive, a dynamically unified generative recommendation framework tailored for live-streaming scenarios. OneLive integrates several key components: (i) a Dynamic Tokenizer that continuously encodes evolving real-time live content fused with behavior signals through residual quantization; (ii) a Time-Aware Gated Attention mechanism that explicitly models temporal dynamics for timely decision making; and (iii) an efficient decoder-only generative architecture enhanced with Sequential MTP and QK Norm for stable training and accelerated inference.
arXiv Detail & Related papers (2026-02-09T12:56:39Z) - GLASS: A Generative Recommender for Long-sequence Modeling via SID-Tier and Semantic Search [51.44490997013772]
GLASS is a novel framework that integrates long-term user interests into the generative process via SID-Tier and Semantic Search. We show that GLASS outperforms state-of-the-art baselines in experiments on two large-scale real-world datasets.
arXiv Detail & Related papers (2026-02-05T13:48:33Z) - Length-Adaptive Interest Network for Balancing Long and Short Sequence Modeling in CTR Prediction [50.094751096858204]
LAIN is a plug-and-play framework that incorporates sequence length as a conditioning signal to balance long- and short-sequence modeling. Our work offers a general, efficient, and deployable solution to mitigate length-induced bias in sequential recommendation.
arXiv Detail & Related papers (2026-01-27T03:14:20Z) - Unleashing the Potential of Sparse Attention on Long-term Behaviors for CTR Prediction [17.78352301235849]
We propose SparseCTR, an efficient and effective model specifically designed for users' long-term behaviors. Long behavior sequences are split into chunks, and a three-branch sparse self-attention mechanism over these chunks jointly identifies users' global interests. We show that SparseCTR not only improves efficiency but also outperforms state-of-the-art methods.
arXiv Detail & Related papers (2026-01-25T13:39:26Z) - HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction [8.97787361529607]
This paper presents HyFormer, a unified hybrid transformer architecture that tightly integrates long-sequence modeling and feature interaction into a single backbone. Experiments on billion-scale industrial datasets demonstrate that HyFormer consistently outperforms strong LONGER and RankMixer baselines.
arXiv Detail & Related papers (2026-01-19T02:55:05Z) - Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models [74.15250326312179]
Diffusion Large Language Models (DLLMs) offer efficient parallel generation and strong global modeling. However, the practical application of DLLMs is hindered by the need for a statically predefined generation length. We introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion.
arXiv Detail & Related papers (2025-08-01T17:56:07Z) - LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders [23.70714095931094]
LONGER is a Long-sequence optimized traNsformer for GPU-Efficient Recommenders. It consistently outperforms strong baselines in offline metrics and online A/B testing.
arXiv Detail & Related papers (2025-05-07T13:54:26Z) - UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework.
It integrates the multi-scale feature extraction of U-shaped encoder-decoder multilayer perceptrons (MLPs) with Mamba's long-sequence representation.
UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
arXiv Detail & Related papers (2024-10-15T04:56:43Z)
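None of the related-paper summaries above include implementation details. As a rough, hypothetical illustration of the linear-attention idea mentioned in the FuXi-Linear entry, the sketch below shows how a kernel feature map reduces attention cost from quadratic to linear in sequence length. The feature map, shapes, and function names are assumptions for illustration, not components of the paper.

```python
# Hypothetical sketch of linear attention via a kernel feature map, illustrating
# the general idea behind linear-complexity models such as FuXi-Linear
# (this is NOT the paper's actual architecture).
import numpy as np

def feature_map(x: np.ndarray) -> np.ndarray:
    """A simple positive feature map: phi(x) = elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Full-context linear attention.

    q, k: (n, d); v: (n, d_v).
    Cost is O(n * d * d_v) instead of the O(n^2 * d) of softmax attention,
    because phi(K)^T V is computed once and reused for every query.
    """
    q_f, k_f = feature_map(q), feature_map(k)      # (n, d)
    kv = k_f.T @ v                                 # (d, d_v) summary of keys/values
    norm = q_f @ k_f.sum(axis=0)                   # (n,) per-query normalizer
    return (q_f @ kv) / (norm[:, None] + 1e-6)     # (n, d_v)

# Toy usage: a sequence of 10,000 items with 32-dim keys/values.
rng = np.random.default_rng(0)
n, d = 10_000, 32
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (10000, 32)
```

The key point is that phi(K)^T V is a d x d_v summary computed once, so each additional query costs O(d * d_v) rather than O(n * d), which is what makes very long behavior sequences tractable.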