Generative Early Stage Ranking
- URL: http://arxiv.org/abs/2511.21095v1
- Date: Wed, 26 Nov 2025 06:29:18 GMT
- Title: Generative Early Stage Ranking
- Authors: Juhee Hong, Meng Liu, Shengzhi Wang, Xiaoheng Mao, Huihui Cheng, Leon Gao, Christopher Leung, Jin Zhou, Chandra Mouli Sekar, Zhao Zhu, Ruochen Liu, Tuan Trieu, Dawei Sun, Jeet Kanjani, Rui Li, Jing Qian, Xuan Cao, Minjie Fan, Mingze Gao, et al.
- Abstract summary: We propose a Generative Early Stage Ranking (GESR) paradigm to balance effectiveness and efficiency. The GESR paradigm has shown substantial improvements in topline metrics, engagement, and consumption tasks. To the best of our knowledge, this marks the first successful deployment of full target-aware attention sequence modeling within an ESR stage at such a scale.
- Score: 14.15517442047903
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large-scale recommendations commonly adopt a multi-stage cascading ranking system paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems utilize the "user-item decoupling" approach, where independently learned user and item representations are only combined at the final layer. While efficient, this design is limited in effectiveness, as it struggles to capture fine-grained user-item affinities and cross-signals. To address these, we propose the Generative Early Stage Ranking (GESR) paradigm, introducing the Mixture of Attention (MoA) module which leverages diverse attention mechanisms to bridge the effectiveness gap: the Hard Matching Attention (HMA) module encodes explicit cross-signals by computing raw match counts between user and item features; the Target-Aware Self Attention module generates target-aware user representations conditioned on the item, enabling more personalized learning; and the Cross Attention modules facilitate early and more enriched interactions between user-item features. MoA's specialized attention encodings are further refined in the final layer through a Multi-Logit Parameterized Gating (MLPG) module, which integrates the newly learned embeddings via gating and produces secondary logits that are fused with the primary logit. To address the efficiency and latency challenges, we have introduced a comprehensive suite of optimization techniques. These span from custom kernels that maximize the capabilities of the latest hardware to efficient serving solutions powered by caching mechanisms. The proposed GESR paradigm has shown substantial improvements in topline metrics, engagement, and consumption tasks, as validated by both offline and online experiments. To the best of our knowledge, this marks the first successful deployment of full target-aware attention sequence modeling within an ESR stage at such a scale.
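The abstract describes three concrete computations: HMA counting raw feature matches between user and item, target-aware attention conditioning the user representation on the candidate item, and MLPG-style fusion of gated secondary logits with a primary logit. The paper gives no reference code, so the sketch below is a minimal NumPy illustration of those three ideas; all function names, shapes, and the simplified softmax/gating forms are assumptions, not the authors' implementation.

```python
import numpy as np

def hard_matching_attention(user_feats, item_feats):
    """Hypothetical HMA cross-signal: count raw matches between a user's
    categorical feature IDs and the candidate item's feature IDs."""
    item_set = set(item_feats)
    return sum(1 for f in user_feats if f in item_set)

def target_aware_user_vector(user_seq, item_emb):
    """Target-aware attention sketch: weight each event in the user's
    history sequence by its dot-product affinity with the target item."""
    logits = user_seq @ item_emb                    # (seq_len,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                        # softmax over history
    return user_seq.T @ weights                     # (dim,) user vector

def fused_logit(primary_logit, secondary_logits, gates):
    """MLPG-style fusion sketch: gate the secondary logits produced from
    the attention encodings and add them to the primary logit."""
    return primary_logit + float(np.dot(gates, secondary_logits))
```

As a usage sketch, `hard_matching_attention([1, 2, 3], [2, 3, 4])` yields 2 matched features, and `fused_logit` with gates `[0.2, 0.1]` over secondary logits `[1.0, -1.0]` shifts a primary logit of 0.5 to 0.6; in the actual system the gates would themselves be learned from the MoA embeddings.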
Related papers
- Beyond the Flat Sequence: Hierarchical and Preference-Aware Generative Recommendations [35.58864660038236]
We propose a novel framework named HPGR (Hierarchical and Preference-aware Generative Recommender). First, a structure-aware pre-training stage employs a session-based Masked Item Modeling objective to learn a hierarchically-informed and semantically rich item representation space. Second, a preference-aware fine-tuning stage leverages these powerful representations to implement a Preference-Guided Sparse Attention mechanism.
arXiv Detail & Related papers (2026-03-01T08:15:34Z) - GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) break user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z) - A Learnable Fully Interacted Two-Tower Model for Pre-Ranking System [15.03225449071182]
The two-tower model is widely used in pre-ranking systems due to a good balance between efficiency and effectiveness. A novel architecture named learnable Fully Interacted Two-tower Model (FIT) is proposed, which enables rich information interactions.
arXiv Detail & Related papers (2025-09-16T10:52:03Z) - Multimodal Fusion And Sparse Attention-based Alignment Model for Long Sequential Recommendation [9.086257183699418]
Fusing multimodal item sequences and mining multi-grained user interests can bridge the gap between content comprehension and recommendation. We propose MUFASA, a MUltimodal Fusion And Sparse Attention-based Alignment model for long sequential recommendation. Experiments on real-world benchmarks show that MUFASA consistently surpasses state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-13T09:50:44Z) - Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder. Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder. Experimental results on Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z) - DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps [30.53564087005569]
Weakly supervised semantic segmentation (WSSS) typically utilizes limited semantic annotations to obtain initial Class Activation Maps (CAMs). Due to the inadequate coupling between class activation responses and semantic information in high-dimensional space, the CAM is prone to object co-occurrence or under-activation. We propose DOEI, Dual Optimization of Embedding Information, a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices.
arXiv Detail & Related papers (2025-02-21T19:06:01Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity [10.683635786183894]
CF-Diff is a new diffusion model-based collaborative filtering method.
It is capable of making full use of collaborative signals along with multi-hop neighbors.
It achieves remarkable gains up to 7.29% compared to the best competitor.
arXiv Detail & Related papers (2024-04-22T14:49:46Z) - AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation [4.618389486337933]
We propose AMMUNet, a UNet-based framework that employs multi-scale attention map merging.
The proposed AMMM effectively combines multi-scale attention maps into a unified representation using a fixed mask template.
We show that our approach achieves remarkable mean intersection over union (mIoU) scores of 75.48% on the Vaihingen dataset and an exceptional 77.90% on the Potsdam dataset.
arXiv Detail & Related papers (2024-04-20T15:23:15Z) - Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- SAM-guidEd refinEment Module (SEEM)
This light-weight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware feature.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z) - USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z) - Learning towards Synchronous Network Memorizability and Generalizability for Continual Segmentation across Multiple Sites [52.84959869494459]
In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites.
Existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites.
This paper aims to tackle the problem of Synchronous Memorizability and Generalizability with a novel proposed SMG-learning framework.
arXiv Detail & Related papers (2022-06-14T13:04:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.