GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework
- URL: http://arxiv.org/abs/2510.15299v1
- Date: Fri, 17 Oct 2025 04:15:09 GMT
- Title: GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework
- Authors: Yijia Sun, Shanshan Huang, Zhiyuan Guan, Qiang Luo, Ruiming Tang, Kun Gai, Guorui Zhou
- Abstract summary: Industrial-scale recommender systems rely on a cascade pipeline in which the retrieval stage must return a high-recall candidate set from billions of items under tight latency. We present GRank, a novel structured-index-free retrieval paradigm that seamlessly unifies target-aware learning with user-centric retrieval.
- Score: 47.25361445845229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial-scale recommender systems rely on a cascade pipeline in which the retrieval stage must return a high-recall candidate set from billions of items under tight latency. Existing solutions either (i) suffer from limited expressiveness in capturing fine-grained user-item interactions, as seen in decoupled dual-tower architectures that rely on separate encoders, or generative models that lack precise target-aware matching capabilities, or (ii) build structured indices (tree, graph, quantization) whose item-centric topologies struggle to incorporate dynamic user preferences and incur prohibitive construction and maintenance costs. We present GRank, a novel structured-index-free retrieval paradigm that seamlessly unifies target-aware learning with user-centric retrieval. Our key innovations include: (1) A target-aware Generator trained to perform personalized candidate generation via GPU-accelerated MIPS, eliminating semantic drift and maintenance costs of structured indexing; (2) A lightweight but powerful Ranker that performs fine-grained, candidate-specific inference on small subsets; (3) An end-to-end multi-task learning framework that ensures semantic consistency between generation and ranking objectives. Extensive experiments on two public benchmarks and a billion-item production corpus demonstrate that GRank improves Recall@500 by over 30% and achieves 1.7$\times$ the P99 QPS of state-of-the-art tree- and graph-based retrievers. GRank has been fully deployed in production in our recommendation platform since Q2 2025, serving 400 million active users with 99.95% service availability. Online A/B tests confirm significant improvements in core engagement metrics, with Total App Usage Time increasing by 0.160% in the main app and 0.165% in the Lite version.
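The two-stage flow described in the abstract (a Generator that retrieves candidates by maximum inner product search, then a Ranker that rescores only that small subset) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the brute-force CPU search stands in for GPU-accelerated MIPS, and `generate`, `rank`, `toy_ranker_score`, and `recall_at_k` are hypothetical names introduced here for illustration only.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def generate(user_vec, item_vecs, k):
    """Stage 1 (assumed): exact brute-force MIPS over all items.

    Returns the ids of the k items with the largest inner product.
    """
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in enumerate(item_vecs))
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def toy_ranker_score(user_vec, item_vec):
    """Hypothetical stand-in for a learned, candidate-specific ranker.

    Adds a small element-wise interaction term to the inner product.
    """
    interaction = sum(abs(a * b) for a, b in zip(user_vec, item_vec))
    return dot(user_vec, item_vec) + 0.1 * interaction


def rank(user_vec, item_vecs, candidates, n, score_fn=toy_ranker_score):
    """Stage 2 (assumed): rescore only the candidates and keep the top n."""
    rescored = sorted(
        candidates,
        key=lambda item_id: score_fn(user_vec, item_vecs[item_id]),
        reverse=True,
    )
    return rescored[:n]


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the first k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)
```

A typical call would be `rank(user, items, generate(user, items, k=500), n=50)`, with `recall_at_k` applied to the generated candidates to mirror the Recall@500 metric reported above. In a real system the first stage would run on an accelerator-backed index and the second stage would be a trained model.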
Related papers
- LASER: An Efficient Target-Aware Segmented Attention Framework for End-to-End Long Sequence Modeling [20.507605423606282]
We present LASER, a full-stack optimization framework developed and deployed at Xiaohongshu (RedNote). System efficiency: We introduce SeqVault, a unified schema-aware serving infrastructure for long user histories. Algorithmic efficiency: We propose a Segmented Target Attention (STA) mechanism to address the computational overhead.
arXiv Detail & Related papers (2026-02-12T04:33:37Z)
- RankGR: Rank-Enhanced Generative Retrieval with Listwise Direct Preference Optimization in Recommendation [36.297513746770456]
We propose RankGR, a Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, thus facilitating a more comprehensive understanding of the hierarchical user preferences. We implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second.
arXiv Detail & Related papers (2026-02-09T12:13:43Z)
- Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering [19.247242477915382]
Attributed Graph Clustering (AGC) is a fundamental unsupervised task that integrates structural topology and node attributes to uncover latent patterns in graph-structured data. Despite its significance in industrial applications such as fraud detection and user segmentation, a significant chasm persists between academic research and real-world deployment. We present PyAGC, a production-ready benchmark and library designed to stress-test AGC methods across diverse scales and structural properties.
arXiv Detail & Related papers (2026-02-09T11:07:24Z)
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum [73.82125917416067]
LACONIC is a family of learned sparse retrievers based on the Llama-3 architecture. The 8B variant achieves a state-of-the-art 60.2 nDCG on the MTEB Retrieval benchmark, ranking 15th on the leaderboard.
arXiv Detail & Related papers (2026-01-04T22:42:20Z)
- A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval [11.72564658353791]
Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. The widely adopted dual-tower encoding architecture introduces inherent challenges, primarily representational space misalignment and retrieval index inconsistency. This paper proposes a simple and effective framework named SCI comprising two synergistic modules. We provide theoretical guarantees for our approach, with its effectiveness validated by results across public datasets and real-world e-commerce datasets.
arXiv Detail & Related papers (2025-12-15T08:11:24Z)
- OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is the first industrial-deployed end-to-end generative framework for e-commerce search. OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users.
arXiv Detail & Related papers (2025-09-03T11:50:04Z)
- Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. We propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z)
- Open-Source Agentic Hybrid RAG Framework for Scientific Literature Review [2.092154729589438]
We present an agentic approach that encapsulates the hybrid RAG pipeline within an autonomous agent. Our pipeline ingests bibliometric open-access data from PubMed, arXiv, and Google Scholar APIs. A Llama-3.3-70B agent selects GraphRAG (translating queries to Cypher for KG) or VectorRAG (combining sparse and dense retrieval with re-ranking).
arXiv Detail & Related papers (2025-07-30T18:54:15Z)
- ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation [51.297873393639456]
ArtifactsBench is a framework for automated visual code generation evaluation. Our framework renders each generated artifact and captures its dynamic behavior through temporal screenshots. We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading Large Language Models.
arXiv Detail & Related papers (2025-07-07T12:53:00Z)
- Learning to Rank in Generative Retrieval [62.91492903161522]
Generative retrieval aims to generate identifier strings of relevant passages as the retrieval target.
We propose a learning-to-rank framework for generative retrieval, dubbed LTRGR.
This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems.
arXiv Detail & Related papers (2023-06-27T05:48:14Z)
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages with a corpus of 8.8M passages and evaluate model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.