Related papers: GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework

URL: http://arxiv.org/abs/2510.15299v1
Date: Fri, 17 Oct 2025 04:15:09 GMT
Title: GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework
Authors: Yijia Sun, Shanshan Huang, Zhiyuan Guan, Qiang Luo, Ruiming Tang, Kun Gai, Guorui Zhou
Abstract summary: Industrial-scale recommender systems rely on a cascade pipeline in which the retrieval stage must return a high-recall candidate set from billions of items under tight latency. We present GRank, a novel structured-index-free retrieval paradigm that seamlessly unifies target-aware learning with user-centric retrieval.
Score: 47.25361445845229
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Industrial-scale recommender systems rely on a cascade pipeline in which the retrieval stage must return a high-recall candidate set from billions of items under tight latency. Existing solutions either (i) suffer from limited expressiveness in capturing fine-grained user-item interactions, as seen in decoupled dual-tower architectures that rely on separate encoders, or generative models that lack precise target-aware matching capabilities, or (ii) build structured indices (tree, graph, quantization) whose item-centric topologies struggle to incorporate dynamic user preferences and incur prohibitive construction and maintenance costs. We present GRank, a novel structured-index-free retrieval paradigm that seamlessly unifies target-aware learning with user-centric retrieval. Our key innovations include: (1) A target-aware Generator trained to perform personalized candidate generation via GPU-accelerated MIPS, eliminating semantic drift and maintenance costs of structured indexing; (2) A lightweight but powerful Ranker that performs fine-grained, candidate-specific inference on small subsets; (3) An end-to-end multi-task learning framework that ensures semantic consistency between generation and ranking objectives. Extensive experiments on two public benchmarks and a billion-item production corpus demonstrate that GRank improves Recall@500 by over 30% and achieves 1.7× the P99 QPS of state-of-the-art tree- and graph-based retrievers. GRank has been fully deployed in production in our recommendation platform since Q2 2025, serving 400 million active users with 99.95% service availability. Online A/B tests confirm significant improvements in core engagement metrics, with Total App Usage Time increasing by 0.160% in the main app and 0.165% in the Lite version.
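
The two-stage flow described in the abstract (candidate generation by MIPS, then candidate-specific ranking on a small subset) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy brute-force inner products on CPU in place of GPU-accelerated MIPS, and the function names, the element-wise cross feature, and the linear `ranker_weights` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def generate_candidates(user_vec, item_matrix, k):
    """Exact MIPS by brute force: ids of the k items with the largest
    inner product against the user vector, best first."""
    scores = item_matrix @ user_vec
    k = min(k, scores.shape[0])
    # argpartition finds the top-k set without sorting the whole corpus.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


def rank_candidates(user_vec, item_matrix, candidate_ids, ranker_weights, n):
    """Hypothetical target-aware ranker: builds a user-item cross feature
    for each candidate and scores it with a linear layer (assumption)."""
    cand = item_matrix[candidate_ids]
    cross = cand * user_vec            # per-candidate element-wise interaction
    scores = cross @ ranker_weights
    order = np.argsort(-scores)[:n]
    return candidate_ids[order]


def generate_rank(user_vec, item_matrix, ranker_weights, k=500, n=50):
    """Generate k candidates with MIPS, then keep the n best after ranking."""
    candidates = generate_candidates(user_vec, item_matrix, k)
    return rank_candidates(user_vec, item_matrix, candidates, ranker_weights, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    items = rng.normal(size=(10_000, 32))   # toy corpus, not billions of items
    user = rng.normal(size=32)
    weights = rng.normal(size=32)           # stand-in for learned ranker weights
    print(generate_rank(user, items, weights, k=500, n=10))
```

The point of the split is cost: the cheap inner-product pass touches every item, while the more expressive candidate-specific scoring only runs on the k survivors.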

Related papers

LASER: An Efficient Target-Aware Segmented Attention Framework for End-to-End Long Sequence Modeling [20.507605423606282]
We present LASER, a full-stack optimization framework developed and deployed at Xiaohongshu (RedNote). System efficiency: We introduce SeqVault, a unified schema-aware serving infrastructure for long user histories. Algorithmic efficiency: We propose a Segmented Target Attention (STA) mechanism to address the computational overhead.
arXiv Detail & Related papers (2026-02-12T04:33:37Z)
RankGR: Rank-Enhanced Generative Retrieval with Listwise Direct Preference Optimization in Recommendation [36.297513746770456]
We propose RankGR, a Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, thus facilitating a more comprehensive understanding of the hierarchical user preferences. We implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second.
arXiv Detail & Related papers (2026-02-09T12:13:43Z)
Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering [19.247242477915382]
Attributed Graph Clustering (AGC) is a fundamental unsupervised task that integrates structural topology and node attributes to uncover latent patterns in graph-structured data. Despite its significance in industrial applications such as fraud detection and user segmentation, a significant chasm persists between academic research and real-world deployment. We present PyAGC, a production-ready benchmark and library designed to stress-test AGC methods across diverse scales and structural properties.
arXiv Detail & Related papers (2026-02-09T11:07:24Z)
LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum [73.82125917416067]
LACONIC is a family of learned sparse retrievers based on the Llama-3 architecture. The 8B variant achieves a state-of-the-art 60.2 nDCG on the MTEB Retrieval benchmark, ranking 15th on the leaderboard.
arXiv Detail & Related papers (2026-01-04T22:42:20Z)
A Simple and Effective Framework for Symmetric Consistent Indexing in Large-Scale Dense Retrieval [11.72564658353791]
Dense retrieval has become the industry standard in large-scale information retrieval systems due to its high efficiency and competitive accuracy. The widely adopted dual-tower encoding architecture introduces inherent challenges, primarily representational space misalignment and retrieval index inconsistency. This paper proposes a simple and effective framework named SCI comprising two synergistic modules. We provide theoretical guarantees for our approach, with its effectiveness validated by results across public datasets and real-world e-commerce datasets.
arXiv Detail & Related papers (2025-12-15T08:11:24Z)
OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is the first industrial-deployed end-to-end generative framework for e-commerce search. OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users.
arXiv Detail & Related papers (2025-09-03T11:50:04Z)
Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. We propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z)
Open-Source Agentic Hybrid RAG Framework for Scientific Literature Review [2.092154729589438]
We present an agentic approach that encapsulates the hybrid RAG pipeline within an autonomous agent. Our pipeline ingests bibliometric open-access data from PubMed, arXiv, and Google Scholar APIs. A Llama-3.3-70B agent selects GraphRAG (translating queries to Cypher for KG) or VectorRAG (combining sparse and dense retrieval with re-ranking).
arXiv Detail & Related papers (2025-07-30T18:54:15Z)
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation [51.297873393639456]
ArtifactsBench is a framework for automated visual code generation evaluation. Our framework renders each generated artifact and captures its dynamic behavior through temporal screenshots. We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading Large Language Models.
arXiv Detail & Related papers (2025-07-07T12:53:00Z)
Learning to Rank in Generative Retrieval [62.91492903161522]
Generative retrieval aims to generate identifier strings of relevant passages as the retrieval target. We propose a learning-to-rank framework for generative retrieval, dubbed LTRGR. This framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems.
arXiv Detail & Related papers (2023-06-27T05:48:14Z)
How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales. We scale generative retrieval to millions of passages with a corpus of 8.8M passages and evaluate model sizes up to 11B parameters. While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures. We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS. Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.