Related papers: Real-time Indexing for Large-scale Recommendation by Streaming Vector Quantization Retriever

Real-time Indexing for Large-scale Recommendation by Streaming Vector Quantization Retriever

URL: http://arxiv.org/abs/2501.08695v1
Date: Wed, 15 Jan 2025 10:09:15 GMT
Title: Real-time Indexing for Large-scale Recommendation by Streaming Vector Quantization Retriever
Authors: Xingyan Bin, Jianfei Cui, Wujie Yan, Zhichen Zhao, Xintian Han, Chongyang Yan, Feng Zhang, Xun Zhou, Qi Wu, Zuotao Liu,
Abstract summary: Streaming Vector Quantization model is a new generation of retrieval paradigm.<n> Streaming VQ attaches items with indexes in real time, granting it immediacy.<n>As a lightweight and implementation-friendly architecture, streaming VQ has been deployed and replaced all major retrievers in Douyin and Douyin Lite.
Score: 17.156348053402766
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrievers, which form one of the most important recommendation stages, are responsible for efficiently selecting possible positive samples to the later stages under strict latency limitations. Because of this, large-scale systems always rely on approximate calculations and indexes to roughly shrink candidate scale, with a simple ranking model. Considering simple models lack the ability to produce precise predictions, most of the existing methods mainly focus on incorporating complicated ranking models. However, another fundamental problem of index effectiveness remains unresolved, which also bottlenecks complication. In this paper, we propose a novel index structure: streaming Vector Quantization model, as a new generation of retrieval paradigm. Streaming VQ attaches items with indexes in real time, granting it immediacy. Moreover, through meticulous verification of possible variants, it achieves additional benefits like index balancing and reparability, enabling it to support complicated ranking models as existing approaches. As a lightweight and implementation-friendly architecture, streaming VQ has been deployed and replaced all major retrievers in Douyin and Douyin Lite, resulting in remarkable user engagement gain.

Related papers

MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation [0.0]
In practice, analyses are often complicated by missing data.<n>We propose MIBoost, a novel algorithm that employs a uniform variable-selection mechanism across imputed datasets.
arXiv Detail & Related papers (2025-07-29T13:42:38Z)
FrugalRAG: Learning to retrieve and reason for multi-hop QA [10.193015391271535]
Large-scale fine-tuning is not needed to improve RAG metrics.<n>Supervised and RL-based fine-tuning can help RAG from the perspective of frugality.
arXiv Detail & Related papers (2025-07-10T11:02:13Z)
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models [21.933379266533098]
Large Language Models (LLMs) present a critical trade-off between inference quality and computational cost.<n>Existing serving strategies often employ fixed model scales or static two-stage speculative decoding.<n>This paper introduces systemname, a novel framework that reimagines LLM inference as an adaptive routing problem.
arXiv Detail & Related papers (2025-05-12T15:46:28Z)
Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model hallucinations. Existing RAG frameworks often apply retrieval indiscriminately,leading to inefficiencies-over-retrieving. We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z)
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation [15.31883349259767]
Rankify is an open-source toolkit designed to unify retrieval, re-ranking, and RAG within a cohesive framework. It supports a wide range of retrieval techniques, including dense and sparse retrievers, while incorporating state-of-the-art re-ranking models. Rankify includes a collection of pre-retrieved datasets to facilitate benchmarking, available at Huggingface.
arXiv Detail & Related papers (2025-02-04T16:33:25Z)
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
UpLIF: An Updatable Self-Tuning Learned Index Framework [4.077820670802213]
UpLIF is an adaptive self-tuning learned index that adjusts the model to accommodate incoming updates. We also introduce the concept of balanced model adjustment, which determines the model's inherent properties.
arXiv Detail & Related papers (2024-08-07T22:30:43Z)
LiNR: Model Based Neural Retrieval on GPUs at LinkedIn [7.7977551402289045]
LiNR is LinkedIn's large-scale, GPU-based retrieval system. We describe scaling our system for large indexes, incorporating full scans and efficient filtering. We believe LiNR represents one of the industry's first Live-updated model-based retrieval indexes.
arXiv Detail & Related papers (2024-07-18T07:04:33Z)
Faster Learned Sparse Retrieval with Block-Max Pruning [11.080810272211906]
This paper introduces Block-Max Pruning (BMP), an innovative dynamic pruning strategy tailored for indexes arising in learned sparse retrieval environments. BMP substantially outperforms existing dynamic pruning strategies, offering unparalleled efficiency in safe retrieval contexts.
arXiv Detail & Related papers (2024-05-02T09:26:30Z)
Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM) The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy. We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z)
DSI++: Updating Transformer Memory with New Documents [95.70264288158766]
We introduce DSI++, a continual learning challenge for DSI to incrementally index new documents. We show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task.
arXiv Detail & Related papers (2022-12-19T18:59:34Z)
CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
Single-step generative model can dramatically simplify the search process and be optimized in end-to-end manner. We name the pre-trained generative retrieval model as CorpusBrain as all information about the corpus is encoded in its parameters without the need of constructing additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.