Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders
- URL: http://arxiv.org/abs/2405.03651v1
- Date: Mon, 6 May 2024 17:14:34 GMT
- Title: Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders
- Authors: Nishant Yadav, Nicholas Monath, Manzil Zaheer, Rob Fergus, Andrew McCallum
- Abstract summary: Cross-encoder (CE) models, which compute similarity by jointly encoding a query-item pair, perform better than embedding-based models (dual-encoders) at estimating query-item relevance.
We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
- Score: 77.84801537608651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-encoder (CE) models, which compute similarity by jointly encoding a query-item pair, perform better than embedding-based models (dual-encoders) at estimating query-item relevance. Existing approaches perform k-NN search with CE by approximating the CE similarity with a vector embedding space fit either with dual-encoders (DE) or CUR matrix factorization. DE-based retrieve-and-rerank approaches suffer from poor recall on new domains, and retrieval with a DE is decoupled from the CE. While CUR-based approaches can be more accurate than DE-based ones, they require a prohibitively large number of CE calls to compute item embeddings, making them impractical to deploy at scale. In this paper, we address these shortcomings with a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity. We compute item embeddings offline by factorizing a sparse matrix containing query-item CE scores for a set of train queries. Our method produces a high-quality approximation while requiring only a fraction of the CE calls needed by CUR-based methods, and it allows leveraging a DE to initialize the embedding space while avoiding the compute- and resource-intensive finetuning of the DE via distillation. At test time, the item embeddings remain fixed and retrieval proceeds over rounds, alternating between (a) estimating the test query embedding by minimizing the error in approximating the CE scores of items retrieved thus far, and (b) using the updated test query embedding to retrieve more items. Our k-NN search method improves recall by up to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, our indexing approach achieves a speedup of up to 100x over CUR-based methods and 5x over DE distillation methods, while matching or improving k-NN search recall over baselines.
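To make the abstract's two stages concrete, the following is a minimal NumPy sketch of the idea: offline, item embeddings are fit by least squares so that inner products with train-query embeddings reproduce a sparse matrix of observed query-item CE scores; at test time, the query embedding is re-estimated each round from the CE scores of the items retrieved so far. Everything here (the `ce_score` callable, the seeding of the first round, the ridge regularizer) is an illustrative assumption, not the authors' released implementation.

```python
# Minimal sketch under the assumptions stated above; not the authors' code.
import numpy as np

def fit_item_embeddings(train_query_embs, sparse_scores, dim, reg=1e-3):
    """Offline indexing: for each item, solve a ridge least-squares problem
    so that <train_query_emb, item_emb> approximates its observed CE scores.
    sparse_scores maps item_id -> (query_row_indices, ce_scores); the train
    query embeddings could come from a dual-encoder used as initialization."""
    item_embs = np.zeros((len(sparse_scores), dim))
    for item_id, (q_idx, scores) in sparse_scores.items():
        Q = train_query_embs[q_idx]                      # (m, dim) observed queries
        A = Q.T @ Q + reg * np.eye(dim)
        item_embs[item_id] = np.linalg.solve(A, Q.T @ np.asarray(scores))
    return item_embs

def adaptive_knn(ce_score, query, item_embs, k, rounds=3, per_round=50, reg=1e-3):
    """Test-time retrieval over rounds: (a) re-estimate the test query
    embedding from the CE scores of all items retrieved so far, then
    (b) retrieve more items with the updated approximate similarity."""
    dim = item_embs.shape[1]
    retrieved, scores = [], []
    cand = list(range(per_round))    # illustrative seed; a DE could seed this instead
    for _ in range(rounds):
        new = [i for i in cand if i not in retrieved]
        retrieved += new
        scores += list(ce_score(query, new))             # exact CE calls
        E = item_embs[retrieved]                         # fixed item embeddings
        A = E.T @ E + reg * np.eye(dim)
        q_emb = np.linalg.solve(A, E.T @ np.asarray(scores))
        approx = item_embs @ q_emb                       # approximate CE similarity
        cand = list(np.argsort(-approx)[: len(retrieved) + per_round])
    order = np.argsort(-np.asarray(scores))[:k]          # rerank retrieved by exact CE
    return [retrieved[i] for i in order]
```

Note that exact CE calls are spent only on the items actually retrieved in each round; the item embeddings stay fixed at test time and only the test query embedding is re-estimated, which is what makes the search adaptive.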
Related papers
- pEBR: A Probabilistic Approach to Embedding Based Retrieval [4.8338111302871525]
Embedding retrieval aims to learn a shared semantic representation space for both queries and items.
In current industrial practice, retrieval systems typically retrieve a fixed number of items for different queries.
arXiv Detail & Related papers (2024-10-25T07:14:12Z)
- Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items [63.117573355917465]
We show that it is possible to improve RecJPQ-based models' inference efficiency using the PQTopK algorithm.
We speed up RecJPQ-enhanced SASRec by a factor of 4.5x compared to the original SASRec's inference method and by a factor of 1.56x compared to the method implemented in RecJPQ code.
arXiv Detail & Related papers (2024-08-19T13:43:48Z)
- Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities.
We show empirically that Mixture-of-Logits (MoL) achieves superior performance across diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z)
- Faster Learned Sparse Retrieval with Block-Max Pruning [11.080810272211906]
This paper introduces Block-Max Pruning (BMP), an innovative dynamic pruning strategy tailored for indexes arising in learned sparse retrieval environments.
BMP substantially outperforms existing dynamic pruning strategies, offering unparalleled efficiency in safe retrieval contexts.
arXiv Detail & Related papers (2024-05-02T09:26:30Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that plays a key role in identifying smaller, representative portions of the training data.
We develop a novel 3-approximation algorithm to compute subsets based on the weighted sum of both the k-center and uncertainty-sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition [77.4863142882136]
Cross-encoder models are prohibitively expensive for direct k-nearest neighbor (k-NN) search.
We propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors.
arXiv Detail & Related papers (2023-05-04T17:01:17Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational-cost trade-offs superior to current widely used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Existing methods must trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z)