Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix
Factorization
- URL: http://arxiv.org/abs/2210.12579v1
- Date: Sun, 23 Oct 2022 00:32:04 GMT
- Title: Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix
Factorization
- Authors: Nishant Yadav, Nicholas Monath, Rico Angell, Manzil Zaheer and Andrew
McCallum
- Abstract summary: We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods.
- Score: 60.91600465922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient k-nearest neighbor search is a fundamental task, foundational for
many problems in NLP. When the similarity is measured by dot-product between
dual-encoder vectors or $\ell_2$-distance, there already exist many scalable
and efficient search methods. But not so when similarity is measured by more
accurate and expensive black-box neural similarity models, such as
cross-encoders, which jointly encode the query and candidate neighbor. The
cross-encoders' high computational cost typically limits their use to reranking
candidates retrieved by a cheaper model, such as dual encoder or TF-IDF.
However, the accuracy of such a two-stage approach is upper-bounded by the
recall of the initial candidate set, and potentially requires additional
training to align the auxiliary retrieval model with the cross-encoder model.
In this paper, we present an approach that avoids the use of a dual-encoder for
retrieval, relying solely on the cross-encoder. Retrieval is made efficient
with CUR decomposition, a matrix decomposition approach that approximates all
pairwise cross-encoder distances from a small subset of rows and columns of the
distance matrix. Indexing items using our approach is computationally cheaper
than training an auxiliary dual-encoder model through distillation.
Empirically, for k > 10, our approach provides test-time
recall-vs-computational cost trade-offs superior to the current widely-used
methods that re-rank items retrieved using a dual-encoder or TF-IDF.
Related papers
- MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings [15.275864151890511]
We introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism which reduces multi-vector search to single-vector similarity search.
MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets.
arXiv Detail & Related papers (2024-05-29T20:40:20Z) - Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders [77.84801537608651]
Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance.
We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
arXiv Detail & Related papers (2024-05-06T17:14:34Z) - Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders.
It is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives.
For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z) - Improving Code Search with Hard Negative Sampling Based on Fine-tuning [15.341959871682981]
We introduce a cross-encoder architecture for code search that jointly encodes the concatenation of query and code.
We also introduce a Retriever-Ranker (RR) framework that cascades the dual-encoder and cross-encoder to promote the efficiency of evaluation and online serving.
arXiv Detail & Related papers (2023-05-08T07:04:28Z) - Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR
Decomposition [77.4863142882136]
Cross-encoder models are prohibitively expensive for direct k-nearest neighbor (k-NN) search.
We propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors.
arXiv Detail & Related papers (2023-05-04T17:01:17Z) - Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z) - Nearest neighbor search with compact codes: A decoder perspective [77.60612610421101]
We re-interpret popular methods such as binary hashing or product quantizers as auto-encoders.
We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes.
arXiv Detail & Related papers (2021-12-17T15:22:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.