IRLI: Iterative Re-partitioning for Learning to Index
- URL: http://arxiv.org/abs/2103.09944v1
- Date: Wed, 17 Mar 2021 23:13:25 GMT
- Title: IRLI: Iterative Re-partitioning for Learning to Index
- Authors: Gaurav Gupta, Tharun Medini, Anshumali Shrivastava, Alexander J Smola
- Abstract summary: Methods must trade off high accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
- Score: 104.72641345738425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models have transformed the fundamental information retrieval problem
of mapping a query to a giant set of items. However, the need for efficient and
low latency inference forces the community to reconsider efficient approximate
near-neighbor search in the item space. To this end, learning to index is
gaining much interest in recent times. Methods must trade off high accuracy
against load balance and scalability in distributed settings. We propose a
novel approach called IRLI (pronounced `early'), which
iteratively partitions the items by learning the relevant buckets directly from
the query-item relevance data. Furthermore, IRLI employs a superior
power-of-$k$-choices based load balancing strategy. We mathematically show that
IRLI retrieves the correct item with high probability under very natural
assumptions and provides superior load balancing. IRLI surpasses the best
baseline's precision on multi-label classification while being $5\times$ faster at
inference. For near-neighbor search tasks, the same method outperforms the
state-of-the-art learned hashing approach NeuralLSH, requiring only about
$1/6$th of the candidates for the same recall. IRLI is both data and model
parallel, making it ideal for distributed GPU implementation. We demonstrate
this advantage by indexing 100 million dense vectors and surpassing the popular
FAISS library by more than 10% in recall.
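The paper ships no reference code here, but the power-of-$k$-choices load balancing named in the abstract is easy to sketch: each item is scored against all buckets by the learned classifier, then placed in the least-loaded of its $k$ best buckets. Below is a minimal sketch assuming the classifier scores are already computed; the function name and the random processing order are illustrative, not from the paper.

```python
import numpy as np

def assign_power_of_k_choices(scores, k=2, seed=0):
    """Place each item in the least-loaded of its k best buckets.

    scores : (n_items, n_buckets) bucket relevance per item, e.g. the
             output of the learned partition classifier.
    """
    n_items, n_buckets = scores.shape
    loads = np.zeros(n_buckets, dtype=np.int64)
    assignment = np.empty(n_items, dtype=np.int64)
    rng = np.random.default_rng(seed)
    for i in rng.permutation(n_items):                # random order avoids load bias
        top_k = np.argpartition(scores[i], -k)[-k:]   # k highest-scoring buckets
        choice = top_k[np.argmin(loads[top_k])]       # least loaded among them
        assignment[i] = choice
        loads[choice] += 1
    return assignment
```

In IRLI proper, this assignment step alternates with retraining the bucket classifier on query-item relevance data, and the index is replicated across independent repetitions; none of that machinery is shown here.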
Related papers
- Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers [6.773411876899064]
Inference-free sparse models lag far behind both sparse and dense siamese models in terms of search relevance.
We propose two different approaches for performance improvement. First, we introduce the IDF-aware FLOPS loss, which incorporates Inverted Document Frequency (IDF) into the sparsification of representations.
We find that it mitigates the negative impact of the FLOPS regularization on search relevance, allowing the model to achieve a better balance between accuracy and efficiency.
arXiv Detail & Related papers (2024-11-07T03:46:43Z)
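To make the IDF-aware FLOPS idea above concrete, here is a minimal numpy sketch of one plausible form of the regularizer; the exact weighting scheme in the paper may differ, and the function name is illustrative.

```python
import numpy as np

def idf_aware_flops_loss(w, idf, eps=1e-6):
    """IDF-weighted FLOPS regularizer (illustrative form, not the paper's exact one).

    w   : (batch, vocab) non-negative lexical activations of a batch.
    idf : (vocab,) inverse document frequency per vocabulary token.

    Plain FLOPS is sum_j (mean_i w_ij)^2. Scaling each term by 1/idf_j
    pushes the penalty onto common tokens, so rare, informative tokens
    are more likely to survive sparsification.
    """
    mean_act = np.abs(w).mean(axis=0)     # mean activation per vocabulary token
    return float(np.sum((mean_act ** 2) / (idf + eps)))
```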
- LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search [4.194768796374315]
We propose a new supervised score computation method based on the observation that inner product approximation is a multi-output regression problem.
Our experiments show that the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage.
We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method.
arXiv Detail & Related papers (2024-10-24T17:13:39Z)
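The observation above, that the inner products between a query and all items of a cluster form a multi-output regression, can be illustrated with textbook reduced-rank regression. This is a sketch under that framing, not LoRANN's actual implementation; the per-cluster item matrix and training queries are assumed given.

```python
import numpy as np

def fit_reduced_rank_scores(Q, X, rank):
    """Fit a rank-constrained regression q -> (q.x_1, ..., q.x_n).

    Q : (m, d) training queries, X : (n, d) items of one cluster.
    Returns A (d, rank) and B (rank, n) with (q @ A) @ B ~= q @ X.T.
    """
    Y = Q @ X.T                                      # exact scores = regression targets
    B_ols, *_ = np.linalg.lstsq(Q, Y, rcond=None)    # full-rank solution, (d, n)
    # Classic reduced-rank regression: project onto the top right
    # singular directions of the fitted values.
    _, _, Vt = np.linalg.svd(Q @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                                # (n, rank)
    return B_ols @ V_r, V_r.T

rng = np.random.default_rng(0)
Q, X = rng.standard_normal((2000, 64)), rng.standard_normal((500, 64))
A, B = fit_reduced_rank_scores(Q, X, rank=16)
q = rng.standard_normal(64)
approx_scores = (q @ A) @ B        # O(d*r + r*n) per cluster instead of O(d*n)
```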
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach that both accelerates attention computation and reduces GPU memory consumption.
Our evaluation shows that RetrievalAttention only needs to access 1-3% of the data while maintaining high model accuracy.
arXiv Detail & Related papers (2024-09-16T17:59:52Z)
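The mechanism behind the entry above, attending only over retrieved keys, can be sketched in a few lines. The real system uses an attention-aware approximate nearest-neighbor index over the KV cache; exact top-k selection stands in for it here, and the function name is illustrative.

```python
import numpy as np

def retrieval_attention(q, K, V, k=32):
    """Single-query attention over only the top-k retrieved keys (sketch).

    q : (d,) query; K : (n, d) cached keys; V : (n, d) cached values.
    """
    scores = (K @ q) / np.sqrt(K.shape[1])
    top = np.argpartition(scores, -k)[-k:]   # the 1-3% of the cache actually touched
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                             # softmax restricted to retrieved keys
    return w @ V[top]
```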
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
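The dense-to-sparse transformation described above can be sketched as a projection into vocabulary space followed by sparsification. The projection matrix is trained in the cited paper and is random here only to fix shapes, and hard top-k truncation stands in for the paper's probabilistic expansion control.

```python
import numpy as np

def dense_to_sparse_lexical(dense_vec, proj, top_k=128):
    """Turn a frozen dense embedding into a sparse lexical vector (sketch).

    proj : (vocab, d) projection into vocabulary space (trained in practice).
    """
    acts = np.maximum(proj @ dense_vec, 0.0)     # non-negative term weights
    cutoff = np.partition(acts, -top_k)[-top_k]
    acts[acts < cutoff] = 0.0                    # keep only the top_k terms
    return acts

rng = np.random.default_rng(0)
vocab, d = 30_522, 768                           # BERT-sized vocabulary, for shape only
sparse_vec = dense_to_sparse_lexical(rng.standard_normal(d),
                                     rng.standard_normal((vocab, d)))
```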
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, naively applied LTH strategies can underperform the zero-shot TAS-B dense retriever by up to 14% nDCG@10 on average.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
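One of the simplest learning-to-hash strategies evaluated in work like the above is random-hyperplane hashing: compress dense embeddings to short binary codes and rank by Hamming distance, then re-score a small candidate set exactly. A minimal sketch follows; the dimensions are TAS-B-style placeholders and the data is random, for shape only.

```python
import numpy as np

def lsh_binarize(emb, planes):
    """Random-hyperplane hashing: dense vectors -> binary codes."""
    return emb @ planes.T > 0

rng = np.random.default_rng(0)
d, bits = 768, 256                        # embedding dim, code length (placeholders)
planes = rng.standard_normal((bits, d))
docs = rng.standard_normal((10_000, d))   # stand-ins for document embeddings
codes = lsh_binarize(docs, planes)
q_code = lsh_binarize(rng.standard_normal((1, d)), planes)[0]
hamming = (codes != q_code).sum(axis=1)   # 256 bits per doc instead of 768 floats
top10 = np.argsort(hamming)[:10]          # candidates for exact re-scoring
```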
- Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation [20.42993976179691]
We propose a novel method to extend ANN search to arbitrary matching functions.
Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items.
The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase.
arXiv Detail & Related papers (2022-02-14T07:55:57Z)
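The greedy walk described in the entry above is straightforward to sketch: repeatedly move to whichever neighbor of the current item scores best under the matching function, and stop at a local optimum. This is a minimal single-path version; the deployed system would track a beam of candidates, and `match_score` here is any callable, which is the point, since nothing requires it to be an inner product.

```python
def greedy_graph_walk(start, neighbors, match_score, max_steps=100):
    """Greedy walk on a pre-built item-item similarity graph (sketch).

    neighbors   : dict item_id -> list of adjacent item ids.
    match_score : arbitrary query-item scoring function, e.g. a neural net.
    """
    current, best = start, match_score(start)
    for _ in range(max_steps):
        cands = {n: match_score(n) for n in neighbors.get(current, [])}
        if not cands:
            break
        nxt = max(cands, key=cands.get)
        if cands[nxt] <= best:             # no neighbor improves: local optimum
            break
        current, best = nxt, cands[nxt]
    return current
```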
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.