IRLI: Iterative Re-partitioning for Learning to Index
- URL: http://arxiv.org/abs/2103.09944v1
- Date: Wed, 17 Mar 2021 23:13:25 GMT
- Title: IRLI: Iterative Re-partitioning for Learning to Index
- Authors: Gaurav Gupta, Tharun Medini, Anshumali Shrivastava, Alexander J Smola
- Abstract summary: Methods have to trade between obtaining high accuracy while maintaining load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
- Score: 104.72641345738425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural models have transformed the fundamental information retrieval problem
of mapping a query to a giant set of items. However, the need for efficient and
low latency inference forces the community to reconsider efficient approximate
near-neighbor search in the item space. To this end, learning to index is
gaining much interest in recent times. Methods have to trade between obtaining
high accuracy while maintaining load balance and scalability in distributed
settings. We propose a novel approach called IRLI (pronounced `early'), which
iteratively partitions the items by learning the relevant buckets directly from
the query-item relevance data. Furthermore, IRLI employs a superior
power-of-$k$-choices based load balancing strategy. We mathematically show that
IRLI retrieves the correct item with high probability under very natural
assumptions and provides superior load balancing. IRLI surpasses the best
baseline's precision on multi-label classification while being $5x$ faster on
inference. For near-neighbor search tasks, the same method outperforms the
state-of-the-art Learned Hashing approach NeuralLSH by requiring only ~
{1/6}^th of the candidates for the same recall. IRLI is both data and model
parallel, making it ideal for distributed GPU implementation. We demonstrate
this advantage by indexing 100 million dense vectors and surpassing the popular
FAISS library by >10% on recall.
Related papers
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z) - Efficient Prompt Caching via Embedding Similarity [26.456212783693545]
We focus on the prediction accuracy of prompt caching for single-round question-answering tasks via embedding similarity.
We propose a distillation-based method to fine-tune the existing embeddings for better better prediction.
We also conduct simulations demonstrating that our trained models achieve better caching efficiency than the previous embedding model.
arXiv Detail & Related papers (2024-02-02T06:34:11Z) - DeepLSH: Deep Locality-Sensitive Hash Learning for Fast and Efficient
Near-Duplicate Crash Report Detection [0.29998889086656577]
With real-time streaming bug collection, systems are needed to quickly answer the question: What are the most similar bugs to a new one?, that is, efficiently find near-duplicates.
LSH has not been considered in the crash bucketing literature.
We introduce DeepLSH, a Siamese architecture with an original loss function, that perfectly approximates the locality-sensitivity property.
arXiv Detail & Related papers (2023-10-10T15:26:27Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS)
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot
Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z) - Injecting Domain Adaptation with Learning-to-hash for Effective and
Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10.
arXiv Detail & Related papers (2022-05-23T17:53:44Z) - Approximate Nearest Neighbor Search under Neural Similarity Metric for
Large-Scale Recommendation [20.42993976179691]
We propose a novel method to extend ANN search to arbitrary matching functions.
Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items.
The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase.
arXiv Detail & Related papers (2022-02-14T07:55:57Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than $9$% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.