Efficient Data-aware Distance Comparison Operations for High-Dimensional Approximate Nearest Neighbor Search
- URL: http://arxiv.org/abs/2411.17229v2
- Date: Sun, 01 Dec 2024 13:20:02 GMT
- Title: Efficient Data-aware Distance Comparison Operations for High-Dimensional Approximate Nearest Neighbor Search
- Authors: Liwei Deng, Penghao Chen, Ximu Zeng, Tianfu Wang, Yan Zhao, Kai Zheng,
- Abstract summary: High-dimensional approximate $K$ nearest neighbor search (AKNN) is a fundamental task for various applications, including information retrieval.
Most existing algorithms for AKNN can be decomposed into two main components, i.e., candidate generation and distance comparison operations (DCOs)
We propose an Data-Aware Distance Estimation approach, called DADE, which approximates the exact distance in a lower-dimensional space.
- Score: 14.77572360618428
- License:
- Abstract: High-dimensional approximate $K$ nearest neighbor search (AKNN) is a fundamental task for various applications, including information retrieval. Most existing algorithms for AKNN can be decomposed into two main components, i.e., candidate generation and distance comparison operations (DCOs). While different methods have unique ways of generating candidates, they all share the same DCO process. In this study, we focus on accelerating the process of DCOs that dominates the time cost in most existing AKNN algorithms. To achieve this, we propose an Data-Aware Distance Estimation approach, called DADE, which approximates the exact distance in a lower-dimensional space. We theoretically prove that the distance estimation in DADE is unbiased in terms of data distribution. Furthermore, we propose an optimized estimation based on the unbiased distance estimation formulation. In addition, we propose a hypothesis testing approach to adaptively determine the number of dimensions needed to estimate the exact distance with sufficient confidence. We integrate DADE into widely-used AKNN search algorithms, e.g., IVF and HNSW, and conduct extensive experiments to demonstrate the superiority.
Related papers
- RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search [11.069814476661827]
Cross-modal ANNS aims to use the data vector from one modality to retrieve the most similar items from another.
State-of-the-art ANNS approaches suffer poor performance for OOD workloads.
We propose pRojected bipartite Graph (RoarGraph), an efficient ANNS graph index built under the guidance of query distribution.
arXiv Detail & Related papers (2024-08-16T06:48:16Z) - Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders [77.84801537608651]
Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance.
We propose a sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity.
arXiv Detail & Related papers (2024-05-06T17:14:34Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest
Neighbor Search [8.25574589820305]
We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN)
We apply ANN algorithms as a black box subroutine to compute an unbiased Density Estimation (KDE)
We show that our implementation outperforms state of the art implementations in all high dimensional datasets we considered.
arXiv Detail & Related papers (2021-07-06T17:11:28Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Nearest Neighbor Search Under Uncertainty [19.225091554227948]
Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning.
This paper studies NNS under Uncertainty (NNSU)
arXiv Detail & Related papers (2021-03-08T20:20:01Z) - Manifold learning with approximate nearest neighbors [1.8477401359673706]
We use a broad range of approximate nearest neighbor algorithms within manifold learning algorithms and evaluate their impact on embedding accuracy.
Via a thorough empirical investigation based on the benchmark MNIST dataset, it is shown that approximate nearest neighbors lead to substantial improvements in computational time.
This application demonstrates how the proposed methods can be used to visualize and identify anomalies and uncover underlying structure within high-dimensional data.
arXiv Detail & Related papers (2021-02-22T12:04:23Z) - Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z) - DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z) - Deep Retrieval: Learning A Retrievable Structure for Large-Scale
Recommendations [21.68175843347951]
We present Deep Retrieval (DR), to learn a retrievable structure directly with user-item interaction data.
DR is among the first non-ANN algorithms successfully deployed at the scale of hundreds of millions of items for industrial recommendation systems.
arXiv Detail & Related papers (2020-07-12T06:23:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.