ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate
Nearest Neighbor Search Algorithms
- URL: http://arxiv.org/abs/2305.04359v2
- Date: Thu, 8 Feb 2024 17:00:13 GMT
- Title: ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate
Nearest Neighbor Search Algorithms
- Authors: Magdalen Dobson Manohar, Zheqi Shen, Guy E. Blelloch, Laxman
Dhulipala, Yan Gu, Harsha Vardhan Simhadri, Yihan Sun
- Abstract summary: We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms.
We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
- Score: 5.478671305092084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Approximate nearest-neighbor search (ANNS) algorithms are a key part of the
modern deep learning stack due to enabling efficient similarity search over
high-dimensional vector space representations (i.e., embeddings) of data. Among
various ANNS algorithms, graph-based algorithms are known to achieve the best
throughput-recall tradeoffs. Despite the large scale of modern ANNS datasets,
existing parallel graph based implementations suffer from significant
challenges to scale to large datasets due to heavy use of locks and other
sequential bottlenecks, which 1) prevents them from efficiently scaling to a
large number of processors, and 2) results in nondeterminism that is
undesirable in certain applications.
In this paper, we introduce ParlayANN, a library of deterministic and
parallel graph-based approximate nearest neighbor search algorithms, along with
a set of useful tools for developing such algorithms. In this library, we
develop novel parallel implementations for four state-of-the-art graph-based
ANNS algorithms that scale to billion-scale datasets. Our algorithms are
deterministic and achieve high scalability across a diverse set of challenging
datasets. In addition to the new algorithmic ideas, we also conduct a detailed
experimental study of our new algorithms as well as two existing non-graph
approaches. Our experimental results both validate the effectiveness of our new
techniques, and lead to a comprehensive comparison among ANNS algorithms on
large scale datasets with a list of interesting findings.
Related papers
- Comparing Personalized Relevance Algorithms for Directed Graphs [0.34952465649465553]
We present an interactive Web platform that, given a directed graph, allows identifying the most relevant nodes related to a given query node.
We provide 50 pre-loaded datasets from Wikipedia, Twitter, and Amazon and seven algorithms.
Our tool helps to uncover hidden relationships within the data, which makes of it a valuable addition to the repertoire of graph analysis algorithms.
arXiv Detail & Related papers (2024-05-03T17:24:08Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - PECANN: Parallel Efficient Clustering with Graph-Based Approximate
Nearest Neighbor Search [8.15681999722805]
This paper studies density-based clustering of point sets.
It unifies the different variants of density peaks clustering into a single framework, PECANN.
We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading.
arXiv Detail & Related papers (2023-12-06T22:43:50Z) - Learning the hub graphical Lasso model with the structured sparsity via
an efficient algorithm [1.0923877073891446]
We introduce a two-phase algorithm to estimate hub graphical models.
The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers.
It then warms a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks.
arXiv Detail & Related papers (2023-08-17T08:24:28Z) - Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z) - Accelerating ERM for data-driven algorithm design using output-sensitive techniques [26.32088674030797]
We study techniques to develop efficient learning algorithms for data-driven algorithm design.
Our approach involves two novel ingredients -- an output-sensitive algorithm for enumerating polytopes induced by a set of hyperplanes.
We illustrate our techniques by giving algorithms for pricing problems, linkage-based clustering and dynamic-programming based sequence alignment.
arXiv Detail & Related papers (2022-04-07T17:27:18Z) - A Metaheuristic Algorithm for Large Maximum Weight Independent Set
Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs airsing in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
arXiv Detail & Related papers (2022-03-28T21:34:16Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and
Hierarchical Spatial Clustering [6.4805900740861]
We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$*$.
We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time)
Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13--55.89x, and existing parallel algorithms by at least an order of magnitude.
arXiv Detail & Related papers (2021-04-02T16:05:00Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum
Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource.
We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.