DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest
Neighbor Search
- URL: http://arxiv.org/abs/2107.02736v1
- Date: Tue, 6 Jul 2021 17:11:28 GMT
- Title: DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest
Neighbor Search
- Authors: Matti Karppa and Martin Aumüller and Rasmus Pagh
- Abstract summary: We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN).
We apply ANN algorithms as a black-box subroutine to compute an unbiased Kernel Density Estimate (KDE).
We show that our implementation outperforms state-of-the-art implementations on all high-dimensional datasets we considered.
- Score: 8.25574589820305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel Density Estimation (KDE) is a nonparametric method for estimating the
shape of a density function, given a set of samples from the distribution.
Recently, locality-sensitive hashing, originally proposed as a tool for nearest
neighbor search, has been shown to enable fast KDE data structures. However,
these approaches do not take advantage of the many other advances that have
been made in algorithms for nearest neighbor search. We present an
algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN)
where we apply Approximate Nearest Neighbor (ANN) algorithms as a black box
subroutine to compute an unbiased KDE. The idea is to find points that have a
large contribution to the KDE using ANN, compute their contribution exactly,
and approximate the remainder with Random Sampling (RS). We present a
theoretical argument that supports the idea that an ANN subroutine can speed up
the evaluation. Furthermore, we provide a C++ implementation with a Python
interface that can make use of an arbitrary ANN implementation as a subroutine
for KDE evaluation. We show empirically that our implementation outperforms
state-of-the-art implementations on all high-dimensional datasets we
considered, and matches the performance of RS in cases where the ANN yields
no gains in performance.
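To make the estimator concrete, here is a minimal Python sketch of the DEANN idea under stated assumptions: `ann_index.query(q, k)` is a placeholder for whatever ANN library is plugged in (the name and signature are hypothetical), the kernel is Gaussian, and the random-sampling tail is drawn uniformly from the points the ANN did not return.

```python
import numpy as np

def gaussian_kernel(q, X, h):
    """k(q, x) = exp(-||q - x||^2 / (2 h^2)) for each row x of X."""
    d2 = np.sum((X - q) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * h ** 2))

def deann_estimate(q, X, ann_index, k, m, h, rng):
    """Sketch of a DEANN-style KDE estimate at query q.

    ann_index.query(q, k) is assumed to return the indices of k
    (approximate) nearest neighbors; any ANN library can fill this role.
    """
    n = X.shape[0]
    near = np.asarray(ann_index.query(q, k))  # hypothetical ANN interface

    # Exact part: kernel contributions of the returned near neighbors.
    exact = gaussian_kernel(q, X[near], h).sum()

    # Random-sampling part: unbiased estimate of the remaining n - k
    # contributions from m uniform samples over the complement.
    mask = np.ones(n, dtype=bool)
    mask[near] = False
    rest = np.flatnonzero(mask)
    sample = rng.choice(rest, size=m, replace=True)
    tail = (n - len(near)) * gaussian_kernel(q, X[sample], h).mean()

    return (exact + tail) / n
```

Because the ANN covers the points with the largest kernel values, the sampled tail has small variance; if the ANN returns nothing useful, the estimate degrades toward plain random sampling, matching the behavior reported in the abstract.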
Related papers
- PECANN: Parallel Efficient Clustering with Graph-Based Approximate
Nearest Neighbor Search [8.15681999722805]
This paper studies density-based clustering of point sets.
It unifies the different variants of density peaks clustering into a single framework, PECANN.
We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading.
arXiv Detail & Related papers (2023-12-06T22:43:50Z)
- Fast Approximation of Similarity Graphs with Kernel Density Estimation [12.321755440062732]
We present a new algorithm for constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$.
Our presented algorithm is based on the kernel density estimation problem, and is applicable for arbitrary kernel functions.
arXiv Detail & Related papers (2023-10-21T00:32:47Z)
- Fast Private Kernel Density Estimation via Locality Sensitive Quantization [10.227538355037554]
We study efficient mechanisms for differentially private kernel density estimation (DP-KDE).
We show how the kernel can be privately approximated in time linear in $d$, making DP-KDE feasible for high-dimensional data.
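The paper's locality-sensitive quantization mechanism is not reproduced here, but the textbook baseline it improves on is easy to state: for a kernel with values in [0, 1], one empirical KDE evaluation has sensitivity 1/n, so the Laplace mechanism already gives pure ε-DP per query. A minimal sketch under those assumptions (Gaussian kernel, single query):

```python
import numpy as np

def dp_kde_query(q, X, h, epsilon, rng):
    """epsilon-DP KDE value at one query via the Laplace mechanism.

    Changing a single data point moves the average of a [0, 1]-bounded
    kernel by at most 1/n, so Laplace noise of scale 1/(n * epsilon)
    suffices. This baseline costs O(n d) per query; the LSQ paper's
    point is to get comparable privacy far faster in high dimensions.
    """
    n = X.shape[0]
    d2 = np.sum((X - q) ** 2, axis=1)
    kde = np.exp(-d2 / (2.0 * h ** 2)).mean()
    return kde + rng.laplace(scale=1.0 / (n * epsilon))
```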
arXiv Detail & Related papers (2023-07-04T18:48:04Z)
- A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near Neighbor Graph [51.880164098926166]
Graph-based algorithms have demonstrated state-of-the-art performance in the nearest neighbor search (NN-Search) problem.
However, a practice-to-theory gap exists for graph-based NN-Search algorithms.
We present theoretical guarantees of solving NN-Search via greedy search on ANN-Graph for low dimensional and dense vectors.
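The greedy search analyzed in such results is the standard routine on proximity graphs: start from an entry node and repeatedly move to whichever neighbor is closer to the query, stopping at a local minimum. A minimal sketch (adjacency-list graph; all names are illustrative):

```python
import numpy as np

def greedy_search(graph, points, entry, query):
    """Greedy NN search on a proximity graph.

    graph: dict mapping node id -> list of neighbor ids
    points: array of vectors indexed by node id
    Returns the node at which no neighbor is closer to the query.
    """
    current = entry
    best = np.linalg.norm(points[current] - query)
    while True:
        improved = False
        for nb in graph[current]:
            d = np.linalg.norm(points[nb] - query)
            if d < best:
                current, best, improved = nb, d, True
        if not improved:
            return current
```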
arXiv Detail & Related papers (2023-03-10T21:18:34Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
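The NNGP-specific feature map is not reproduced here, but the random-features idea it builds on can be illustrated with the classic Rahimi-Recht construction for the Gaussian kernel, where $k(x, y) \approx z(x)^\top z(y)$ for randomized features $z$. A minimal sketch under that assumption:

```python
import numpy as np

def random_fourier_features(X, D, h, rng):
    """Map rows of X to D random features approximating a Gaussian kernel.

    k(x, y) = exp(-||x - y||^2 / (2 h^2)) ~= z(x) @ z(y), where the rows
    of W are drawn from N(0, I / h^2) and b ~ Uniform[0, 2*pi)
    (Rahimi & Recht, 2007). The NNGP case uses a kernel-specific map.
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / h, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```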
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish lower bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$.
arXiv Detail & Related papers (2021-05-10T13:07:44Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions needed to achieve comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Nearest Neighbor Search Under Uncertainty [19.225091554227948]
Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning.
This paper studies NNS under Uncertainty (NNSU).
arXiv Detail & Related papers (2021-03-08T20:20:01Z)
- Leveraging Reinforcement Learning for evaluating Robustness of KNN Search Algorithms [0.0]
The problem of finding the K nearest neighbors of a query point in a given dataset has been studied for many years.
In this paper, we survey several novel K-Nearest Neighbor Search (KNNS) approaches that tackle the problem from a computational perspective.
To evaluate the robustness of a KNNS approach against adversarial points, we propose a generic Reinforcement Learning based framework.
arXiv Detail & Related papers (2021-02-10T16:10:58Z)
- Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive them efficiently.
arXiv Detail & Related papers (2020-08-29T14:58:26Z)