Fast Single-Core K-Nearest Neighbor Graph Computation
- URL: http://arxiv.org/abs/2112.06630v1
- Date: Mon, 13 Dec 2021 13:16:30 GMT
- Title: Fast Single-Core K-Nearest Neighbor Graph Computation
- Authors: Dan Kluser, Jonas Bokstaller, Samuel Rutz and Tobias Buner
- Abstract summary: This paper presents a runtime-optimized C implementation of the "NN-Descent" algorithm by Wei Dong et al.
Various implementation optimizations are explained which improve performance for low-dimensional as well as high-dimensional datasets.
The restriction to the l2-distance metric allows for blocked distance evaluations, which significantly increase performance for high-dimensional datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fast and reliable K-Nearest Neighbor Graph algorithms are more important than
ever due to their widespread use in many data processing techniques. This paper
presents a runtime-optimized C implementation of the heuristic "NN-Descent"
algorithm by Wei Dong et al. for the l2-distance metric. Various implementation
optimizations are explained which improve performance for low-dimensional as
well as high-dimensional datasets. Optimizations that speed up the selection
of candidate datapoint pairs for distance evaluation are primarily impactful
for low-dimensional datasets. A heuristic that exploits the iterative nature
of NN-Descent to reorder data in memory is presented; it improves locality
and thereby reduces the runtime. The restriction to the l2-distance metric
allows for blocked distance evaluations (sketched below), which
significantly increase performance for high-dimensional datasets. In
combination, the optimizations yield an implementation that significantly
outperforms a widely used implementation of NN-Descent on all considered
datasets. For instance, the runtime on the popular MNIST handwritten digits
dataset is halved.
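The blocked evaluations hinge on the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>: all pairwise distances between two blocks of points reduce to precomputed squared norms plus a block of inner products, which streams through memory far better than point-at-a-time evaluation. The C sketch below illustrates only this general idea; the function names and data layout are ours, not the authors' code.

```c
#include <stddef.h>

/* Precompute ||x||^2 for n points of dimension d (row-major layout). */
void squared_norms(const float *pts, size_t n, size_t d, float *out)
{
    for (size_t i = 0; i < n; i++) {
        float s = 0.0f;
        for (size_t k = 0; k < d; k++)
            s += pts[i * d + k] * pts[i * d + k];
        out[i] = s;
    }
}

/* Fill dist[i * nb + j] = ||a_i - b_j||^2 for all pairs drawn from two
 * blocks, using ||a - b||^2 = ||a||^2 + ||b||^2 - 2<a, b>. Only inner
 * products remain in the hot loop, so both blocks are read sequentially
 * and can be kept cache-resident. */
void blocked_l2(const float *a, const float *a_norms, size_t na,
                const float *b, const float *b_norms, size_t nb,
                size_t d, float *dist)
{
    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            float dot = 0.0f;
            for (size_t k = 0; k < d; k++)
                dot += a[i * d + k] * b[j * d + k];
            dist[i * nb + j] = a_norms[i] + b_norms[j] - 2.0f * dot;
        }
    }
}
```

A production kernel would additionally tile the i/j loops, vectorize the inner products (or hand them to a GEMM), and clamp small negative results caused by floating-point cancellation to zero.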
Related papers
- Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation [20.409659920455955]
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large scale high-dimensional datasets.
For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics.
Traditional evaluation approaches do not consider the computational costs of updating the index structure, nor the rate and size of index updates.
arXiv Detail & Related papers (2024-04-30T06:21:44Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN); the generic random-projection primitive is sketched below.
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
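As background for the entry above: the core random-projection primitive multiplies features by a random matrix with cheap-to-sample entries, approximately preserving pairwise distances (Johnson-Lindenstrauss). The C sketch below shows that generic primitive with Rademacher (+/-1) entries; it is not RpHGNN's propagate-and-update pipeline, and all names are illustrative.

```c
#include <math.h>
#include <stdlib.h>

/* Project n row-major vectors of dimension d down to dimension k using
 * a random +/-1 matrix, scaled by 1/sqrt(k) so that squared distances
 * are preserved in expectation. */
void random_project(const float *x, size_t n, size_t d, size_t k, float *out)
{
    signed char *r = malloc(d * k);
    if (!r) return;
    for (size_t i = 0; i < d * k; i++)
        r[i] = (rand() & 1) ? 1 : -1;          /* Rademacher entries */

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < k; j++) {
            float s = 0.0f;
            for (size_t t = 0; t < d; t++)
                s += x[i * d + t] * (float)r[t * k + j];
            out[i * k + j] = s / sqrtf((float)k);
        }
    free(r);
}
```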
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets (a generic random-features sketch follows this entry).
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
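RFAD's random features approximate the NNGP kernel; as a concrete stand-in for the general technique, the sketch below implements the classic random Fourier feature map (Rahimi and Recht) for an RBF kernel, so that k(x, y) ~= z(x) . z(y). The kernel choice and every name here are illustrative assumptions, not the paper's construction.

```c
#include <math.h>
#include <stdlib.h>

#define TWO_PI 6.28318530717958647692

/* Box-Muller sample from a standard normal. */
static double gauss(void)
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
}

/* Draw the shared random parameters once: W is D x d, b has D entries. */
void rff_sample(size_t d, size_t D, double sigma, double *W, double *b)
{
    for (size_t i = 0; i < D * d; i++)
        W[i] = gauss() / sigma;                    /* RBF bandwidth sigma */
    for (size_t i = 0; i < D; i++)
        b[i] = TWO_PI * rand() / (double)RAND_MAX;
}

/* Map one point x to its D-dimensional feature vector z, so that the
 * RBF kernel is approximated by the plain dot product z(x) . z(y). */
void rff_map(const double *x, size_t d, size_t D,
             const double *W, const double *b, double *z)
{
    for (size_t i = 0; i < D; i++) {
        double t = b[i];
        for (size_t k = 0; k < d; k++)
            t += W[i * d + k] * x[k];
        z[i] = sqrt(2.0 / (double)D) * cos(t);
    }
}
```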
- Alternately Optimized Graph Neural Networks [33.98939289745346]
We propose a new optimization framework for semi-supervised learning on graphs.
The proposed framework can be conveniently solved by alternating optimization algorithms, resulting in significantly improved efficiency (a toy alternating-optimization sketch follows this entry).
arXiv Detail & Related papers (2022-06-08T01:50:08Z)
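To make the alternating-optimization pattern concrete on a toy problem unrelated to the paper's GNN objective: rank-1 factorization, minimizing ||M - u v^T||_F^2 over u and v, alternates two closed-form half-steps, each optimal with the other variable held fixed, so the objective never increases.

```c
#include <stddef.h>

/* Alternating least squares for rank-1 factorization of an m x n
 * matrix M (row-major). v must be initialized to a nonzero vector.
 * Fixing v, the optimal u is M v / (v.v); fixing u, v = M^T u / (u.u). */
void rank1_als(const double *M, size_t m, size_t n,
               double *u, double *v, int iters)
{
    for (int it = 0; it < iters; it++) {
        double vv = 0.0;
        for (size_t j = 0; j < n; j++) vv += v[j] * v[j];
        for (size_t i = 0; i < m; i++) {
            double s = 0.0;
            for (size_t j = 0; j < n; j++) s += M[i * n + j] * v[j];
            u[i] = s / vv;
        }
        double uu = 0.0;
        for (size_t i = 0; i < m; i++) uu += u[i] * u[i];
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t i = 0; i < m; i++) s += M[i * n + j] * u[i];
            v[j] = s / uu;
        }
    }
}
```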
- Efficient Wind Speed Nowcasting with GPU-Accelerated Nearest Neighbors Algorithm [0.0]
This paper proposes a simple yet efficient high-altitude wind nowcasting pipeline.
It efficiently processes a vast amount of live data recorded by airplanes over the whole airspace.
It creates a unique context for each point in the dataset and then extrapolates from it (a minimal k-NN scan is sketched below).
arXiv Detail & Related papers (2021-12-20T09:15:27Z)
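The nowcasting pipeline above is built on nearest-neighbor queries. For reference, the minimal exact form of such a query is a linear scan that keeps the k smallest distances; the single-threaded C sketch below fixes ideas only (the paper's point is GPU acceleration of this primitive, and all names are illustrative).

```c
#include <float.h>
#include <stddef.h>

/* Exact k-NN query by linear scan over n points of dimension d: keeps
 * the k smallest squared l2 distances in a small sorted list, returning
 * the neighbor indices in idx and their distances in dist. */
void knn_query(const float *pts, size_t n, size_t d,
               const float *q, size_t k, size_t *idx, float *dist)
{
    for (size_t j = 0; j < k; j++) { dist[j] = FLT_MAX; idx[j] = (size_t)-1; }

    for (size_t i = 0; i < n; i++) {
        float s = 0.0f;
        for (size_t t = 0; t < d; t++) {
            float diff = pts[i * d + t] - q[t];
            s += diff * diff;
        }
        if (s >= dist[k - 1]) continue;   /* not among the current k best */
        size_t j = k - 1;                 /* insertion-sort the new entry */
        while (j > 0 && dist[j - 1] > s) {
            dist[j] = dist[j - 1];
            idx[j] = idx[j - 1];
            j--;
        }
        dist[j] = s;
        idx[j] = i;
    }
}
```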
- Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching those of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries of the MNIST and ImageNet datasets, respectively (the Hamming-distance primitive is sketched below).
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
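NNS in the Hamming space reduces every distance evaluation to XOR plus popcount over packed bit codes, as sketched below. This kernel is standard background rather than the paper's contribution (which is the robust hash construction itself); __builtin_popcountll assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hamming distance between two binary codes packed into 64-bit words:
 * XOR exposes the differing bits, popcount tallies them per word. */
unsigned hamming_dist(const uint64_t *x, const uint64_t *y, size_t words)
{
    unsigned dist = 0;
    for (size_t i = 0; i < words; i++)
        dist += (unsigned)__builtin_popcountll(x[i] ^ y[i]);
    return dist;
}
```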
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of fully-connected ReLU networks.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Analytical Characterization and Design Space Exploration for Optimization of CNNs [10.15406080228806]
Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations for reducing data movement.
This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs (loop tiling is sketched below).
arXiv Detail & Related papers (2021-01-24T21:36:52Z)
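Loop tiling, one of the transformations whose configuration space the paper models analytically, is easiest to see on a plain matrix multiply: blocking the loops keeps tile-sized working sets cache-resident and cuts data movement. The sketch below is the textbook version with an arbitrary tile size, not the paper's CNN-specific code.

```c
#include <stddef.h>

#define TILE 64   /* illustrative; the paper's point is choosing this well */

/* C += A * B for n x n row-major matrices, with the i/k/j loops tiled.
 * C must be zero-initialized by the caller for a plain product. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t k = kk; k < kk + TILE && k < n; k++) {
                        float a = A[i * n + k];
                        for (size_t j = jj; j < jj + TILE && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```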
- Population Gradients improve performance across data-sets and architectures in object classification [6.17047113475566]
We present a new method to calculate the gradients while training Neural Networks (NNs).
It significantly improves final performance across architectures, datasets, hyperparameter values, training lengths, and model sizes.
Besides being effective in the wide array of situations we have tested, the increase in performance (e.g., F1) is as high as or higher than that of the other widespread performance-improving methods.
arXiv Detail & Related papers (2020-10-23T09:40:23Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph (the sampling step is sketched after this entry).
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
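For background on the FastGAT entry above: a Spielman-Srivastava-style spectral sparsifier keeps each edge with probability proportional to its weight times its effective resistance, reweighting survivors so the graph Laplacian stays unbiased in expectation. The sketch below shows only that sampling step and assumes the resistances are already computed; FastGAT's actual construction differs in its details.

```c
#include <stddef.h>
#include <stdlib.h>

/* Given per-edge weights w[e] and effective resistances R[e] for m
 * edges, keep edge e with probability min(1, c * w[e] * R[e]) and
 * divide the kept weight by that probability. Returns the number of
 * surviving edges; their original indices land in keep[]. */
size_t sparsify_edges(const double *w, const double *R, size_t m,
                      double c, double *w_out, size_t *keep)
{
    size_t kept = 0;
    for (size_t e = 0; e < m; e++) {
        double p = c * w[e] * R[e];
        if (p > 1.0) p = 1.0;
        if (rand() / ((double)RAND_MAX + 1.0) < p) {
            keep[kept] = e;
            w_out[kept] = w[e] / p;   /* importance-weight the survivor */
            kept++;
        }
    }
    return kept;
}
```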
This list is automatically generated from the titles and abstracts of the papers on this site.