A new hashing based nearest neighbors selection technique for big
datasets
- URL: http://arxiv.org/abs/2004.02290v2
- Date: Wed, 10 Feb 2021 03:26:41 GMT
- Title: A new hashing based nearest neighbors selection technique for big
datasets
- Authors: Jude Tchaye-Kondi, Yanlong Zhai, Liehuang Zhu
- Abstract summary: This paper proposes a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given observation.
The proposed approach consists of dividing the data space into subcells of a virtual grid built on top of the data space.
Our algorithm outperforms the original KNN in time efficiency with a prediction quality as good as that of KNN.
- Score: 14.962398031252063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: KNN has the reputation of being the simplest yet efficient supervised learning algorithm, used for either classification or regression. KNN prediction efficiency highly depends on the size of its training data, but as this training data grows, KNN suffers from slow decision making since it must search for nearest neighbors within the entire dataset at every prediction. This paper proposes a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given observation. The proposed approach consists of dividing the data space into subcells of a virtual grid built on top of the data space. The mapping between the data points and the subcells is performed using hashing. When selecting the nearest neighbors of a given observation, we first identify the cell the observation belongs to by hashing, and then look for nearest neighbors in that central cell and in the cells around it, layer by layer. Our experimental performance analysis on publicly available datasets shows that our algorithm outperforms the original KNN in time efficiency with a prediction quality as good as that of KNN; it also offers competitive performance with solutions such as KD-trees.
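Based only on the abstract above, the following Python sketch illustrates how such a grid-hashing neighbor selection could look. The cell width, the use of the tuple of integer cell coordinates as the hash key, and the stopping rule (expand layer by layer until at least k candidates are found, then scan one extra layer so a neighbor just across a cell boundary is not missed) are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from collections import defaultdict

class GridHashKNN:
    """Illustrative grid-hashing neighbor selection (not the authors' code).

    Assumes the training set holds at least k points; cell_width is a
    hypothetical tuning parameter controlling the grid resolution.
    """

    def __init__(self, cell_width, k=5):
        self.cell_width = cell_width
        self.k = k
        self.buckets = defaultdict(list)  # cell key -> indices of points in that subcell

    def _cell_key(self, x):
        # Hash a point to its subcell: the tuple of integer grid coordinates.
        return tuple(np.floor(x / self.cell_width).astype(int))

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)
        for i, x in enumerate(self.X):
            self.buckets[self._cell_key(x)].append(i)
        return self

    def _layers(self, center):
        # Yield candidate indices layer by layer: the central cell first,
        # then the shell of cells at Chebyshev distance 1, 2, and so on.
        center = np.array(center)
        d, layer = len(center), 0
        while True:
            idx = []
            for offset in np.ndindex(*([2 * layer + 1] * d)):
                off = np.array(offset) - layer
                if layer and np.abs(off).max() < layer:
                    continue  # interior cells were covered by earlier layers
                idx.extend(self.buckets.get(tuple(center + off), []))
            yield idx
            layer += 1

    def predict_one(self, x):
        x = np.asarray(x, dtype=float)
        layers = self._layers(self._cell_key(x))
        candidates = []
        # Expand until at least k candidates are collected ...
        while len(candidates) < self.k:
            candidates.extend(next(layers))
        # ... plus one extra layer, so a true neighbor sitting just across
        # a cell boundary is still considered.
        candidates.extend(next(layers))
        dists = np.linalg.norm(self.X[candidates] - x, axis=1)
        nearest = np.array(candidates)[np.argsort(dists)[: self.k]]
        labels, counts = np.unique(self.y[nearest], return_counts=True)
        return labels[np.argmax(counts)]  # majority vote, as in plain KNN

# Hypothetical usage:
# model = GridHashKNN(cell_width=0.25, k=5).fit(X_train, y_train)
# y_pred = [model.predict_one(x) for x in X_test]
```

In this sketch, the exhaustive distance scan of plain KNN is replaced by exact distance computations over only the candidates drawn from the central cell and its surrounding layers, which is where the claimed speed-up over brute-force KNN would come from.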
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection [20.204147875108976]
Anomaly detection aims at identifying images that deviate significantly from the norm.
We propose a new operator that takes into account the varying structure & importance of the features in the embedding space.
We show that by simply replacing the nearest neighbor component in existing algorithms with our k-NNN operator, while leaving the rest of the algorithms untouched, each algorithm's own results are improved.
arXiv Detail & Related papers (2023-05-28T11:39:51Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor
Algorithms" [18.65808473565554]
Jia et al. showed that for K-Nearest Neighbors (KNN) models, the algorithm of Data Shapley is surprisingly simple and efficient.
We propose a more natural and interpretable utility function that better reflects the performance of KNN models.
Our new approach, dubbed soft-label KNNSV, matches the time complexity of the original method.
arXiv Detail & Related papers (2023-04-09T15:31:53Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - Optimal Extended Neighbourhood Rule $k$ Nearest Neighbours Ensemble [1.8843687952462742]
A new optimal extended neighborhood rule based ensemble method is proposed in this paper.
The ensemble is compared with state-of-the-art methods on 17 benchmark datasets using accuracy, Cohen's kappa, and the Brier score (BS).
arXiv Detail & Related papers (2022-11-21T09:13:54Z) - Dynamic Ensemble Selection Using Fuzzy Hyperboxes [10.269997499911668]
This paper presents a new dynamic ensemble selection (DES) framework based on fuzzy hyperboxes called FH-DES.
Each hyperbox can represent a group of samples using only two data points (the Min and Max corners).
For the first time, misclassified samples are used to estimate the competence of the classifiers, which has not been done in previous fusion approaches.
arXiv Detail & Related papers (2022-05-20T21:06:46Z) - Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distance between the test image and top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z) - Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm simply retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the number of neighbors k for each target token.
arXiv Detail & Related papers (2021-05-27T09:27:42Z) - KNN Classification with One-step Computation [10.381276986079865]
A one-step computation is proposed to replace the lazy part of KNN classification.
The proposed approach is experimentally evaluated, demonstrating that one-step KNN classification is efficient and promising.
arXiv Detail & Related papers (2020-12-09T13:34:42Z) - Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on
Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
arXiv Detail & Related papers (2020-11-19T08:49:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.