On high-dimensional modifications of the nearest neighbor classifier
- URL: http://arxiv.org/abs/2407.05145v3
- Date: Thu, 24 Oct 2024 15:47:36 GMT
- Title: On high-dimensional modifications of the nearest neighbor classifier
- Authors: Annesha Ghosh, Deep Ghoshal, Bilol Banerjee, Anil K. Ghosh
- Abstract summary: We discuss some existing high-dimensional modifications of the nearest neighbor classifier and propose some new ones.
We analyze several simulated and benchmark datasets to compare the empirical performances of the proposed methods with some of the existing ones.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The nearest neighbor classifier is arguably the simplest and most popular nonparametric classifier in the literature. However, due to the concentration of pairwise distances and the violation of the neighborhood structure, this classifier often suffers in high-dimension, low-sample-size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to address this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out theoretical investigations in this regard and analyze several simulated and benchmark datasets to compare the empirical performances of the proposed methods with some of the existing ones.
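The distance-concentration failure described in the abstract is easy to reproduce numerically. The sketch below (illustrative only; the Gaussian scale-difference setup, sample sizes, and dimensions are our own choices, not the paper's experiments) shows the 1-NN error climbing toward 50% as the dimension grows when the two classes share a location but differ in scale:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_nn_error(d, n_train=20, n_test=200):
    """1-NN error when classes share a mean but differ in scale (sigma 1 vs 2)."""
    X = np.vstack([rng.standard_normal((n_train, d)),         # class 0: N(0, I)
                   2.0 * rng.standard_normal((n_train, d))])  # class 1: N(0, 4I)
    y = np.repeat([0, 1], n_train)
    T = np.vstack([rng.standard_normal((n_test, d)),
                   2.0 * rng.standard_normal((n_test, d))])
    t = np.repeat([0, 1], n_test)

    # Squared Euclidean distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b.
    d2 = (T**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2.0 * T @ X.T
    return (y[d2.argmin(axis=1)] != t).mean()

for d in (2, 10, 100, 1000):
    print(f"d = {d:4d}   1-NN error = {one_nn_error(d):.2f}")
```

In high dimensions, a test point from the wider class lies closer to the narrower class's training points than to its own, so roughly half the test set is misclassified. This is the HDLSS failure mode that the paper's modifications target.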
Related papers
- Space Decomposition for Sentence Embedding [12.538707746802853]
This paper introduces a novel embedding space decomposition method called MixSP.
It is designed to distinguish and rank upper-range and lower-range samples accurately.
The experimental results demonstrate that MixSP significantly reduced the representation overlap between upper-range and lower-range classes.
arXiv Detail & Related papers (2024-06-05T10:20:10Z)
- Classification Using Global and Local Mahalanobis Distances [1.7811840395202345]
We propose a novel semiparametric classifier based on Mahalanobis distances of an observation from the competing classes.
Our tool is a generalized additive model with the logistic link function that uses these distances as features to estimate the posterior probabilities of different classes.
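A minimal sketch of that idea, using plain logistic regression on the per-class distance features as a stand-in for the paper's generalized additive model (the function names and two-feature setup are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mahalanobis_features(X, means, precisions):
    """One Mahalanobis distance per class, stacked as feature columns."""
    cols = []
    for mu, P in zip(means, precisions):
        diff = X - mu
        cols.append(np.sqrt(np.einsum("ij,jk,ik->i", diff, P, diff)))
    return np.column_stack(cols)

def fit_mahalanobis_logit(X, y):
    classes = np.unique(y)
    means = [X[y == c].mean(axis=0) for c in classes]
    # Pseudo-inverse keeps the sketch usable when class covariances are singular.
    precisions = [np.linalg.pinv(np.cov(X[y == c], rowvar=False)) for c in classes]
    clf = LogisticRegression().fit(mahalanobis_features(X, means, precisions), y)
    return lambda Z: clf.predict(mahalanobis_features(Z, means, precisions))
```

Swapping the logistic regression for a generalized additive model on the same distance features (e.g., spline terms per distance) recovers the shape of the model the summary describes.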
arXiv Detail & Related papers (2024-02-13T08:22:42Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We estimate intra-class variations for every class and generate adaptive synthetic samples to support hard-sample mining.
Our method outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
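One common form of entropy regularisation in this setting penalises low entropy of the batch-averaged prediction, discouraging the classifier from collapsing onto a few pseudo-labels; a generic sketch of that term (not the paper's exact objective):

```python
import numpy as np

def mean_entropy_penalty(probs):
    """Negative entropy of the batch-averaged class distribution.

    probs: (batch, n_classes) softmax outputs. Adding this term to the
    training loss rewards spreading predictions across classes, a common
    guard against pseudo-label collapse (illustrative, not the paper's code).
    """
    p_mean = probs.mean(axis=0)
    return float(np.sum(p_mean * np.log(p_mean + 1e-12)))  # = -H(p_mean)
```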
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels [87.48541631675889]
We propose a two-stage clean samples identification method.
First, we employ a class-level feature clustering procedure for the early identification of clean samples.
Second, for the remaining clean samples that are close to the ground truth class boundary, we propose a novel consistency-based classification method.
arXiv Detail & Related papers (2022-07-29T04:54:57Z)
- Divide-and-Conquer Hard-thresholding Rules in High-dimensional Imbalanced Classification [1.0312968200748118]
We study the impact of imbalanced class sizes on the linear discriminant analysis (LDA) in high dimensions.
We show that due to data scarcity in one class, referred to as the minority class, the LDA ignores the minority class, driving its misclassification rate toward the maximum.
We propose a new construction of a hard-thresholding rule based on a divide-and-conquer technique that reduces the large gap between the two classes' misclassification rates.
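The failure mode is easy to demonstrate with ordinary LDA; the equal-prior refit below is a naive stand-in for balancing the error rates, not the paper's divide-and-conquer hard-thresholding rule, and the simulation setup is our own:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
d, n_major, n_minor = 200, 500, 20

mu = np.zeros(d)
mu[0] = 1.5  # modest mean shift in one coordinate
X = np.vstack([rng.standard_normal((n_major, d)),
               rng.standard_normal((n_minor, d)) + mu])
y = np.repeat([0, 1], [n_major, n_minor])

# Balanced test set so per-class error rates expose the imbalance problem.
T0 = rng.standard_normal((1000, d))
T1 = rng.standard_normal((1000, d)) + mu

for priors in (None, [0.5, 0.5]):  # empirical priors vs. equal priors
    lda = LinearDiscriminantAnalysis(priors=priors).fit(X, y)
    err0 = (lda.predict(T0) != 0).mean()
    err1 = (lda.predict(T1) != 1).mean()
    print(f"priors={priors}: majority error {err0:.2f}, minority error {err1:.2f}")
```

With empirical priors, the minority class is almost entirely absorbed by the majority class; refitting with equal priors narrows the gap between the two error rates, which is the imbalance the paper's rule is designed to reduce.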
arXiv Detail & Related papers (2021-11-05T07:44:28Z)
- Adversarial Examples for $k$-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams [69.4411417775822]
Adversarial examples are a widely studied phenomenon in machine learning models.
We propose an algorithm for evaluating the adversarial robustness of $k$-nearest neighbor classification.
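For intuition, a brute-force baseline for the 1-NN special case simply slides the query toward the nearest differently labelled training point until the prediction flips; the paper's contribution is doing this rigorously for general $k$ via higher-order Voronoi diagrams. This heuristic sketch is ours, not the paper's algorithm:

```python
import numpy as np

def nn_label(X, y, q):
    """1-NN label of query q under Euclidean distance."""
    return y[np.linalg.norm(X - q, axis=1).argmin()]

def naive_1nn_adversary(X, y, q, steps=1000):
    """Heuristic adversarial example for 1-NN: walk q toward the closest
    training point of another class and return the first point along the
    segment where the predicted label flips (naive baseline only)."""
    label = nn_label(X, y, q)
    foes = X[y != label]
    target = foes[np.linalg.norm(foes - q, axis=1).argmin()]
    for t in np.linspace(0.0, 1.0, steps + 1):
        cand = (1.0 - t) * q + t * target
        if nn_label(X, y, cand) != label:
            return cand
    return target  # the target itself is misclassified relative to `label`
```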
arXiv Detail & Related papers (2020-11-19T08:49:10Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first precise high-dimensional asymptotic analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Multiclass Classification via Class-Weighted Nearest Neighbors [10.509405690286176]
We study statistical properties of the k-nearest neighbors algorithm for multiclass classification.
We derive upper and minimax lower bounds on accuracy, class-weighted risk, and uniform error.
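The class-weighted rule itself is easy to state: each of the k nearest neighbors votes with a weight attached to its class, for instance the inverse class frequency. A minimal sketch under that reading (our own version, not the paper's estimator):

```python
import numpy as np

def class_weighted_knn(X, y, q, k=5, weights=None):
    """k-NN where a neighbor's vote counts weights[class] instead of 1.
    Default inverse-frequency weights up-weight rare classes."""
    classes, counts = np.unique(y, return_counts=True)
    if weights is None:
        weights = dict(zip(classes, 1.0 / counts))
    nearest = np.linalg.norm(X - q, axis=1).argsort()[:k]
    votes = {c: 0.0 for c in classes}
    for i in nearest:
        votes[y[i]] += weights[y[i]]
    return max(votes, key=votes.get)
```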
arXiv Detail & Related papers (2020-04-09T17:50:16Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks trained on them generalize poorly under a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
- On a Generalization of the Average Distance Classifier [2.578242050187029]
In high-dimension, low-sample-size settings, the average distance classifier performs poorly when scale differences mask the location differences between classes; we propose some simple transformations of the average distance classifier to tackle this issue.
The resulting classifiers perform quite well even when the underlying populations have the same location and scale.
Numerical experiments with a variety of simulated as well as real data sets exhibit the usefulness of the proposed methodology.
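For reference, the baseline rule being generalized assigns a point to the class with the smallest average distance to that class's training sample; the paper's transformations replace the raw distance with a suitable function of it. The `transform` hook below is our illustration of that general shape, not the paper's specific choice:

```python
import numpy as np

def average_distance_classify(X, y, q, transform=lambda d: d):
    """Assign q to the class minimizing the mean transformed distance
    from q to that class's training points. The identity transform gives
    the classical average distance classifier."""
    best_class, best_score = None, np.inf
    for c in np.unique(y):
        score = transform(np.linalg.norm(X[y == c] - q, axis=1)).mean()
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```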
arXiv Detail & Related papers (2020-01-08T10:00:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all summaries) and is not responsible for any consequences of its use.