Related papers: Classification with Nearest Disjoint Centroids

URL: http://arxiv.org/abs/2109.10436v1
Date: Tue, 21 Sep 2021 21:16:36 GMT
Title: Classification with Nearest Disjoint Centroids
Authors: Nicolas Fraiman, Zichao Li
Abstract summary: We develop a new classification method based on nearest centroid, and it is called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in the following two aspects: (1) the centroids are defined based on disjoint subsets of features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm.
Score: 6.332832782461923
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we develop a new classification method based on nearest centroid, and it is called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in the following two aspects: (1) the centroids are defined based on disjoint subsets of features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide a few theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that can find the disjoint subsets of features used in our method, and extend the algorithm to perform feature selection. We evaluate and compare the performance of our method to other closely related classifiers on both simulated data and real-world gene expression datasets. The results demonstrate that our method is able to outperform other competing classifiers by having smaller misclassification rates and/or using fewer features in various settings and situations.
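
The classification rule described in the abstract can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation. It assumes that the disjoint feature subsets are already given (the adapted k-means step that finds them is not reproduced here), that each class has its own subset, and that the dimensionality-normalized distance is the Euclidean distance divided by the square root of the number of features in the subset. The function names `fit_centroids`, `dn_distance`, and `predict` are hypothetical.

```python
import numpy as np


def fit_centroids(X, y, subsets):
    """Compute one centroid per class, restricted to that class's features.

    `subsets` maps each class label to a list of feature indices. The
    subsets are assumed to be pairwise disjoint and supplied by the caller.
    """
    return {k: X[y == k][:, idx].mean(axis=0) for k, idx in subsets.items()}


def dn_distance(u, v):
    # Assumed form of the dimensionality-normalized distance:
    # Euclidean distance divided by sqrt(number of coordinates).
    return np.linalg.norm(u - v) / np.sqrt(u.size)


def predict(X, centroids, subsets):
    """Assign each row of X to the class with the nearest centroid,
    measuring distance only on that class's feature subset."""
    labels = list(subsets)
    dists = np.array(
        [[dn_distance(x[subsets[k]], centroids[k]) for k in labels] for x in X]
    )
    return np.array(labels)[dists.argmin(axis=1)]
```

Dividing by the square root of the subset size keeps distances comparable when the classes use subsets of different sizes, which a plain Euclidean norm would not.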

Related papers

Supervised Pattern Recognition Involving Skewed Feature Densities [49.48516314472825]
The classification potential of the Euclidean distance and a dissimilarity index based on the coincidence similarity index are compared. The accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account.
arXiv Detail & Related papers (2024-09-02T12:45:18Z)
Canonical Variates in Wasserstein Metric Space [16.668946904062032]
We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation.
arXiv Detail & Related papers (2024-05-24T17:59:21Z)
Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model. Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z)
Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder [1.2487990897680423]
We introduce a novel nonlinear model, Sparse Adaptive Bottleneck Centroid-Encoder (SABCE), for determining the features that discriminate between two or more classes. The algorithm is applied to various real-world data sets, including high-dimensional biological, image, speech, and accelerometer sensor data.
arXiv Detail & Related papers (2023-06-07T21:37:21Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct a distance matrix between data points using a Butterworth filter. To better exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Retrieval-Augmented Classification with Decoupled Representation [31.662843145399044]
We propose a $k$-nearest-neighbor (KNN)-based method for retrieval augmented classifications. We find that shared representation for classification and retrieval hurts performance and leads to training instability. We evaluate our method on a wide range of classification datasets.
arXiv Detail & Related papers (2023-03-23T06:33:06Z)
An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improve initial cluster selection for the K-means algorithm. The Convex Hull algorithm facilitates computing the first two centroids, and the remaining ones are selected according to their distance from previously selected centers. We obtained only 7.33%, 7.90%, and 0% clustering error on the Iris, Letter, and Ruspini data, respectively.
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality. The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency. We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects [0.0]
The method of Holý, Sokol and Černý clusters objects based on their incidence in a large number of given sets. The idea is to minimize the occurrence of multiple objects from the same cluster in the same set. In the current paper, we study computational aspects of the method.
arXiv Detail & Related papers (2021-02-02T10:39:27Z)
Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters. Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics. From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.