Cluster Representatives Selection in Non-Metric Spaces for Nearest
Prototype Classification
- URL: http://arxiv.org/abs/2107.01345v1
- Date: Sat, 3 Jul 2021 04:51:07 GMT
- Title: Cluster Representatives Selection in Non-Metric Spaces for Nearest
Prototype Classification
- Authors: Jaroslav Hlaváč, Martin Kopp, Jan Kohout
- Abstract summary: In this paper, we present CRS, a novel method for selecting a small yet representative subset of objects as a cluster prototype.
Memory and computationally efficient selection of representatives is enabled by leveraging the similarity graph representation of each cluster created by the NN-Descent algorithm.
CRS can be used in an arbitrary metric or non-metric space because of the graph-based approach, which requires only a pairwise similarity measure.
- Score: 4.176752121302988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nearest prototype classification is a less computationally intensive
replacement for the $k$-NN method, especially when large datasets are
considered. In metric spaces, centroids are often used as prototypes to
represent whole clusters. The selection of cluster prototypes in non-metric
spaces is more challenging as the idea of computing centroids is not directly
applicable.
In this paper, we present CRS, a novel method for selecting a small yet
representative subset of objects as a cluster prototype. Memory and
computationally efficient selection of representatives is enabled by leveraging
the similarity graph representation of each cluster created by the NN-Descent
algorithm. CRS can be used in an arbitrary metric or non-metric space because
of the graph-based approach, which requires only a pairwise similarity measure.
As we demonstrate in the experimental evaluation, our method outperforms
state-of-the-art techniques on multiple datasets from different domains.
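To make the nearest-prototype idea concrete, here is a minimal Python sketch of classifying against multi-object prototypes under an arbitrary pairwise similarity. The greedy coverage heuristic below is a hypothetical stand-in for CRS itself, which selects representatives on an NN-Descent similarity graph; the function names and the coverage criterion are illustrative assumptions, not the paper's method.

```python
import numpy as np

def select_representatives(objects, similarity, k):
    """Pick k objects that together cover the cluster well.

    Greedy coverage heuristic standing in for CRS (the paper selects
    representatives on an NN-Descent similarity graph instead).
    `similarity` is any pairwise function; no metric axioms are required.
    """
    n = len(objects)
    sim = np.array([[similarity(a, b) for b in objects] for a in objects])
    chosen = [int(np.argmax(sim.sum(axis=1)))]   # most central object first
    covered = sim[chosen[0]].copy()              # best similarity to a chosen rep
    while len(chosen) < min(k, n):
        gains = np.maximum(sim, covered).sum(axis=1) - covered.sum()
        gains[chosen] = -np.inf                  # never re-pick a representative
        best = int(np.argmax(gains))
        chosen.append(best)
        covered = np.maximum(covered, sim[best])
    return [objects[i] for i in chosen]

def classify(query, prototypes, similarity):
    """Assign `query` to the label whose representative set contains
    the most similar object (nearest prototype rule)."""
    return max(prototypes,
               key=lambda label: max(similarity(query, r)
                                     for r in prototypes[label]))

# Usage: prototypes maps label -> representatives chosen per cluster, e.g.
#   prototypes = {lbl: select_representatives(objs, cos_sim, k=5)
#                 for lbl, objs in clusters.items()}
```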
Related papers
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, NCC aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, Maximizing Optimized Kernel Dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
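Kernel dependence of the kind MOKD optimizes is commonly measured with the Hilbert-Schmidt Independence Criterion (HSIC). The sketch below shows a biased HSIC estimator between representations and class labels; the Gaussian kernel on features and delta kernel on labels are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def hsic(X, y, sigma=1.0):
    """Biased HSIC estimator: trace(K H L H) / (n - 1)^2.

    X: (n, d) sample representations; y: (n,) integer class labels.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian feature kernel
    L = (y[:, None] == y[None, :]).astype(float)    # delta kernel on labels
    H = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```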
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery [23.359450657842686]
Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data.
We propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes.
Our method surpasses the nearest competitor by a significant margin of 9.7% within the Stanford Cars dataset.
arXiv Detail & Related papers (2024-04-13T12:41:40Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Refining a $k$-nearest neighbor graph for a computationally efficient spectral clustering [1.5469452301122175]
Approximate spectral clustering (ASC) uses sampling or quantization to select data representatives.
We propose a refined version of the $k$-nearest neighbor graph, in which we keep the data points and aggressively reduce the number of edges.
Compared to ASC methods, the proposed method delivers consistent performance despite the significant reduction in edges.
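A minimal sketch of the overall pipeline, assuming mutual-neighbor filtering as the edge-reduction step (the paper's actual refinement rule may differ):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

def refined_knn_spectral(X, n_clusters, k=10):
    """Spectral clustering on a k-NN graph with aggressively pruned edges."""
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity")
    A = A.multiply(A.T)  # keep an edge only if both endpoints list each other
    # Note: aggressive pruning can disconnect the graph; increase k if so.
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            assign_labels="kmeans", random_state=0)
    return sc.fit_predict(A)
```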
arXiv Detail & Related papers (2023-02-22T11:31:32Z)
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improving initial cluster selection for the K-means algorithm.
The Convex Hull algorithm facilitates computing the first two centroids, and the remaining ones are selected according to their distance from the previously selected centers.
We obtained clustering errors of only 7.33%, 7.90%, and 0% on the Iris, Letter, and Ruspini datasets, respectively.
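A minimal sketch of this initialization, assuming the first two centroids are the farthest-apart convex-hull vertices (one plausible reading; the paper may define them differently) and the rest follow farthest-point selection:

```python
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull_init(X, n_clusters):
    """Initial K-means centers: two convex-hull points, then farthest-point picks."""
    hull = X[ConvexHull(X).vertices]                # hull vertices of the data
    d = np.linalg.norm(hull[:, None] - hull[None, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)  # farthest-apart hull pair
    centers = [hull[i], hull[j]]
    while len(centers) < n_clusters:
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])     # farthest from chosen centers
    return np.array(centers)

# Usage with scikit-learn, replacing random initialization:
#   KMeans(n_clusters=3, init=convex_hull_init(X, 3), n_init=1).fit(X)
```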
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
- IPD: An Incremental Prototype based DBSCAN for large-scale data with cluster representatives [2.864550757598006]
We propose an Incremental Prototype-based DBSCAN (IPD) algorithm designed to identify arbitrary-shaped clusters in large-scale data.
arXiv Detail & Related papers (2022-02-16T05:47:31Z)
- Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
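A minimal sketch of the idea: rasterize the 2-D points into an image, blur so nearby points merge into blobs, and label connected components as clusters. This is a simple stand-in for the paper's segmentation pipeline, with grid size and blur as assumed parameters.

```python
import numpy as np
from scipy import ndimage

def visual_clustering(points, grid=256, blur=2.0):
    """Cluster 2-D points by segmenting their rasterized image."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / (hi - lo + 1e-12) * (grid - 1)).astype(int)
    img = np.zeros((grid, grid))
    img[ij[:, 1], ij[:, 0]] = 1.0                   # rasterize the scatter plot
    img = ndimage.gaussian_filter(img, sigma=blur)  # merge nearby points into blobs
    labels_img, n = ndimage.label(img > 0.05 * img.max())
    # Points in very sparse regions may fall below the threshold (label 0).
    return labels_img[ij[:, 1], ij[:, 0]], n        # per-point cluster id, #clusters
```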
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
- Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies.
We present a number of applications where the use of multiple geometric models improves accuracy.
These include pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes (DPPs) for the random restarts of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPPs, uniform random selection of centers fails both to ensure diversity and to obtain good coverage of all data facets.
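A minimal sketch of diversity-driven center selection via greedy determinant maximization, a common MAP-style approximation of DPP sampling (the paper samples from a DPP rather than taking the greedy maximizer; the Gaussian kernel is an assumed choice):

```python
import numpy as np

def greedy_dpp_centers(X, k, sigma=1.0):
    """Pick k diverse initial centers by greedy determinant maximization."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    L = np.exp(-d2 / (2.0 * sigma ** 2))             # DPP likelihood kernel
    chosen = [int(np.argmax(np.diag(L)))]
    while len(chosen) < k:
        best, best_det = -1, -np.inf
        for i in range(len(X)):
            if i in chosen:
                continue
            idx = chosen + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])  # volume spanned by the subset
            if det > best_det:
                best, best_det = i, det
        chosen.append(best)
    return X[chosen]
```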
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this is the first subspace clustering method able to efficiently cluster millions of data points.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.