Interpretable label-free self-guided subspace clustering
- URL: http://arxiv.org/abs/2411.17291v1
- Date: Tue, 26 Nov 2024 10:29:09 GMT
- Title: Interpretable label-free self-guided subspace clustering
- Authors: Ivica Kopriva
- Abstract summary: The majority of subspace clustering (SC) algorithms depend on one or more hyperparameters that must be carefully tuned for the SC algorithms to achieve high clustering performance.
We propose a novel approach to label-independent HPO that uses clustering quality metrics, such as accuracy (ACC) or normalized mutual information (NMI), computed from pseudo-labels produced by the SC algorithm itself.
We demonstrate this approach on several single- and multi-view SC algorithms, comparing the achieved performance with their oracle versions across six datasets representing digits, faces, and objects.
- Score: 0.0
- Abstract: The majority of subspace clustering (SC) algorithms depend on one or more hyperparameters that need to be carefully tuned for the SC algorithms to achieve high clustering performance. Hyperparameter optimization (HPO) is often performed using grid search, assuming that some labeled data is available. In some domains, such as medicine, this assumption does not hold true in many cases. One avenue of research focuses on developing SC algorithms that are inherently free of hyperparameters. For hyperparameter-dependent SC algorithms, one approach to label-independent HPO tuning is based on internal clustering quality metrics (if available), whose performance should ideally match that of external (label-dependent) clustering quality metrics. In this paper, we propose a novel approach to label-independent HPO that uses clustering quality metrics, such as accuracy (ACC) or normalized mutual information (NMI), that are computed based on pseudo-labels obtained from the SC algorithm across a predefined grid of hyperparameters. Assuming that ACC (or NMI) is a smooth function of the hyperparameter values, it is possible to select subintervals of hyperparameters. These subintervals are then iteratively split into halves or thirds until a relative error criterion is satisfied. In principle, the hyperparameters of any SC algorithm can be tuned using the proposed method. We demonstrate this approach on several single- and multi-view SC algorithms, comparing the achieved performance with their oracle versions across six datasets representing digits, faces, and objects. The proposed method typically achieves clustering performance that is 5% to 7% lower than that of the oracle versions. We also make our proposed method interpretable by visualizing subspace bases, which are estimated from the computed clustering partitions. This aids in the initial selection of the hyperparameter search space.
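The tuning loop in the abstract can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration, not the paper's implementation: scikit-learn's SpectralClustering stands in for a generic SC algorithm, the self-guidance signal is assumed to be the NMI between pseudo-labels at neighboring grid points, and the function names (`pseudo_label_scores`, `tune`, `subspace_bases`) are made up for this sketch. A final helper estimates per-cluster subspace bases via truncated SVD, in the spirit of the basis visualization the abstract mentions.

```python
# Minimal, hypothetical sketch of label-free self-guided HPO -- NOT the
# paper's implementation. SpectralClustering stands in for a generic SC
# algorithm; the self-guidance score is assumed to be the NMI between
# pseudo-label partitions at neighboring hyperparameter grid points.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

def pseudo_label_scores(X, n_clusters, grid):
    """Cluster X at every grid value; score each grid point by the mean NMI
    of its pseudo-label partition with the partitions of its neighbors."""
    labels = [SpectralClustering(n_clusters=n_clusters, affinity="rbf",
                                 gamma=g, random_state=0).fit_predict(X)
              for g in grid]
    scores = []
    for i in range(len(grid)):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(grid)]
        scores.append(np.mean([normalized_mutual_info_score(labels[i], labels[j])
                               for j in nbrs]))
    return np.asarray(scores), labels

def tune(X, n_clusters, lo=0.01, hi=10.0, n_grid=5, rel_tol=0.05, max_iter=10):
    """Shrink the search interval around the best-scoring grid point until
    the best score changes by less than rel_tol (relative error criterion)."""
    best_prev = None
    for _ in range(max_iter):
        grid = np.linspace(lo, hi, n_grid)
        scores, labels = pseudo_label_scores(X, n_clusters, grid)
        i = int(np.argmax(scores))
        best_g, best_labels = grid[i], labels[i]
        if best_prev is not None and \
                abs(scores[i] - best_prev) <= rel_tol * max(best_prev, 1e-12):
            break
        best_prev = scores[i]
        # Keep a subinterval around the winner (a halves/thirds-style split).
        lo, hi = grid[max(i - 1, 0)], grid[min(i + 1, n_grid - 1)]
    return best_g, best_labels

def subspace_bases(X, labels, dim):
    """Estimate an orthonormal basis per cluster via truncated SVD of the
    points assigned to it (union-of-subspaces model)."""
    return {c: np.linalg.svd(X[labels == c].T, full_matrices=False)[0][:, :dim]
            for c in np.unique(labels)}
```

For example, `best_gamma, pseudo = tune(X, n_clusters=10)` followed by `subspace_bases(X, pseudo, dim=5)` yields bases whose columns can be reshaped into images for digit or face data, which is the kind of visual inspection that can guide the initial choice of the search space.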
Related papers
- Subspace Clustering in Wavelet Packets Domain [1.3812010983144802]
Subspace clustering (SC) algorithms utilize the union-of-subspaces model to cluster data points according to the subspaces from which they are drawn.
To better address the separability of subspaces and robustness to noise, we propose a wavelet packet (WP) based transform-domain subspace clustering.
arXiv Detail & Related papers (2024-06-06T07:49:11Z)
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, NCC aims to learn representations that construct a metric space in which few-shot classification can be performed.
In this paper, we find that there exist high similarities between the NCC-learned representations of samples from different classes.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- Superclustering by finding statistically significant separable groups of optimal Gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal from the point of view of the BIC criterion.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers the clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems, it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms [0.0]
This study proposes a framework for semi-automated hyperparameter tuning of clustering problems.
It uses a grid search to develop a series of graphs and easy-to-interpret metrics that can then be used for more efficient domain-specific evaluation.
Preliminary results show that internal metrics are unable to capture the semantic quality of the clusters developed.
arXiv Detail & Related papers (2021-08-25T05:48:06Z)
- Cluster Representatives Selection in Non-Metric Spaces for Nearest Prototype Classification [4.176752121302988]
In this paper, we present CRS, a novel method for selecting a small yet representative subset of objects as a cluster prototype.
Memory- and computation-efficient selection of representatives is enabled by leveraging the similarity-graph representation of each cluster created by the NN-Descent algorithm.
CRS can be used in an arbitrary metric or non-metric space because of the graph-based approach, which requires only a pairwise similarity measure.
arXiv Detail & Related papers (2021-07-03T04:51:07Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
- A Centroid Auto-Fused Hierarchical Fuzzy c-Means Clustering [30.709797128259236]
We present a Centroid Auto-Fused Hierarchical Fuzzy c-means method (CAF-HFCM) whose optimization procedure can automatically agglomerate to form a cluster hierarchy.
Our proposed CAF-HFCM method can be straightforwardly extended to various variants of FCM.
arXiv Detail & Related papers (2020-04-27T12:59:22Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.