Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms
- URL: http://arxiv.org/abs/2108.11053v1
- Date: Wed, 25 Aug 2021 05:48:06 GMT
- Title: Applying Semi-Automated Hyperparameter Tuning for Clustering Algorithms
- Authors: Elizabeth Ditton, Anne Swinbourne, Trina Myers, Mitchell Scovell
- Abstract summary: This study proposes a framework for semi-automated hyperparameter tuning of clustering problems.
It uses a grid search to develop a series of graphs and easy to interpret metrics that can then be used for more efficient domain-specific evaluation.
Preliminary results show that internal metrics are unable to capture the semantic quality of the clusters developed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: When approaching a clustering problem, choosing the right clustering
algorithm and parameters is essential, as each clustering algorithm is
proficient at finding clusters of a particular nature. Due to the unsupervised
nature of clustering algorithms, there are no ground truth values available for
empirical evaluation, which makes automation of the parameter selection process
through hyperparameter tuning difficult. Previous approaches to hyperparameter
tuning for clustering algorithms have relied on internal metrics, which are
often biased towards certain algorithms, or having some ground truth labels
available, moving the problem into the semi-supervised space. This preliminary
study proposes a framework for semi-automated hyperparameter tuning of
clustering problems, using a grid search to develop a series of graphs and easy
to interpret metrics that can then be used for more efficient domain-specific
evaluation. Preliminary results show that internal metrics are unable to
capture the semantic quality of the clusters developed and approaches driven by
internal metrics would come to different conclusions than those driven by
manual evaluation.
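The grid-search idea described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes k-means as the clustering algorithm and the silhouette coefficient as the internal metric, neither of which is named in the abstract. The names `kmeans`, `silhouette`, and `grid_search` and the toy data are hypothetical. Each configuration is run once, and the metric is recorded together with the cluster sizes so that a person can review the candidates instead of trusting the metric alone.

```python
import itertools
import math
import random


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means (assumed algorithm); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, used here as an example internal metric."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def grid_search(points, grid):
    """Run every configuration and collect metrics for later manual review."""
    rows = []
    for k, seed in itertools.product(grid["k"], grid["seed"]):
        labels = kmeans(points, k, seed)
        sizes = sorted((labels.count(c) for c in set(labels)), reverse=True)
        rows.append({"k": k, "seed": seed,
                     "silhouette": round(silhouette(points, labels), 3),
                     "sizes": sizes})
    return rows


# Toy data: three separated 2-D blobs (illustrative only).
rng = random.Random(0)
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]

results = grid_search(points, {"k": [2, 3, 4], "seed": [0, 1]})
for row in sorted(results, key=lambda r: r["silhouette"], reverse=True):
    print(row)
```

Sorting by the internal metric only orders the candidates for inspection. In line with the abstract's finding, the top-ranked row is not assumed to be the most semantically meaningful clustering, so the cluster sizes are kept alongside the score as one simple, easy-to-read signal for the domain expert.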
Related papers
- Interpretable label-free self-guided subspace clustering [0.0]
Most subspace clustering (SC) algorithms depend on one or more hyperparameters that need to be carefully tuned for the SC algorithms to achieve high clustering performance.
We propose a novel approach to label-independent hyperparameter optimization (HPO) that uses clustering quality metrics, such as accuracy (ACC) or normalized mutual information (NMI).
We demonstrate this approach on several single- and multi-view SC algorithms, comparing the achieved performance with their oracle versions across six datasets representing digits, faces and objects.
arXiv Detail & Related papers (2024-11-26T10:29:09Z)
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- A Parameter-free Adaptive Resonance Theory-based Topological Clustering Algorithm Capable of Continual Learning [20.995946115633963]
We propose a new parameter-free ART-based topological clustering algorithm capable of continual learning by introducing parameter estimation methods.
Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to the state-of-the-art clustering algorithms without any parameter pre-specifications.
arXiv Detail & Related papers (2023-05-01T01:04:07Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
- Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
- A semi-supervised sparse K-Means algorithm [3.04585143845864]
An unsupervised sparse clustering method can be employed in order to detect the subgroup of features necessary for clustering.
A semi-supervised method can use the labelled data to create constraints and enhance the clustering solution.
We show that the algorithm maintains the high performance of other semi-supervised algorithms and, in addition, preserves the ability to distinguish informative from uninformative features.
arXiv Detail & Related papers (2020-03-16T02:05:23Z)
- Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms.
Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)