A novel cluster internal evaluation index based on hyper-balls
- URL: http://arxiv.org/abs/2212.14524v1
- Date: Fri, 30 Dec 2022 02:56:40 GMT
- Title: A novel cluster internal evaluation index based on hyper-balls
- Authors: Jiang Xie, Pengfei Zhao, Shuyin Xia, Guoyin Wang, Dongdong Cheng
- Abstract summary: It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis.
In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls.
The cluster internal evaluation index based on hyper-balls (HCVI) is defined.
- Score: 11.048887848164268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is crucial to evaluate the quality and determine the optimal number of
clusters in cluster analysis. In this paper, the multi-granularity
characterization of the data set is carried out to obtain the hyper-balls. The
cluster internal evaluation index based on hyper-balls (HCVI) is defined.
Moreover, a general method for determining the optimal number of clusters based
on HCVI is proposed. The proposed methods can evaluate the clustering results
produced by several classic methods and determine the optimal cluster
number for data sets containing noise and clusters with arbitrary shapes. The
experimental results on synthetic and real data sets indicate that the new
index outperforms existing ones.
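The abstract does not give the HCVI formula, so the sketch below only illustrates the general pattern it describes: score each candidate number of clusters with an internal validity index and keep the best one. As stated assumptions, the mean silhouette width stands in for HCVI, a toy 1-D gap-based partitioner stands in for a real clustering algorithm, and the names `split_by_gaps`, `silhouette`, and `best_k` are hypothetical.

```python
from statistics import mean


def split_by_gaps(points, k):
    """Toy clusterer (assumption): cut sorted 1-D points at the k-1 largest gaps."""
    pts = sorted(points)
    # Indices i of the gaps between pts[i] and pts[i + 1], largest gap first.
    gaps = sorted(range(len(pts) - 1),
                  key=lambda i: pts[i + 1] - pts[i], reverse=True)
    cuts = sorted(gaps[:k - 1])
    clusters, start = [], 0
    for c in cuts:
        clusters.append(pts[start:c + 1])
        start = c + 1
    clusters.append(pts[start:])
    return clusters


def silhouette(clusters):
    """Mean silhouette width, used here as a stand-in internal index (not HCVI)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for idx, p in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)  # singletons score 0 by convention
                continue
            # a: mean distance to the other points of the same cluster.
            a = mean(abs(p - q) for j, q in enumerate(cluster) if j != idx)
            # b: smallest mean distance to the points of any other cluster.
            b = min(mean(abs(p - q) for q in other)
                    for cj, other in enumerate(clusters) if cj != ci)
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


def best_k(points, candidates):
    """Return the candidate cluster count with the highest index value."""
    return max(candidates, key=lambda k: silhouette(split_by_gaps(points, k)))


data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.3, 9.1]
chosen_k = best_k(data, range(2, 6))
print(chosen_k)
```

On this toy data, which has three well-separated groups, the selection loop picks three clusters. Swapping in a different index only requires replacing `silhouette`; the selection loop itself stays the same.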
Related papers
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
It aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to that, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
arXiv Detail & Related papers (2024-07-31T08:29:35Z) - From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Clustering performance analysis using new correlation based cluster validity indices [0.0]
We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located.
Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness previously stated.
arXiv Detail & Related papers (2021-09-23T06:59:41Z) - The Three Ensemble Clustering (3EC) Algorithm for Pattern Discovery in Unsupervised Learning [1.0465883970481493]
The 'Three Ensemble Clustering 3EC' algorithm classifies unlabeled data into quality clusters as a part of unsupervised learning.
Each partitioned cluster is considered to be a new data set and is a candidate for exploring the optimal algorithm.
The users can experiment with different sets of stopping criteria and choose the most 'sensible group' of quality clusters.
arXiv Detail & Related papers (2021-07-08T10:15:18Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
Fuzzy-Possibilistic (FP) index works well in the presence of clusters that vary in shape and density.
FPCM requires a priori selection of the degree of fuzziness and the degree of typicality.
arXiv Detail & Related papers (2020-05-19T01:48:13Z) - Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)