Clustering performance analysis using new correlation based cluster
validity indices
- URL: http://arxiv.org/abs/2109.11172v1
- Date: Thu, 23 Sep 2021 06:59:41 GMT
- Title: Clustering performance analysis using new correlation based cluster
validity indices
- Authors: Nathakhun Wiroonsri
- Abstract summary: We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located.
Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness of existing validity measures that sometimes provide only one clear optimal number of clusters.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various cluster validity measures are used for evaluating clustering
results. One of the main objectives of using these measures is to find the
optimal, unknown number of clusters. Some measures work well for clusters with
different densities, sizes and shapes. Yet one weakness that those
validity measures share is that they sometimes provide only one clear optimal
number of clusters. That number is actually unknown, and there might be more
than one potential sub-optimal option that a user may wish to choose based on
different applications. We develop two new cluster validity indices based on the
correlation between the actual distance between a pair of data points and the
distance between the centroids of the clusters in which the two points are
located. Our proposed indices consistently yield several peaks at different
numbers of clusters, which overcomes the weakness stated above. Furthermore,
the introduced correlation can also be used for evaluating the quality of a
selected clustering result. Several experiments in different scenarios,
including the well-known iris data set and a real-world marketing application,
have been conducted to compare the proposed validity indices with several
well-known ones.
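The correlation described in the abstract can be illustrated with a minimal sketch. This is not the paper's definition of its two indices, which the abstract does not spell out; it only computes the underlying quantity under stated assumptions: Euclidean distances, a Pearson correlation, and a centroid distance of zero for two points in the same cluster. The function name `distance_centroid_correlation` and the toy data are hypothetical.

```python
import numpy as np


def distance_centroid_correlation(X, labels):
    """Pearson correlation between point-to-point distances and the
    distances between the centroids of the clusters the points belong to.

    Assumptions (not taken from the paper): Euclidean distance, Pearson
    correlation, and centroid distance 0 for points in the same cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).reshape(-1)

    # Map arbitrary labels to 0..k-1 and compute one centroid per cluster.
    unique, inverse = np.unique(labels, return_inverse=True)
    inverse = inverse.reshape(-1)
    if len(unique) < 2:
        raise ValueError("need at least two clusters to compute a correlation")
    centroids = np.stack(
        [X[inverse == k].mean(axis=0) for k in range(len(unique))]
    )

    # All unordered pairs (i < j) of data points.
    i, j = np.triu_indices(len(X), k=1)
    point_dist = np.linalg.norm(X[i] - X[j], axis=1)
    centroid_dist = np.linalg.norm(
        centroids[inverse[i]] - centroids[inverse[j]], axis=1
    )
    return np.corrcoef(point_dist, centroid_dist)[0, 1]


# Toy data: two well-separated Gaussian blobs of 30 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(30, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(30, 2))
X = np.vstack([blob_a, blob_b])

true_labels = np.repeat([0, 1], 30)    # matches the blobs
mixed_labels = np.arange(60) % 2       # splits each blob across both clusters

good_corr = distance_centroid_correlation(X, true_labels)
mixed_corr = distance_centroid_correlation(X, mixed_labels)
print(f"labels matching the blobs: {good_corr:.3f}")
print(f"labels mixing the blobs:   {mixed_corr:.3f}")
```

In this sketch, a labeling that matches the data structure should give a correlation close to 1, while a labeling that ignores it should give a value near 0. Evaluating the same quantity across several candidate numbers of clusters is one way such a correlation could be used to compare clustering results.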
Related papers
- Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters [5.507296054825372]
Finding meaningful groups in high-dimensional data is an important challenge in data mining.
Deep clustering methods have achieved remarkable results in these tasks.
Most of these methods require the user to specify the number of clusters in advance.
This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable.
Most of these approaches estimate the number of clusters separately from the clustering process.
arXiv Detail & Related papers (2024-10-12T11:04:10Z)
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115]
ABCDE aims to be practical: it allows items to have associated importance values that are application-specific, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
The approach to measuring the delta in the clustering quality is novel: instead of trying to construct an expensive ground truth up front and evaluating each clustering with respect to that, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
arXiv Detail & Related papers (2024-07-31T08:29:35Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- A novel cluster internal evaluation index based on hyper-balls [11.048887848164268]
It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis.
In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls.
The cluster internal evaluation index based on hyper-balls (HCVI) is defined.
arXiv Detail & Related papers (2022-12-30T02:56:40Z)
- A new nonparametric interpoint distance-based measure for assessment of clustering [0.0]
A new interpoint distance-based measure is proposed to identify the optimal number of clusters present in a data set.
Our proposed criterion is compatible with any clustering algorithm, and can be used to determine the unknown number of clusters.
arXiv Detail & Related papers (2022-10-01T04:27:54Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to divide anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z)
- Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score [0.5330240017302619]
We propose a selection rule that allows choosing among many clustering solutions.
The proposed method has the distinctive advantage that it can compare partitions that cannot be compared with other state-of-the-art methods.
arXiv Detail & Related papers (2021-11-03T15:38:58Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- A New Validity Index for Fuzzy-Possibilistic C-Means Clustering [6.174448419090291]
The Fuzzy-Possibilistic (FP) index works well in the presence of clusters that vary in shape and density.
FPCM requires a priori selection of the degree of fuzziness and the degree of typicality.
arXiv Detail & Related papers (2020-05-19T01:48:13Z)