A Clustering Method Based on Information Entropy Payload
- URL: http://arxiv.org/abs/2209.06582v1
- Date: Tue, 13 Sep 2022 09:56:05 GMT
- Title: A Clustering Method Based on Information Entropy Payload
- Authors: Shaodong Deng, Long Sheng, Jiayi Nie, Fuyi Deng
- Abstract summary: Existing clustering algorithms such as K-means often require parameters to be preset, such as the number of categories K.
This paper introduces a clustering method based on information theory, in which the clusters in the clustering result have maximum average information entropy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing clustering algorithms such as K-means often require preset
parameters such as the number of categories K, and such parameters may prevent
the algorithm from producing objective and consistent clustering results. This paper
introduces a clustering method based on information theory, in which the
clusters in the clustering result have maximum average information entropy
(called entropy payload in this paper). The method offers the following
benefits: first, it does not need any preset hyperparameter, such
as the number of categories or similar thresholds; second, the clustering
results have maximum information expression efficiency. It can be used in
image segmentation, object classification, etc., and could serve as a basis for
unsupervised learning.
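The abstract does not define how entropy payload is computed, so the following is only a rough illustration of the underlying quantity and not the paper's method. It computes the Shannon entropy of a labeling's cluster-size distribution; the function name `label_entropy` and the example labelings are assumptions introduced here.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy, in bits, of the cluster-size distribution of a labeling.

    Hypothetical helper for illustration; the paper's own definition of
    entropy payload is not given in the abstract.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # H = -sum_i p_i * log2(p_i), where p_i is the fraction of points in cluster i
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Three made-up labelings of the same eight points
one_cluster = [0] * 8
two_balanced = [0, 0, 0, 0, 1, 1, 1, 1]
two_skewed = [0, 0, 0, 0, 0, 0, 0, 1]

for name, labels in [
    ("one_cluster", one_cluster),
    ("two_balanced", two_balanced),
    ("two_skewed", two_skewed),
]:
    print(name, round(label_entropy(labels), 3))
```

Under this measure a single cluster carries no information, and a balanced split carries more than a skewed one. How the paper averages entropy over clusters, and how it searches for the maximizing partition, would need to be taken from the full text.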
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization [64.00146569922028]
Multi-view clustering methods based on anchor graph factorization lack adequate cluster interpretability for the decomposed matrix.
We address this limitation by using non-negative tensor factorization to decompose an anchor graph tensor that combines anchor graphs from multiple views.
arXiv Detail & Related papers (2024-04-01T03:23:55Z) - Superclustering by finding statistically significant separable groups of optimal gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping the Gaussian clusters that are optimal from the point of view of the BIC criterion.
An essential advantage of the algorithm is its ability to predict correct supercluster for new data based on already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC maintains an overall computational complexity of $\mathcal{O}(n \, \mathrm{polylog}(n))$, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems.
They converge to different settings with different initial conditions.
The number of clusters has to be arbitrarily decided beforehand.
arXiv Detail & Related papers (2020-03-09T19:18:35Z) - Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy Classifiers [25.32478253796209]
Fuzzy c-means based clustering algorithms are frequently used for Takagi-Sugeno-Kang (TSK) fuzzy classifier parameter estimation.
This paper proposes a supervised enhanced soft subspace clustering (SESSC) algorithm, which considers simultaneously the within-cluster compactness, between-cluster separation, and label information in clustering.
arXiv Detail & Related papers (2020-02-27T19:39:19Z) - Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z) - Variable feature weighted fuzzy k-means algorithm for high dimensional data [30.828627752648767]
In real-world applications, cluster-dependent feature weights help in partitioning the data set into more meaningful clusters.
This paper presents a novel fuzzy k-means clustering algorithm by modifying the objective function of the fuzzy k-means using two different entropy terms.
The method is validated using both supervised and unsupervised performance measures on real-world and synthetic datasets.
arXiv Detail & Related papers (2019-12-24T04:58:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.