Related papers: A Clustering Method Based on Information Entropy Payload

A Clustering Method Based on Information Entropy Payload

URL: http://arxiv.org/abs/2209.06582v1
Date: Tue, 13 Sep 2022 09:56:05 GMT
Title: A Clustering Method Based on Information Entropy Payload
Authors: Shaodong Deng, Long Sheng, Jiayi Nie, Fuyi Deng
Abstract summary: Existing clustering algorithms such as K-means often need to preset parameters such as the number of K categories. This paper introduces a clustering method based on the information theory, by which clusters in the clustering result have maximum average information entropy.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing clustering algorithms such as K-means often need to preset parameters such as the number of categories K, and such parameters may lead to the failure to output objective and consistent clustering results. This paper introduces a clustering method based on the information theory, by which clusters in the clustering result have maximum average information entropy (called entropy payload in this paper). This method can bring the following benefits: firstly, this method does not need to preset any super parameter such as category number or other similar thresholds, secondly, the clustering results have the maximum information expression efficiency. it can be used in image segmentation, object classification, etc., and could be the basis of unsupervised learning.

Related papers

K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters.<n>It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters.<n>We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization [64.00146569922028]
Multi-view clustering methods based on anchor graph factorization lack adequate cluster interpretability for the decomposed matrix. We address this limitation by using non-negative tensor factorization to decompose an anchor graph tensor that combines anchor graphs from multiple views.
arXiv Detail & Related papers (2024-04-01T03:23:55Z)
Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies. Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
Superclustering by finding statistically significant separable groups of optimal gaussian clusters [0.0]
The paper presents the algorithm for clustering a dataset by grouping the optimal, from the point of view of the BIC criterion. An essential advantage of the algorithm is its ability to predict correct supercluster for new data based on already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments. One-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees. For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems. They converge to different settings with different initial conditions. The number of clusters has to be arbitrarily decided beforehand.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)
Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy Classifiers [25.32478253796209]
Fuzzy c-means based clustering algorithms are frequently used for Takagi-Sugeno-Kang (TSK) fuzzy classifier parameter estimation. This paper proposes a supervised enhanced soft subspace clustering (SESSC) algorithm, which considers simultaneously the within-cluster compactness, between-cluster separation, and label information in clustering.
arXiv Detail & Related papers (2020-02-27T19:39:19Z)
Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters. Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics. From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
Variable feature weighted fuzzy k-means algorithm for high dimensional data [30.828627752648767]
In real-world applications, cluster-dependent feature weights help in partitioning the data set into more meaningful clusters. This paper presents a novel fuzzy k-means clustering algorithm by modifying the objective function of the fuzzy k-means using two different entropy terms. The method is validated using both supervised and unsupervised performance measures on real-world and synthetic datasets.
arXiv Detail & Related papers (2019-12-24T04:58:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.