Ensemble Clustering via Co-association Matrix Self-enhancement
- URL: http://arxiv.org/abs/2205.05937v1
- Date: Thu, 12 May 2022 07:54:32 GMT
- Title: Ensemble Clustering via Co-association Matrix Self-enhancement
- Authors: Yuheng Jia, Sirui Tao, Ran Wang, Yongheng Wang
- Abstract summary: Ensemble clustering integrates a set of base clustering results to generate a stronger one.
Existing methods usually rely on a co-association (CA) matrix that measures how many times two samples are grouped into the same cluster.
We propose a simple yet effective CA matrix self-enhancement framework that can improve the CA matrix to achieve better clustering performance.
- Score: 16.928049559092454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensemble clustering integrates a set of base clustering results to generate a
stronger one. Existing methods usually rely on a co-association (CA) matrix
that measures how many times two samples are grouped into the same cluster
according to the base clusterings to achieve ensemble clustering. However, when
the constructed CA matrix is of low quality, the performance will degrade. In
this paper, we propose a simple yet effective CA matrix self-enhancement
framework that can improve the CA matrix to achieve better clustering
performance. Specifically, we first extract the high-confidence (HC)
information from the base clusterings to form a sparse HC matrix. By
propagating the highly-reliable information of the HC matrix to the CA matrix
and complementing the HC matrix according to the CA matrix simultaneously, the
proposed method generates an enhanced CA matrix for better clustering.
Technically, the proposed model is formulated as a symmetric constrained convex
optimization problem, which is efficiently solved by an alternating iterative
algorithm with convergence and global optimum theoretically guaranteed.
Extensive experimental comparisons with twelve state-of-the-art methods on
eight benchmark datasets substantiate the effectiveness, flexibility and
efficiency of the proposed model in ensemble clustering. The codes and datasets
can be downloaded at https://github.com/Siritao/EC-CMS.
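The CA matrix and high-confidence (HC) extraction steps described above can be sketched minimally as follows. The threshold value and function names are illustrative assumptions, not taken from the paper or its released code; the actual enhancement step (the constrained convex optimization) is omitted.

```python
import numpy as np

def co_association_matrix(base_labels):
    """Build the co-association (CA) matrix: entry (i, j) is the
    fraction of base clusterings that assign samples i and j to the
    same cluster."""
    base_labels = np.asarray(base_labels)  # shape (m, n): m base clusterings of n samples
    m, _ = base_labels.shape
    ca = np.zeros((base_labels.shape[1],) * 2)
    for labels in base_labels:
        # Pairwise same-cluster indicator for this base clustering.
        ca += (labels[:, None] == labels[None, :]).astype(float)
    return ca / m

def high_confidence_matrix(ca, threshold=0.8):
    """Keep only highly reliable co-associations, yielding a sparse
    HC matrix. The threshold is an illustrative choice."""
    return np.where(ca >= threshold, ca, 0.0)

# Toy example: 3 base clusterings of 4 samples.
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
ca = co_association_matrix(base)                       # ca[0, 1] == 1.0, ca[2, 3] == 2/3
hc = high_confidence_matrix(ca, threshold=1.0)         # keeps only unanimous pairs
```

Samples 0 and 1 are grouped together in all three base clusterings, so their CA entry is 1.0 and survives the HC threshold; samples 2 and 3 agree in only two of three, so their entry is dropped from the sparse HC matrix.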
Related papers
- Similarity and Dissimilarity Guided Co-association Matrix Construction for Ensemble Clustering [22.280221709474105]
We propose the Similarity and Dissimilarity Guided Co-association matrix (SDGCA) to achieve ensemble clustering.
First, we introduce normalized ensemble entropy to estimate the quality of each cluster, and construct a similarity matrix based on this estimation.
We then employ a random walk to explore the high-order proximity of base clusterings and construct a dissimilarity matrix.
arXiv Detail & Related papers (2024-11-01T08:10:28Z)
- Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging [7.106620444966807]
Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups.
Existing co-clustering methods suffer from poor scalability and cannot handle large-scale data.
This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets.
arXiv Detail & Related papers (2024-10-09T04:47:22Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The one-step K-means dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Deep Double Self-Expressive Subspace Clustering [7.875193047472789]
We propose a double self-expressive subspace clustering algorithm.
The proposed algorithm can achieve better clustering than state-of-the-art methods.
arXiv Detail & Related papers (2023-06-20T15:10:35Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves breakthrough clustering performance compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria, using L1 dissimilarity, is compared against hierarchical clustering and a k-means variant, partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.