Clustering Ensemble Meets Low-rank Tensor Approximation
- URL: http://arxiv.org/abs/2012.08916v1
- Date: Wed, 16 Dec 2020 13:01:37 GMT
- Title: Clustering Ensemble Meets Low-rank Tensor Approximation
- Authors: Yuheng Jia, Hui Liu, Junhui Hou, Qingfu Zhang
- Abstract summary: This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
- Score: 50.21581880045667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the problem of clustering ensemble, which aims to combine
multiple base clusterings to produce better performance than that of any
individual one. The existing clustering ensemble methods generally construct a
co-association matrix, which indicates the pairwise similarity between samples,
as the weighted linear combination of the connective matrices from different
base clusterings, and the resulting co-association matrix is then adopted as
the input of an off-the-shelf clustering algorithm, e.g., spectral clustering.
However, the co-association matrix may be dominated by poor base clusterings,
resulting in inferior performance. In this paper, we propose a novel low-rank
tensor approximation-based method to solve the problem from a global
perspective. Specifically, by inspecting whether two samples are clustered to
an identical cluster under different base clusterings, we derive a
coherent-link matrix, which contains limited but highly reliable relationships
between samples. We then stack the coherent-link matrix and the co-association
matrix to form a three-dimensional tensor, the low-rankness property of which
is further explored to propagate the information of the coherent-link matrix to
the co-association matrix, producing a refined co-association matrix. We
formulate the proposed method as a convex constrained optimization problem and
solve it efficiently. Experimental results over 7 benchmark data sets show that
the proposed model achieves a breakthrough in clustering performance, compared
with 12 state-of-the-art methods. To the best of our knowledge, this is the
first work to explore the potential of low-rank tensor approximation for
clustering ensemble, which is fundamentally different from previous approaches.
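The pipeline in the abstract can be sketched in a few lines of NumPy. The following is a minimal, hedged illustration rather than the paper's implementation: the coherent-link matrix is read here as "pairs co-clustered under every base clustering", and the paper's convex tensor-nuclear-norm optimization is replaced by a one-shot truncated t-SVD (truncated SVD of each Fourier-domain slice along the tube mode), which is the low-rank operation that norm penalizes. All function names and the `rank` parameter are illustrative assumptions.

```python
import numpy as np

def co_association(labels_list):
    """Average of the connective matrices of the base clusterings (standard CA matrix)."""
    labels_list = [np.asarray(lab) for lab in labels_list]
    n = len(labels_list[0])
    ca = np.zeros((n, n))
    for lab in labels_list:
        # connective matrix: 1 where two samples share a cluster label
        ca += (lab[:, None] == lab[None, :]).astype(float)
    return ca / len(labels_list)

def coherent_link(labels_list):
    """Sparse but highly reliable links: pairs co-clustered in *every* base
    clustering (a simplified reading of the paper's coherent-link matrix)."""
    return (co_association(labels_list) == 1.0).astype(float)

def tsvd_truncate(T, rank):
    """Crude low-rank tensor approximation via truncated t-SVD: FFT along the
    first (tube) mode, truncated SVD of each Fourier slice, inverse FFT.
    The tensor nuclear norm minimized in the paper sums the nuclear norms of
    exactly these Fourier-domain slices; here we just hard-truncate them."""
    F = np.fft.fft(T, axis=0)
    for k in range(F.shape[0]):
        U, s, Vt = np.linalg.svd(F[k], full_matrices=False)
        F[k] = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return np.real(np.fft.ifft(F, axis=0))

def refine_co_association(labels_list, rank):
    """Stack the coherent-link and co-association matrices into a 3-D tensor so
    the shared low-rank structure propagates the reliable links into the CA matrix."""
    T = np.stack([coherent_link(labels_list), co_association(labels_list)])
    return tsvd_truncate(T, rank)[1]  # return the refined CA slice
```

The refined co-association matrix would then be fed to an off-the-shelf algorithm such as spectral clustering, as in the paper; the paper itself solves a constrained convex program rather than applying a single hard truncation.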
Related papers
- Similarity and Dissimilarity Guided Co-association Matrix Construction for Ensemble Clustering [22.280221709474105]
We propose the Similarity and Dissimilarity Guided Co-association matrix (SDGCA) to achieve ensemble clustering.
First, we introduce normalized ensemble entropy to estimate the quality of each cluster, and construct a similarity matrix based on this estimation.
We then employ a random walk to explore the high-order proximity of base clusterings and construct a dissimilarity matrix.
arXiv Detail & Related papers (2024-11-01T08:10:28Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Ensemble Clustering via Co-association Matrix Self-enhancement [16.928049559092454]
Ensemble clustering integrates a set of base clustering results to generate a stronger one.
Existing methods usually rely on a co-association (CA) matrix that measures how many times two samples are grouped into the same cluster.
We propose a simple yet effective CA matrix self-enhancement framework that can improve the CA matrix to achieve better clustering performance.
arXiv Detail & Related papers (2022-05-12T07:54:32Z)
- Spatially Coherent Clustering Based on Orthogonal Nonnegative Matrix Factorization [0.0]
In this work, we introduce clustering models based on a total variation (TV) regularization procedure on the cluster membership matrix.
We provide a numerical evaluation of all proposed methods on a hyperspectral dataset obtained from a matrix-assisted laser desorption/ionisation imaging measurement.
arXiv Detail & Related papers (2021-04-25T23:40:41Z)
- Doubly Stochastic Subspace Clustering [9.815735805354905]
Many state-of-the-art subspace clustering methods follow a two-step process by first constructing an affinity matrix between data points and then applying spectral clustering to this affinity.
In this work, we learn both a self-expressive representation of the data and an affinity matrix that is well-normalized for spectral clustering.
Experiments show that our method achieves state-of-the-art subspace clustering performance on many common datasets in computer vision.
arXiv Detail & Related papers (2020-11-30T14:56:54Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Biclustering with Alternating K-Means [5.089110111757978]
We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk.
We propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
arXiv Detail & Related papers (2020-09-09T20:15:24Z)
- Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both first-order and high-order base Laplacian matrices.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte-Carlo experiment, one of the aggregation criteria, using L1 dissimilarity, is compared against hierarchical clustering and a variant of k-means: partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.