Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
- URL: http://arxiv.org/abs/2410.18113v1
- Date: Wed, 09 Oct 2024 04:47:22 GMT
- Title: Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
- Authors: Zihan Wu, Zhaoke Huang, Hong Yan
- Abstract summary: Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups.
Existing co-clustering methods suffer from poor scalability and cannot handle large-scale data.
This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets.
- Score: 7.106620444966807
- License:
- Abstract: Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
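Since the abstract describes the pipeline only at a high level, the following is a minimal illustrative sketch of a partition-then-merge co-clustering workflow in Python. It is not the authors' implementation: the fixed grid partitioning (the paper optimizes the submatrix configuration with a probabilistic model), the use of scikit-learn's SpectralCoclustering as the per-submatrix solver, and the greedy Jaccard-overlap merging rule are all assumptions introduced for illustration.

```python
"""Illustrative partition-and-merge co-clustering sketch (not the paper's algorithm)."""
import numpy as np
from sklearn.cluster import SpectralCoclustering


def partition(matrix, block_rows, block_cols):
    """Split a matrix into a grid of submatrices, keeping global row/column indices."""
    n_rows, n_cols = matrix.shape
    blocks = []
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            rows = np.arange(r0, min(r0 + block_rows, n_rows))
            cols = np.arange(c0, min(c0 + block_cols, n_cols))
            blocks.append((rows, cols, matrix[np.ix_(rows, cols)]))
    return blocks


def cocluster_block(rows, cols, sub, n_clusters=2, seed=0):
    """Co-cluster one submatrix; return co-clusters as (row set, column set) in global indices."""
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=seed)
    model.fit(sub)
    return [(set(rows[model.rows_[k]]), set(cols[model.columns_[k]]))
            for k in range(n_clusters)]


def jaccard(a, b):
    return len(a & b) / max(len(a | b), 1)


def merge(coclusters, threshold=0.3):
    """Greedy merge: repeatedly fuse the pair of co-clusters with the largest row or column overlap."""
    clusters = list(coclusters)
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = max(jaccard(clusters[i][0], clusters[j][0]),
                            jaccard(clusters[i][1], clusters[j][1]))
                if score > best:
                    best, pair = score, (i, j)
        if pair is None:
            return clusters
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 120)) + 1e-3  # strictly positive toy matrix
    candidates = []
    for rows, cols, sub in partition(X, 100, 60):
        candidates.extend(cocluster_block(rows, cols, sub))
    print(len(merge(candidates)), "merged co-clusters")
```

In the paper, the submatrix sizes are chosen by a probabilistic model and the merging is hierarchical; both are replaced here by simple heuristics so the sketch stays self-contained.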
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Deep Double Self-Expressive Subspace Clustering [7.875193047472789]
We propose a double self-expressive subspace clustering algorithm.
The proposed algorithm can achieve better clustering than state-of-the-art methods.
arXiv Detail & Related papers (2023-06-20T15:10:35Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- LSEC: Large-scale spectral ensemble clustering [8.545202841051582]
We propose a large-scale spectral ensemble clustering (LSEC) method to strike a good balance between efficiency and effectiveness.
The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods.
arXiv Detail & Related papers (2021-06-18T00:42:03Z)
- Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering methods.
arXiv Detail & Related papers (2021-04-30T15:09:45Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both first-order and high-order base Laplacian matrices.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
- Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis [32.15852903039789]
The goal of co-clustering is to simultaneously identify a clustering of the rows as well as the columns of a two-dimensional data matrix.
We develop an efficient iterative algorithm which we call the NEO-CC algorithm.
Experimental results show that the NEO-CC algorithm is able to effectively capture the underlying co-clustering structure of real-world data.
arXiv Detail & Related papers (2020-04-24T04:39:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.