Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
- URL: http://arxiv.org/abs/2410.18113v1
- Date: Wed, 09 Oct 2024 04:47:22 GMT
- Title: Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
- Authors: Zihan Wu, Zhaoke Huang, Hong Yan
- Abstract summary: Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups.
Existing co-clustering methods suffer from poor scalability and cannot handle large-scale data.
This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets.
- Score: 7.106620444966807
- License:
- Abstract: Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness and efficiency of our method. Experimental results demonstrate a significant reduction in computation time, with an approximate 83% decrease for dense matrices and up to 30% for sparse matrices.
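Since the abstract describes the pipeline only at a high level, the following is a minimal illustrative sketch of a partition-then-merge co-clustering workflow in Python. It is not the authors' implementation: the fixed grid partitioning (the paper optimizes the submatrix configuration with a probabilistic model), the use of scikit-learn's SpectralCoclustering as the per-submatrix solver, and the greedy Jaccard-overlap merging rule are all assumptions introduced for illustration.

```python
"""Illustrative partition-and-merge co-clustering sketch (not the paper's algorithm)."""
import numpy as np
from sklearn.cluster import SpectralCoclustering


def partition(matrix, block_rows, block_cols):
    """Split a matrix into a grid of submatrices, keeping global row/column indices."""
    n_rows, n_cols = matrix.shape
    blocks = []
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            rows = np.arange(r0, min(r0 + block_rows, n_rows))
            cols = np.arange(c0, min(c0 + block_cols, n_cols))
            blocks.append((rows, cols, matrix[np.ix_(rows, cols)]))
    return blocks


def cocluster_block(rows, cols, sub, n_clusters=2, seed=0):
    """Co-cluster one submatrix; return co-clusters as (row set, column set) in global indices."""
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=seed)
    model.fit(sub)
    return [(set(rows[model.rows_[k]]), set(cols[model.columns_[k]]))
            for k in range(n_clusters)]


def jaccard(a, b):
    return len(a & b) / max(len(a | b), 1)


def merge(coclusters, threshold=0.3):
    """Greedy merge: repeatedly fuse the pair of co-clusters with the largest row or column overlap."""
    clusters = list(coclusters)
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = max(jaccard(clusters[i][0], clusters[j][0]),
                            jaccard(clusters[i][1], clusters[j][1]))
                if score > best:
                    best, pair = score, (i, j)
        if pair is None:
            return clusters
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 120)) + 1e-3  # strictly positive toy matrix
    candidates = []
    for rows, cols, sub in partition(X, 100, 60):
        candidates.extend(cocluster_block(rows, cols, sub))
    print(len(merge(candidates)), "merged co-clusters")
```

In the paper, the submatrix sizes are chosen by a probabilistic model and the merging is hierarchical; both are replaced here by simple heuristics so the sketch stays self-contained.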
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Deep Double Self-Expressive Subspace Clustering [7.875193047472789]
We propose a double self-expressive subspace clustering algorithm.
The proposed algorithm can achieve better clustering than state-of-the-art methods.
arXiv Detail & Related papers (2023-06-20T15:10:35Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- LSEC: Large-scale spectral ensemble clustering [8.545202841051582]
We propose a large-scale spectral ensemble clustering (LSEC) method to strike a good balance between efficiency and effectiveness.
The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods.
arXiv Detail & Related papers (2021-06-18T00:42:03Z)
- Divide-and-conquer based Large-Scale Spectral Clustering [8.545202841051582]
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering methods.
arXiv Detail & Related papers (2021-04-30T15:09:45Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both first-order and high-order base Laplacian matrices.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
- Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis [32.15852903039789]
The goal of co-clustering is to simultaneously identify a clustering of the rows as well as the columns of a two-dimensional data matrix.
We develop an efficient iterative algorithm which we call the NEO-CC algorithm.
Experimental results show that the NEO-CC algorithm is able to effectively capture the underlying co-clustering structure of real-world data.
arXiv Detail & Related papers (2020-04-24T04:39:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.