Divide-and-conquer based Large-Scale Spectral Clustering
- URL: http://arxiv.org/abs/2104.15042v1
- Date: Fri, 30 Apr 2021 15:09:45 GMT
- Title: Divide-and-conquer based Large-Scale Spectral Clustering
- Authors: Hongmin Li, Xiucai Ye, Akira Imakura and Tetsuya Sakurai
- Abstract summary: We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering methods.
- Score: 8.545202841051582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral clustering is one of the most popular clustering methods. However,
how to balance the efficiency and effectiveness of large-scale spectral
clustering under limited computing resources has long remained an open
problem. In this paper, we propose a divide-and-conquer based large-scale
spectral clustering method to strike a good balance between efficiency and
effectiveness. In the proposed method, a divide-and-conquer based landmark
selection algorithm and a novel approximate similarity matrix approach are
designed to construct a sparse similarity matrix at extremely low cost.
Clustering results can then be computed quickly through a bipartite graph
partition process. The proposed method achieves lower computational
complexity than most existing large-scale spectral clustering methods.
Experimental results on ten large-scale datasets demonstrate the efficiency
and effectiveness of the proposed method. The MATLAB code of the proposed method
and experimental datasets are available at
https://github.com/Li-Hongmin/MyPaperWithCode.
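The pipeline summarized in the abstract above (landmark selection, sparse point-landmark similarity, bipartite-graph partition) follows a recognizable family of landmark-based spectral methods. Below is a minimal Python sketch of that generic pipeline, not the authors' MATLAB implementation: it substitutes plain k-means for the paper's divide-and-conquer landmark selection and a simple Gaussian s-nearest-landmark weighting for the paper's approximate similarity construction.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

def landmark_bipartite_spectral(X, n_clusters, n_landmarks=100, s=5, seed=0):
    """Cluster rows of X via a sparse point-landmark bipartite graph."""
    n = X.shape[0]
    # 1) Landmark selection. Assumption: plain k-means centers stand in
    #    for the paper's cheaper divide-and-conquer selection step.
    landmarks = KMeans(n_clusters=n_landmarks, n_init=3,
                       random_state=seed).fit(X).cluster_centers_
    # 2) Sparse cross-similarity Z (n x m): Gaussian weights to each
    #    point's s nearest landmarks only, so Z has s nonzeros per row.
    D = pairwise_distances(X, landmarks)
    idx = np.argsort(D, axis=1)[:, :s]
    d_s = np.take_along_axis(D, idx, axis=1)
    sigma = d_s.mean() + 1e-12                    # bandwidth heuristic
    w = np.exp(-d_s ** 2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)             # row-normalize weights
    Z = csr_matrix((w.ravel(), (np.repeat(np.arange(n), s), idx.ravel())),
                   shape=(n, n_landmarks))
    # 3) Bipartite-graph partition: left singular vectors of the
    #    degree-normalized cross-similarity give the spectral embedding.
    d1 = np.asarray(Z.sum(axis=1)).ravel()        # point degrees (all 1 here)
    d2 = np.asarray(Z.sum(axis=0)).ravel()        # landmark degrees
    d2[d2 == 0] = 1.0                             # guard unused landmarks
    Zn = Z.multiply(1.0 / np.sqrt(d1)[:, None]).multiply(1.0 / np.sqrt(d2)[None, :])
    U, _, _ = svds(csr_matrix(Zn), k=n_clusters)
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    # 4) Final labels from k-means on the embedding.
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(U)
```

With m landmarks and s nonzeros per row, the similarity construction and the truncated SVD both scale roughly linearly in n, which is what makes this family of methods attractive at large scale.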
Related papers
- Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging [7.106620444966807]
Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups.
Existing co-clustering methods suffer from poor scalability and cannot handle large-scale data.
This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets.
arXiv Detail & Related papers (2024-10-09T04:47:22Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the model's large covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z) - GBMST: An Efficient Minimum Spanning Tree Clustering Based on
- GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity granular-balls and the minimum spanning tree (MST).
We construct coarse-grained granular-balls, and then use the granular-balls and the MST to implement a clustering method based on "large-scale priority".
Experimental results on several datasets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z) - Scalable Clustering: Large Scale Unsupervised Learning of Gaussian
- Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers [5.478764356647437]
This paper introduces a provably robust clustering algorithm based on loss minimization.
It provides theoretical guarantees that the algorithm obtains high accuracy with high probability.
Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2023-02-28T14:39:18Z) - LSEC: Large-scale spectral ensemble clustering [8.545202841051582]
We propose a large-scale spectral ensemble clustering (LSEC) method to strike a good balance between efficiency and effectiveness.
The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods.
arXiv Detail & Related papers (2021-06-18T00:42:03Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood
Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base Laplacian matrices.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
- Spectral Clustering using Eigenspectrum Shape Based Nystrom Sampling [19.675277307158435]
This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Centroid Minimum Sum of Squared Similarities (CMS3), and a heuristic on when to use it.
The heuristic depends on the eigenspectrum shape of the dataset and yields competitive low-rank approximations compared with other state-of-the-art methods.
arXiv Detail & Related papers (2020-07-21T17:49:03Z) - CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on
- CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data [34.89460002735166]
We study the problem of applying spectral clustering to cluster multi-scale data.
For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart.
We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix.
arXiv Detail & Related papers (2020-06-08T09:46:35Z) - Learnable Subspace Clustering [76.2352740039615]
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)