CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data
- URL: http://arxiv.org/abs/2006.04435v1
- Date: Mon, 8 Jun 2020 09:46:35 GMT
- Authors: Xiang Li, Ben Kao, Caihua Shan, Dawei Yin, Martin Ester
- Abstract summary: We study the problem of applying spectral clustering to cluster multi-scale data.
For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart.
We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of applying spectral clustering to cluster multi-scale
data, which is data whose clusters are of various sizes and densities.
Traditional spectral clustering techniques discover clusters by processing a
similarity matrix that reflects the proximity of objects. For multi-scale data,
distance-based similarity is not effective because objects of a sparse cluster
could be far apart while those of a dense cluster have to be sufficiently
close. Following [16], we solve the problem of spectral clustering on
multi-scale data by integrating the concept of objects' "reachability
similarity" with a given distance-based similarity to derive a coefficient
matrix over the objects. We propose the algorithm CAST, which applies trace Lasso to
regularize the coefficient matrix. We prove that the resulting coefficient
matrix has the "grouping effect" and that it exhibits "sparsity". We show that
these two characteristics lead to highly effective spectral clustering. We evaluate
CAST and 10 other clustering methods on a wide range of datasets w.r.t. various
measures. Experimental results show that CAST provides excellent performance
and is highly robust across test cases of multi-scale data.
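The abstract does not spell out CAST's objective or how its coefficient matrix is turned into clusters, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. `trace_lasso` computes the trace Lasso penalty ||X Diag(z)||_* (the nuclear norm of X with column j scaled by z[j]) for a coefficient vector z, assuming the objects are unit-normalized columns of X. For such X the penalty equals ||z||_1 when the columns are orthonormal and ||z||_2 when they are identical, which offers one intuition for how a trace Lasso regularizer can favour both sparsity and a grouping effect. `spectral_labels` and `_kmeans` are hypothetical helpers that follow a common subspace-clustering recipe, assumed here rather than taken from the paper: symmetrize |Z| into an affinity, take the leading eigenvectors of the normalized affinity, and run k-means on the row-normalized embedding. Solving for Z itself (CAST's optimization problem) is not shown.

```python
import numpy as np


def trace_lasso(X: np.ndarray, z: np.ndarray) -> float:
    """Trace Lasso ||X Diag(z)||_* of a coefficient vector z.

    Assumption: X holds one object per column and its columns are unit-normalized.
    """
    # X * z scales column j of X by z[j], i.e. it computes X @ np.diag(z).
    return float(np.linalg.norm(X * z, ord="nuc"))


def _kmeans(U: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialisation (hypothetical helper)."""
    centers = [U[0]]
    for _ in range(1, k):
        # Pick the row farthest from all centers chosen so far.
        d2 = ((U[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(U), dtype=int)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_labels(Z: np.ndarray, k: int) -> np.ndarray:
    """Cluster n objects from an n x n coefficient matrix Z (hypothetical pipeline)."""
    A = np.abs(Z)
    W = (A + A.T) / 2.0                                # symmetric, non-negative affinity
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))   # guard against isolated objects
    M = inv_sqrt[:, None] * W * inv_sqrt[None, :]      # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                        # eigenvalues in ascending order
    U = vecs[:, -k:]                                   # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
    return _kmeans(U, k)


if __name__ == "__main__":
    z = np.array([1.0, -2.0, 3.0])
    v = np.ones((2, 1)) / np.sqrt(2.0)                 # a single unit-norm column
    print(trace_lasso(np.eye(3), z))                   # orthonormal columns: close to ||z||_1 = 6
    print(trace_lasso(np.repeat(v, 3, axis=1), z))     # identical columns: close to ||z||_2 = sqrt(14)
```

For a block-diagonal Z, for instance two groups of three objects whose coefficients are non-zero only within their own group, `spectral_labels(Z, 2)` gives each group its own label, because the two leading eigenvectors of the normalized affinity then act as group indicators.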
Related papers
- Superclustering by finding statistically significant separable groups of optimal gaussian clusters
The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal with respect to the BIC.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data using an already trained clusterer.
(2023-09-05T23:49:46Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds established in the paper, both in expectation and with high probability.
(2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To exploit the complementary information embedded in different views, we use tensor Schatten p-norm regularization.
(2023-05-12T03:01:41Z)
- LSEC: Large-scale spectral ensemble clustering
We propose a large-scale spectral ensemble clustering (LSEC) method to strike a good balance between efficiency and effectiveness.
The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods.
(2021-06-18T00:42:03Z)
- Divide-and-conquer based Large-Scale Spectral Clustering
We propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness.
The proposed method achieves lower computational complexity than most existing large-scale spectral clustering methods.
(2021-04-30T15:09:45Z)
- Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry
Clustering algorithms partition a dataset into groups of similar points.
The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm.
We show that incorporating spatial regularization into a multiscale clustering framework yields smoother and more coherent clusters when applied to hyperspectral image (HSI) data.
(2021-03-29T17:24:28Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
(2020-12-16T13:01:37Z)
- Spectral Clustering with Smooth Tiny Clusters
We propose a novel clustering algorithm, which considers the smoothness of data for the first time.
Our key idea is to cluster tiny clusters, whose centers constitute smooth graphs.
Although this paper focuses solely on multi-scale settings, the idea of data smoothness can certainly be extended to any clustering algorithm.
(2020-09-10T05:21:20Z)
- Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order bases.
(2020-08-31T12:28:40Z)
- Conjoined Dirichlet Process
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
(2020-02-08T19:41:23Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria based on L1 dissimilarity is compared with hierarchical clustering and with a version of k-means, partitioning around medoids (PAM).
(2020-01-06T23:33:31Z)