Clustering by Mining Density Distributions and Splitting Manifold Structure
- URL: http://arxiv.org/abs/2408.10493v1
- Date: Tue, 20 Aug 2024 02:22:59 GMT
- Title: Clustering by Mining Density Distributions and Splitting Manifold Structure
- Authors: Zhichang Xu, Zhiguo Long, Hua Meng
- Abstract summary: A top-down approach was recently proposed to improve the efficiency of spectral clustering.
This paper proposes to start from local structures to obtain micro-clusters.
A novel similarity measure between micro-clusters is then proposed for the final spectral clustering.
- Score: 2.3759432635713895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral clustering requires the time-consuming decomposition of the Laplacian matrix of the similarity graph, thus limiting its applicability to large datasets. To improve the efficiency of spectral clustering, a top-down approach was recently proposed, which first divides the data into several micro-clusters (granular-balls), then splits these micro-clusters when they are not "compact", and finally uses these micro-clusters as nodes to construct a similarity graph for more efficient spectral clustering. However, this top-down approach is challenging to adapt to unevenly distributed or structurally complex data. This is because constructing micro-clusters as rough balls struggles to capture the shape and structure of data in a local range, and the simplistic splitting rule that solely targets "compactness" is susceptible to noise and variations in data density, leading to micro-clusters with varying shapes whose pairwise similarity is difficult to measure accurately. To resolve these issues, this paper first proposes to start from local structures to obtain micro-clusters, such that the complex structural information inside local neighborhoods is well captured by them. Moreover, by noting that Euclidean distance is more suitable for convex sets, this paper further proposes a data splitting rule that couples local density and data manifold structures, so that the similarities of the obtained micro-clusters can be easily characterized. A novel similarity measure between micro-clusters is then proposed for the final spectral clustering. A series of experiments based on synthetic and real-world datasets demonstrate that the proposed method has better adaptability to structurally complex data than granular-ball based methods.
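The efficiency argument above has a simple shape: build many micro-clusters, refine them by splitting, then run spectral clustering on the micro-cluster graph instead of the full point graph. Below is a minimal sketch of that coarse-to-fine flow, with k-means standing in for the paper's local-structure micro-cluster construction, a crude nearest-neighbor density-gap test standing in for its density/manifold splitting rule, and a plain Gaussian kernel in place of its novel similarity measure; all three stand-ins are illustrative assumptions, not the authors' method.
```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.neighbors import NearestNeighbors

def build_micro_clusters(X, n_micro=40, seed=0):
    """Partition X into many small micro-clusters (stand-in step)."""
    labels = KMeans(n_clusters=n_micro, n_init=10, random_state=seed).fit_predict(X)
    return [np.where(labels == g)[0] for g in range(n_micro)]

def split_uneven(X, members, ratio=3.0):
    """Split a micro-cluster whose internal density varies too much
    (a crude stand-in for the paper's density/manifold splitting rule)."""
    if len(members) < 8:
        return [members]
    d = NearestNeighbors(n_neighbors=1).fit(X[members]).kneighbors()[0][:, 0]
    if d.max() / max(d.min(), 1e-12) <= ratio:
        return [members]
    sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
    return [members[sub == 0], members[sub == 1]]

def micro_spectral(X, n_clusters, n_micro=40, sigma=1.0):
    micros = []
    for m in build_micro_clusters(X, n_micro):
        micros.extend(split_uneven(X, m))
    # Gaussian similarity between micro-cluster centers; the paper proposes a
    # more refined manifold-aware measure, which is not reproduced here.
    centers = np.array([X[m].mean(axis=0) for m in micros])
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    micro_labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                      random_state=0).fit_predict(S)
    y = np.empty(len(X), dtype=int)
    for m, lab in zip(micros, micro_labels):
        y[m] = lab  # propagate each micro-cluster's label to its points
    return y
```
The payoff is that the Laplacian eigendecomposition runs on a matrix whose side equals the number of micro-clusters (at most about 80 here) rather than the number of data points, which is where the speedup over plain spectral clustering comes from.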
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization [2.4783546111391215]
Density-based clustering methods by mode-seeking usually achieve clustering by using local density estimation to mine structural information.
We propose a new algorithm (TANGO) to establish local dependencies by exploiting a global-view typicality of points.
It achieves final clustering by employing graph-cut on sub-clusters, thus avoiding the challenging selection of cluster centers.
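TANGO itself is not reproduced here, but the density-based mode-seeking it refines can be sketched generically: estimate a kNN density and link each point to its nearest denser neighbor, so the resulting forest's roots act as local modes and its trees as sub-clusters. The kNN density estimate and the choice of k below are illustrative assumptions.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mode_seeking_links(X, k=10):
    """Link each point to its nearest higher-density neighbor among its kNN;
    points with no denser neighbor nearby stay roots, i.e. local modes."""
    dist, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)  # kNN density estimate
    parent = np.arange(len(X))
    for i in range(len(X)):
        higher = [j for j in idx[i] if density[j] > density[i]]
        if higher:
            parent[i] = min(higher, key=lambda j: np.linalg.norm(X[i] - X[j]))
    return parent  # following parents to the root labels each sub-cluster
```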
arXiv Detail & Related papers (2024-08-19T15:26:25Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
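As a rough illustration of the distributional-reduction idea, one can align the metric structure of a dataset with that of a much smaller prototype set by solving a Gromov-Wasserstein problem and read the coupling as a soft clustering. The sketch below uses the POT library; the random prototypes and uniform weights are arbitrary assumptions, not the paper's parametrization.
```python
import numpy as np
import ot  # POT: pip install pot

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))         # data in the original space
Z = rng.normal(size=(5, 2))            # 5 low-dimensional prototypes (assumed)
C1, C2 = ot.dist(X, X), ot.dist(Z, Z)  # intra-space squared-distance matrices
p, q = ot.unif(len(X)), ot.unif(len(Z))
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun="square_loss")
labels = T.argmax(axis=1)              # hard reading of the soft coupling
```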
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
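One plausible reading of the Butterworth-filter construction above (an assumption, not necessarily the authors' exact formula) is to pass pairwise Euclidean distances through a Butterworth-style low-pass response, so similarity stays near 1 below a cutoff and decays smoothly above it:
```python
import numpy as np
from scipy.spatial.distance import cdist

def butterworth_similarity(X, cutoff=1.0, order=4):
    """Map pairwise distances through a Butterworth-style low-pass response:
    close pairs get similarity near 1, far pairs decay smoothly toward 0."""
    D = cdist(X, X)  # pairwise Euclidean distances
    return 1.0 / (1.0 + (D / cutoff) ** (2 * order))
```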
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Kernel Biclustering algorithm in Hilbert Spaces [8.303238963864885]
We develop a new model-free biclustering algorithm in abstract spaces using the notions of energy distance and the maximum mean discrepancy.
The proposed method can learn more general and complex cluster shapes than most existing literature approaches.
Our results are similar to state-of-the-art methods in their optimal scenarios, assuming a proper kernel choice.
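Of the two discrepancies the paper builds on, the maximum mean discrepancy is easy to sketch; below is the standard biased (V-statistic) estimate with a Gaussian kernel, a textbook computation rather than the paper's biclustering algorithm:
```python
import numpy as np
from scipy.spatial.distance import cdist

def mmd2(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    Kxx = np.exp(-gamma * cdist(X, X, "sqeuclidean"))
    Kyy = np.exp(-gamma * cdist(Y, Y, "sqeuclidean"))
    Kxy = np.exp(-gamma * cdist(X, Y, "sqeuclidean"))
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
```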
arXiv Detail & Related papers (2022-08-07T08:41:46Z)
- Flow-based clustering and spectral clustering: a comparison [0.688204255655161]
We study a novel graph clustering method for data with an intrinsic network structure.
We exploit an intrinsic network structure of data to construct Euclidean feature vectors.
Our results indicate that our clustering methods can cope with certain graph structures.
arXiv Detail & Related papers (2022-06-20T21:49:52Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
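Schematically, a quantile-based distance of this kind compares a lower quantile of the cross-distance distribution against lower quantiles of the self-distance distributions; the quantile level and the combination below are illustrative assumptions, not kdiff's actual definition:
```python
import numpy as np
from scipy.spatial.distance import cdist

def kdiff_like(A, B, q=0.1):
    """Quantile-based distance between two structured instances A and B,
    each given as an array of sampled points; schematic only."""
    d_cross = np.quantile(cdist(A, B), q)        # cross-distance quantile
    d_self = 0.5 * (np.quantile(cdist(A, A), q)  # self-distance quantiles
                    + np.quantile(cdist(B, B), q))
    return d_cross - d_self
```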
arXiv Detail & Related papers (2021-09-29T22:54:17Z)
- Tensor Laplacian Regularized Low-Rank Representation for Non-uniformly Distributed Data Subspace Clustering [2.578242050187029]
Low-Rank Representation (LRR) suffers from discarding the locality information of data points in subspace clustering.
We propose a hypergraph model that allows a variable number of adjacent nodes and incorporates the locality information of the data.
Experiments on artificial and real datasets demonstrate the higher accuracy and precision of the proposed method.
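A standard way to get the variable-size adjacency the summary describes is a kNN hypergraph: each point spawns a hyperedge containing itself and its k nearest neighbors, and a Zhou-style normalized hypergraph Laplacian encodes the locality. This common construction is assumed here; the paper's tensor LRR formulation is not reproduced.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_hypergraph_laplacian(X, k=5):
    """Normalized hypergraph Laplacian L = I - Dv^{-1/2} H De^{-1} H^T Dv^{-1/2}
    for the kNN hypergraph of X (unit hyperedge weights)."""
    n = len(X)
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)[1]
    H = np.zeros((n, n))       # vertices x hyperedges incidence matrix
    for e, nbrs in enumerate(idx):
        H[nbrs, e] = 1.0       # hyperedge e = point e plus its k neighbors
    Dv, De = H.sum(axis=1), H.sum(axis=0)
    inv_sqrt_Dv = np.diag(1.0 / np.sqrt(Dv))
    Theta = inv_sqrt_Dv @ H @ np.diag(1.0 / De) @ H.T @ inv_sqrt_Dv
    return np.eye(n) - Theta
```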
arXiv Detail & Related papers (2021-03-06T08:22:24Z)
- Clustering small datasets in high-dimension by random projection [2.2940141855172027]
We propose a low-computation method to find statistically significant clustering structures in a small dataset.
The method proceeds by projecting the data on a random line and seeking binary clusterings in the resulting one-dimensional data.
The statistical validity of the clustering structures obtained is tested in the projected one-dimensional space.
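The core loop is easy to sketch: draw a random direction, sort the 1-D projections, and score candidate binary splits. The relative-gap score below is an illustrative stand-in for the paper's statistical validity test:
```python
import numpy as np

def best_random_split(X, n_proj=100, seed=0):
    """Try n_proj random 1-D projections and return the binary split with the
    largest relative gap (assumes at least 4 points)."""
    rng = np.random.default_rng(seed)
    best_score, best_u, best_thr = -np.inf, None, None
    for _ in range(n_proj):
        u = rng.normal(size=X.shape[1])
        u /= np.linalg.norm(u)
        z = np.sort(X @ u)                        # sorted 1-D projection
        gaps = np.diff(z)
        i = int(np.argmax(gaps[1:-1])) + 1        # ignore the extreme gaps
        score = gaps[i] / (z[-1] - z[0] + 1e-12)  # relative gap size
        if score > best_score:
            best_score, best_u, best_thr = score, u, 0.5 * (z[i] + z[i + 1])
    return (X @ best_u) > best_thr                # boolean binary clustering
```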
arXiv Detail & Related papers (2020-08-21T16:49:37Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhood-based and population-behavior optimization metaheuristics.
Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria under L1 dissimilarity is compared against hierarchical clustering and a k-means variant, partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.