Related papers: PaVa: a novel Path-based Valley-seeking clustering algorithm

PaVa: a novel Path-based Valley-seeking clustering algorithm

URL: http://arxiv.org/abs/2306.07503v1
Date: Tue, 13 Jun 2023 02:29:34 GMT
Title: PaVa: a novel Path-based Valley-seeking clustering algorithm
Authors: Lin Ma and Conan Liu and Tiefeng Ma and Shuangzhe Liu
Abstract summary: We propose a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters. Three vital techniques are used in this algorithm. The results indicate that the Path-based Valley-seeking algorithm is accurate and efficient.
Score: 13.264374632165776
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Clustering methods are being applied to a wider range of scenarios involving more complex datasets, where the shapes of clusters tend to be arbitrary. In this paper, we propose a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters. This work aims to seek the valleys among clusters and then individually extract clusters. Three vital techniques are used in this algorithm. First, path distance (minmax distance) is employed to transform the irregular boundaries among clusters, that is density valleys, into perfect spherical shells. Second, a suitable density measurement, $k$-distance, is employed to make adjustment on Minimum Spanning Tree, by which a robust minmax distance is calculated. Third, we seek the transformed density valleys by determining their centers and radius. First, the clusters are wrapped in spherical shells after the distance transformation, making the extraction process efficient even with clusters of arbitrary shape. Second, adjusted Minimum Spanning Tree enhances the robustness of minmax distance under different kinds of noise. Last, the number of clusters does not need to be inputted or decided manually due to the individual extraction process. After applying the proposed algorithm to several commonly used synthetic datasets, the results indicate that the Path-based Valley-seeking algorithm is accurate and efficient. The algorithm is based on the dissimilarity of objects, so it can be applied to a wide range of fields. Its performance on real-world datasets illustrates its versatility.

Related papers

Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively defining the neighborhood size. Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search [8.15681999722805]
This paper studies density-based clustering of point sets. It unifies the different variants of density peaks clustering into a single framework, PECANN. We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading.
arXiv Detail & Related papers (2023-12-06T22:43:50Z)
DenMune: Density peak based clustering using mutual nearest neighbors [0.0]
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other. A novel clustering algorithm, DenMune is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user.
arXiv Detail & Related papers (2023-09-23T16:18:00Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct distance matrix between data points by Butterworth filter. To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST) We construct coarsegrained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority" Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z)
A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery [0.18374319565577152]
We present a new dynamical systems algorithm for clustering in hyperspectral images. The main idea of the algorithm is that data points are pushed' in the direction of increasing density and groups of pixels that end up in the same dense regions belong to the same class. We evaluate the algorithm on the Urban scene comparing performance against the k-means algorithm using pre-identified classes of materials as ground truth.
arXiv Detail & Related papers (2022-07-21T17:31:57Z)
A density peaks clustering algorithm with sparse search and K-d tree [16.141611031128427]
Density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem. Experiments are carried out on datasets with different distribution characteristics, by comparing with other five typical clustering algorithms.
arXiv Detail & Related papers (2022-03-02T09:29:40Z)
Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type. The algorithm is capable of detecting outliers and clusters of relatively lower density values. We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z)
Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.