PaVa: a novel Path-based Valley-seeking clustering algorithm
- URL: http://arxiv.org/abs/2306.07503v1
- Date: Tue, 13 Jun 2023 02:29:34 GMT
- Title: PaVa: a novel Path-based Valley-seeking clustering algorithm
- Authors: Lin Ma and Conan Liu and Tiefeng Ma and Shuangzhe Liu
- Abstract summary: We propose a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters.
Three vital techniques are used in this algorithm.
The results indicate that the Path-based Valley-seeking algorithm is accurate and efficient.
- Score: 13.264374632165776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering methods are being applied to a wider range of scenarios involving
more complex datasets, where the shapes of clusters tend to be arbitrary. In
this paper, we propose a novel Path-based Valley-seeking clustering algorithm
for arbitrarily shaped clusters. This work aims to seek the valleys among
clusters and then individually extract clusters. Three vital techniques are
used in this algorithm. First, path distance (minmax distance) is employed to
transform the irregular boundaries among clusters, that is, density valleys,
into perfect spherical shells. Second, a suitable density measurement,
$k$-distance, is employed to adjust the Minimum Spanning Tree, from which
a robust minmax distance is calculated. Third, we seek the transformed density
valleys by determining their centers and radii. This design has three
advantages. First, the clusters are
wrapped in spherical shells after the distance transformation, making the
extraction process efficient even with clusters of arbitrary shape. Second,
the adjusted Minimum Spanning Tree enhances the robustness of minmax distance under
different kinds of noise. Last, the number of clusters does not need to be
input or decided manually, owing to the individual extraction process. After
applying the proposed algorithm to several commonly used synthetic datasets,
the results indicate that the Path-based Valley-seeking algorithm is accurate
and efficient. The algorithm is based on the dissimilarity of objects, so it
can be applied to a wide range of fields. Its performance on real-world
datasets illustrates its versatility.
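The two building blocks named in the abstract, $k$-distance and minmax (path) distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `k_distance` and `minmax_distances` and the toy `points` are hypothetical, Euclidean distance is assumed as the dissimilarity, and the paper's $k$-distance adjustment of the Minimum Spanning Tree and its valley-seeking step are not reproduced. The sketch relies only on the standard fact that the minmax distance between two points equals the largest edge weight on the path joining them in a Minimum Spanning Tree, which Prim's algorithm can fill in as it grows the tree.

```python
import math


def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def minmax_distances(points):
    """All-pairs minmax (path) distances, computed while growing an MST with Prim.

    mm[u][v] is the largest edge on the MST path between u and v.
    """
    n = len(points)
    mm = [[0.0] * n for _ in range(n)]
    in_tree = [0]
    # best[j] = (cheapest edge weight from j into the tree, tree endpoint of that edge)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        v = min(best, key=lambda j: best[j][0])
        w, parent = best.pop(v)
        # The path from any tree vertex u to v passes through parent, then edge w.
        for u in in_tree:
            mm[u][v] = mm[v][u] = max(mm[u][parent], w)
        in_tree.append(v)
        for j in best:
            d = math.dist(points[v], points[j])
            if d < best[j][0]:
                best[j] = (d, v)
    return mm


# Toy data (hypothetical): two chains of points separated by a wide gap.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
mm = minmax_distances(points)
print(mm[0][2])  # within a chain: largest hop along the chain
print(mm[0][5])  # across chains: the gap dominates
print(k_distance(points, 2))
```

In this toy example the minmax distance between points of the same chain stays at the hop length, while every cross-chain pair shares the same value set by the gap, which is the sense in which a boundary between clusters becomes a shell at a constant transformed distance.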
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
- PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search [8.15681999722805]
This paper studies density-based clustering of point sets.
It unifies the different variants of density peaks clustering into a single framework, PECANN.
We implement five clustering algorithms with PECANN and evaluate them on synthetic and real-world datasets with up to 1.28 million points and up to 1024 dimensions on a 30-core machine with two-way hyper-threading.
arXiv Detail & Related papers (2023-12-06T22:43:50Z)
- DenMune: Density peak based clustering using mutual nearest neighbors [0.0]
Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other.
A novel clustering algorithm, DenMune, is presented to meet this challenge.
It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user.
arXiv Detail & Related papers (2023-09-23T16:18:00Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To better exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity Granular-Ball and minimum spanning tree (MST).
We construct coarse-grained granular-balls, and then use granular-balls and MST to implement the clustering method based on "large-scale priority".
Experimental results on several data sets demonstrate the power of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
- Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds.
The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z)
- A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery [0.18374319565577152]
We present a new dynamical systems algorithm for clustering in hyperspectral images.
The main idea of the algorithm is that data points are 'pushed' in the direction of increasing density, and groups of pixels that end up in the same dense regions belong to the same class.
We evaluate the algorithm on the Urban scene comparing performance against the k-means algorithm using pre-identified classes of materials as ground truth.
arXiv Detail & Related papers (2022-07-21T17:31:57Z)
- A density peaks clustering algorithm with sparse search and K-d tree [16.141611031128427]
A density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem.
Experiments are carried out on datasets with different distribution characteristics, comparing the algorithm with five other typical clustering algorithms.
arXiv Detail & Related papers (2022-03-02T09:29:40Z)
- Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
- Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type.
The algorithm is capable of detecting outliers and clusters of relatively lower density values.
We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z)
- Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
Its speed, lack of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.