Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood
- URL: http://arxiv.org/abs/2410.02290v1
- Date: Thu, 3 Oct 2024 08:17:11 GMT
- Title: Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood
- Authors: Akanksha Das, Malay Bhattacharyya
- Abstract summary: In this paper, we design a clustering algorithm that generates, for each line, a customised neighbourhood of fixed volume.
This algorithm is not sensitive to outliers and can effectively identify noise in the data using a cardinality parameter.
One of the pivotal applications of this algorithm is clustering data points in $\mathbb{R}^n$ with missing entries.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density based spatial clustering of points in $\mathbb{R}^n$ has a myriad of applications in a variety of industries. We generalise this problem to the density based clustering of lines in high-dimensional spaces, keeping in mind that no valid distance measure satisfying the triangle inequality exists for lines. In this paper, we design a clustering algorithm that generates, for each line, a customised neighbourhood of fixed volume (given as a parameter), shaped by an optional parameter in the form of a continuous probability density function. This algorithm is not sensitive to outliers and can effectively identify noise in the data using a cardinality parameter. One of the pivotal applications of this algorithm is clustering data points in $\mathbb{R}^n$ with missing entries, while utilising the domain knowledge of the respective data. In particular, the proposed algorithm is able to cluster $n$-dimensional data points that contain at least $(n-1)$-dimensional information. We illustrate the neighbourhoods for the standard probability distributions with continuous probability density functions and demonstrate the effectiveness of our algorithm on various synthetic and real-world datasets (e.g., rail and road networks). The experimental results also highlight its application in clustering incomplete data.
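The abstract does not spell out how the probabilistic neighbourhood is constructed, so what follows is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that a point in $\mathbb{R}^n$ with exactly one missing entry becomes an axis-parallel line through its observed coordinates, and it replaces the paper's fixed-volume, density-shaped neighbourhood with a plain closest-approach distance threshold `eps`, grown DBSCAN-style with a cardinality threshold `min_pts`. All names (`point_to_line`, `line_distance`, `dbscan_lines`, `eps`, `min_pts`) are hypothetical. The closest-approach distance is not a metric: two parallel lines 10 units apart both meet a third, transversal line, so $d(A,B) = 10 > d(A,C) + d(C,B) = 0$, which is in line with the abstract's remark about the triangle inequality.

```python
import math
from collections import deque

NOISE = -1  # label for lines that belong to no cluster


def point_to_line(point):
    """Map a point with exactly one missing entry (None) to an axis-parallel line.

    ASSUMPTION: the unknown coordinate is treated as free, so the point becomes
    the line anchor + s * e_i, where i is the missing index.
    """
    missing = [i for i, x in enumerate(point) if x is None]
    if len(missing) != 1:
        raise ValueError("this sketch expects exactly one missing entry per point")
    anchor = tuple(0.0 if x is None else float(x) for x in point)
    direction = tuple(1.0 if i == missing[0] else 0.0 for i in range(len(point)))
    return anchor, direction


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def line_distance(l1, l2):
    """Closest-approach distance between lines p + s*u and q + t*v in R^n.

    Not a metric: it can violate the triangle inequality.
    """
    (p, u), (q, v) = l1, l2
    w0 = [pi - qi for pi, qi in zip(p, q)]
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    d, e = _dot(u, w0), _dot(v, w0)
    denom = a * c - b * b
    if denom <= 1e-12 * a * c:  # (nearly) parallel: project p onto the second line
        s, t = 0.0, e / c
    else:  # solve the 2x2 normal equations for the closest pair of points
        s, t = (b * e - c * d) / denom, (a * e - b * d) / denom
    gap = [wi + s * ui - t * vi for wi, ui, vi in zip(w0, u, v)]
    return math.sqrt(_dot(gap, gap))


def dbscan_lines(lines, eps, min_pts):
    """DBSCAN-style clustering of lines.

    ASSUMPTION: the neighbourhood of a line is every line within closest-approach
    distance eps (itself included); min_pts plays the role of the cardinality parameter.
    """

    def neighbours(i):
        return [j for j in range(len(lines)) if line_distance(lines[i], lines[j]) <= eps]

    labels = [None] * len(lines)  # None = not visited yet
    cluster = -1
    for i in range(len(lines)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border line
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border line: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # core line: keep growing the cluster
                queue.extend(j_seeds)
    return labels


# Toy data in R^3; None marks the single missing entry of each point.
data = [
    (0.0, 0.0, None), (0.1, 0.0, None), (0.0, 0.1, None), (None, 0.05, 0.0),
    (5.0, 5.0, None), (5.1, 5.0, None), (5.0, 5.1, None),
    (20.0, -20.0, None),
]
lines = [point_to_line(p) for p in data]
print(dbscan_lines(lines, eps=0.2, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The fourth point is missing a different coordinate from its neighbours, yet it still joins the first cluster because its line passes within 0.05 of theirs. Because two non-parallel lines in $\mathbb{R}^2$ always intersect, this naive threshold would make every such pair neighbours in the plane; a bounded-volume neighbourhood weighted by a density, as the abstract describes, is presumably one way around that.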
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
Date: 2024-09-08T13:08:45Z
- Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System
Density peaks clustering (DP) can detect clusters of arbitrary shape and cluster data in non-Euclidean spaces.
We present a faithful and parallel DP method that makes use of two types of vector-like distance matrices and an inverse leading-node-finding policy.
Our method is capable of clustering non-Euclidean data such as in community detection, while outperforming the state-of-the-art counterpart methods in accuracy when clustering large Euclidean data.
Date: 2024-06-18T06:05:45Z
- DECWA : Density-Based Clustering using Wasserstein Distance
We propose a new clustering algorithm based on spatial density and a probabilistic approach.
We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
Date: 2023-10-25T11:10:08Z
- PaVa: a novel Path-based Valley-seeking clustering algorithm
We propose a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters.
Three vital techniques are used in this algorithm.
The results indicate that the Path-based Valley-seeking algorithm is accurate and efficient.
Date: 2023-06-13T02:29:34Z
- Rethinking k-means from manifold learning perspective
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten $p$-norm regularization.
Date: 2023-05-12T03:01:41Z
- Linearized Wasserstein dimensionality reduction with approximation guarantees
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
Date: 2023-02-14T22:12:16Z
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
Date: 2022-12-06T12:42:11Z
- A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery
We present a new dynamical systems algorithm for clustering in hyperspectral images.
The main idea of the algorithm is that data points are 'pushed' in the direction of increasing density, and groups of pixels that end up in the same dense regions belong to the same class.
We evaluate the algorithm on the Urban scene, comparing its performance against the k-means algorithm with pre-identified material classes as ground truth.
Date: 2022-07-21T17:31:57Z
- Density-Based Clustering with Kernel Diffusion
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
Date: 2021-10-11T09:00:33Z
- Spatially relaxed inference on high-dimensional linear models
We study the properties of ensembled clustered inference algorithms which combine spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions.
We show that ensembled clustered inference algorithms control the $\delta$-FWER under standard assumptions for $\delta$ equal to the largest cluster diameter.
Date: 2021-06-04T16:37:19Z
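Two entries in the list above rest on a local density estimate: density peaks clustering, and the kernel-diffusion paper, which notes that the indicator function of a ball is the usual naive density. As a point of reference, here is a minimal sketch of the classic density-peaks procedure of Rodriguez and Laio (2014) using exactly that ball-indicator (cutoff) density. It is neither the MPI-parallel matrix method of the listed DP paper nor the kernel diffusion density; the names `density_peaks`, `d_c`, and `n_centers` are hypothetical, and picking a fixed number of centres instead of reading them off a decision graph is a simplifying assumption.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Classic density-peaks clustering, cutoff-kernel variant (a sketch).

    rho[i]   = number of other points strictly within distance d_c (ball-indicator density)
    delta[i] = distance to the nearest point ranked ahead of i in the density order
               (its "leading" point); for the top-ranked point, its largest distance
    Centres are the n_centers points with the largest gamma = rho * delta.
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Rank by decreasing density; ties broken by index so the ranking is total.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}

    delta = [0.0] * n
    leader = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], leader[i] = dist[i][j], j

    # Largest gamma first; the top-ranked point has maximal gamma and wins ties via rank,
    # so it is always a centre and every other point has a leader to follow.
    centres = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:n_centers]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Walk in density order so every leader is labelled before its followers.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[leader[i]]
    return labels, rho, delta


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
labels, rho, delta = density_peaks(pts, d_c=0.2, n_centers=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Replacing the cutoff count with a smooth kernel changes only the `rho` line, which is presumably where an adaptive density such as the kernel diffusion density would plug in.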