Dynamic data summarization for hierarchical spatial clustering
- URL: http://arxiv.org/abs/2412.07789v1
- Date: Tue, 26 Nov 2024 06:36:43 GMT
- Title: Dynamic data summarization for hierarchical spatial clustering
- Authors: Kayumov Abduaziz, Min Sik Kim, Ji Sun Shin,
- Abstract summary: HDBSCAN finds meaningful patterns in spatial data by considering density and spatial proximity.<n>We present an exact algorithm that maintains density information and updates the clustering hierarchy of HDBSCAN during point insertions and deletions.
- Score: 2.8470354623829577
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) finds meaningful patterns in spatial data by considering density and spatial proximity. As the clustering algorithm is inherently designed for static applications, so have recent studies focused on accelerating the algorithm for static applications using approximate or parallel methods. However, much less attention has been given to dynamic environments, where even a single point insertion or deletion can require recomputing the clustering hierarchy from scratch due to the need of maintaining the minimum spanning tree (MST) over a complete graph. This paper addresses the challenge of enhancing the clustering algorithm for dynamic data. We present an exact algorithm that maintains density information and updates the clustering hierarchy of HDBSCAN during point insertions and deletions. Considering the hardness of adapting the exact algorithm to dynamic data involving modern workloads, we propose an online-offline framework. The online component efficiently summarizes dynamic data using a tree structure, called Bubble-tree, while the offline step performs the static clustering. Experimental results demonstrate that the data summarization adapts well to fully dynamic environments, providing compression quality on par with existing techniques while significantly improving runtime performance of the clustering algorithm in dynamic data workloads.
Related papers
- TNStream: Applying Tightest Neighbors to Micro-Clusters to Define Multi-Density Clusters in Streaming Data [1.2016321065590192]
This paper proposes a clustering algorithm based on the novel concept of Tightest Neighbors and introduces a data stream clustering theory based on the Skeleton Set.
Based on these theories, this paper develops a new method, TNStream, a fully online algorithm.
Experimental results demonstrate its effectiveness in improving clustering quality for multi-density data and validate the proposed data stream clustering theory.
arXiv Detail & Related papers (2025-05-01T07:15:20Z) - Line Space Clustering (LSC): Feature-Based Clustering using K-medians and Dynamic Time Warping for Versatility [0.0]
Line Space Clustering (LSC) is a representation that transforms data points into lines in a newly defined feature space.
LSC employs a combined distance metric that uses Euclidean and Dynamic Time Warping (DTW) distances, weighted by a parameter alpha
experiments demonstrate the efficacy of LSC on synthetic and real-world datasets.
arXiv Detail & Related papers (2025-03-20T01:27:10Z) - Dynamic DBSCAN with Euler Tour Sequences [13.448072954894219]
We propose a fast and dynamic algorithm for Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Traditional DBSCAN algorithms become computationally expensive when applied to dynamic datasets.
Our algorithm leverages the Euler Tour Trees data structure, enabling dynamic clustering updates without the need to reprocess the entire dataset.
arXiv Detail & Related papers (2025-03-11T10:08:39Z) - Clustering in Dynamic Environments: A Framework for Benchmark Dataset Generation With Heterogeneous Changes [11.56518009058007]
Clustering in dynamic environments is of increasing importance, with broad applications ranging from real-time data analysis and online unsupervised learning to dynamic facility location problems.
meta-heuristics have shown promising effectiveness in static clustering tasks, their application for tracking optimal clustering solutions or robust clustering over time in dynamic environments remains largely underexplored.
This is partly due to a lack of dynamic datasets with diverse, controllable, and realistic dynamic characteristics, hindering systematic performance evaluations of clustering algorithms in various dynamic scenarios.
This deficiency leads to a gap in our understanding and capability to effectively design algorithms for clustering in dynamic environments.
arXiv Detail & Related papers (2024-02-24T05:49:27Z) - FLASC: A Flare-Sensitive Clustering Algorithm [0.0]
We present FLASC, an algorithm that detects branches within clusters to identify subpopulations.
Two variants of the algorithm are presented, which trade computational cost for noise robustness.
We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs.
arXiv Detail & Related papers (2023-11-27T14:55:16Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Deep Temporal Graph Clustering [77.02070768950145]
We propose a general framework for deep Temporal Graph Clustering (GC)
GC introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs.
Our framework can effectively improve the performance of existing temporal graph learning methods.
arXiv Detail & Related papers (2023-05-18T06:17:50Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery [0.18374319565577152]
We present a new dynamical systems algorithm for clustering in hyperspectral images.
The main idea of the algorithm is that data points are pushed' in the direction of increasing density and groups of pixels that end up in the same dense regions belong to the same class.
We evaluate the algorithm on the Urban scene comparing performance against the k-means algorithm using pre-identified classes of materials as ground truth.
arXiv Detail & Related papers (2022-07-21T17:31:57Z) - Efficient Dynamic Clustering: Capturing Patterns fromHistorical Cluster
Evolution [8.220295070012977]
Clustering is important for many tasks such as anomaly detection, database sharding, record linkage, and others.
Some clustering methods are taken as batch algorithms that incur a high overhead as they cluster all the objects in the database from scratch.
Running batch algorithms is infeasible in such scenarios as it would incur a significant overhead if performed continuously.
arXiv Detail & Related papers (2022-03-02T01:10:43Z) - Very Compact Clusters with Structural Regularization via Similarity and
Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC) for the general datasets.
Our proposed approach achieves better clustering performance over most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z) - ClusterVO: Clustering Moving Instances and Estimating Visual Odometry
for Self and Surroundings [54.33327082243022]
ClusterVO is a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects.
Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving.
arXiv Detail & Related papers (2020-03-29T09:06:28Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.