Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster
Evolution
- URL: http://arxiv.org/abs/2203.00812v1
- Date: Wed, 2 Mar 2022 01:10:43 GMT
- Title: Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster
Evolution
- Authors: Binbin Gu, Saeed Kargar, Faisal Nawab
- Abstract summary: Clustering is important for many tasks such as anomaly detection, database sharding, and record linkage.
Many clustering methods are batch algorithms that incur a high overhead because they cluster all the objects in the database from scratch.
Running such batch algorithms continuously is infeasible in dynamic scenarios because of the overhead it would incur.
- Score: 8.220295070012977
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clustering aims to group unlabeled objects based on similarity inherent among
them into clusters. It is important for many tasks such as anomaly detection,
database sharding, record linkage, and others. Some clustering methods are
batch algorithms that incur a high overhead, as they cluster all the objects
in the database from scratch, or they assume an incremental workload. In
practice, database objects are updated, added, and removed from databases
continuously which makes previous results stale. Running batch algorithms is
infeasible in such scenarios as it would incur a significant overhead if
performed continuously. This is particularly the case for high-velocity
scenarios such as ones in Internet of Things applications. In this paper, we
tackle the problem of clustering in high-velocity dynamic scenarios, where the
objects are continuously updated, inserted, and deleted. Specifically, we
propose a generally dynamic approach to clustering that utilizes previous
clustering results. Our system, DynamicC, uses a machine learning model that is
augmented with an existing batch algorithm. The DynamicC model trains by
observing the clustering decisions made by the batch algorithm. After training,
the DynamicC model is used in cooperation with the batch algorithm to achieve
both accurate and fast clustering decisions. The experimental results on four
real-world datasets and one synthetic dataset show that our approach
outperforms the state-of-the-art method while achieving clustering results
similarly accurate to those of the baseline batch algorithm.
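The abstract describes the core DynamicC loop only at a high level: a learned model mimics the batch algorithm's clustering decisions for incoming objects, and the batch algorithm is invoked only when the model is not confident. The following is a minimal sketch of that idea; the toy 1-D "batch" clusterer, the centroid-distance confidence test, and all names are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the DynamicC idea from the abstract: a lightweight model,
# trained by observing a batch algorithm, handles most updates on a fast
# path and falls back to the batch algorithm when unsure.
# The threshold-based 1-D clusterer below is a stand-in assumption.

def batch_cluster(points, threshold=1.0):
    """Toy batch algorithm: single-link grouping of sorted 1-D points."""
    clusters = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= threshold:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

class DynamicClusterer:
    """Assigns new points via centroids learned from past batch runs;
    re-runs the batch algorithm only on low-confidence points."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.points = []
        self.clusters = []

    def _centroids(self):
        return [sum(c) / len(c) for c in self.clusters]

    def insert(self, p):
        self.points.append(p)
        cents = self._centroids()
        if cents:
            d, i = min((abs(p - c), i) for i, c in enumerate(cents))
            if d <= self.threshold:  # confident: fast incremental path
                self.clusters[i].append(p)
                return
        # low confidence: fall back to the batch algorithm
        self.clusters = batch_cluster(self.points, self.threshold)

dc = DynamicClusterer(threshold=1.0)
for x in [0.0, 0.5, 5.0, 5.4, 0.8]:
    dc.insert(x)
print(len(dc.clusters))  # → 2 (two well-separated groups)
```

In this sketch the expensive batch call runs only twice (on the first point and on the first far-away point); the other three inserts take the fast path, which is the cost profile the abstract claims for high-velocity workloads.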
Related papers
- GBCT: An Efficient and Adaptive Granular-Ball Clustering Algorithm for Complex Data [49.56145012222276]
We propose a new clustering algorithm called granular-ball clustering (GBCT) via granular-ball computing.
GBCT forms clusters according to the relationship between granular-balls, instead of the traditional point relationship.
As granular-balls can fit various complex data, GBCT performs much better in non-spherical data sets than other traditional clustering methods.
arXiv Detail & Related papers (2024-10-17T07:32:05Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
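The paper's factor-3 approximation combines a weighted k-center objective with uncertainty sampling. As a hedged illustration of the underlying k-center machinery only, here is the classic greedy farthest-point heuristic (Gonzalez, 1985), which is a 2-approximation for the standard unweighted k-center objective; it is not the paper's weighted algorithm.

```python
# Greedy farthest-point heuristic for standard k-center: repeatedly add
# the point farthest from all chosen centers. Shown only to illustrate
# the k-center objective the summary refers to; the paper's weighted
# 3-approximation with an uncertainty-sampling term is not reproduced.

def greedy_k_center(points, k):
    centers = [points[0]]                    # arbitrary first center
    dist = [abs(p - centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])            # farthest point becomes a center
        dist = [min(d, abs(p - points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)                # centers and covering radius

pts = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
centers, radius = greedy_k_center(pts, k=3)
print(centers, radius)  # → [0.0, 20.0, 10.0] 2.0
```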
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- K-ARMA Models for Clustering Time Series Data [4.345882429229813]
We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm.
We show how the clustering algorithm can be made robust to outliers using a least-absolute deviations criteria.
We perform experiments on real data which show that our method is competitive with other existing methods for similar time series clustering tasks.
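The summary attributes K-ARMA's outlier robustness to a least-absolute-deviations criterion. As a hedged sketch of that mechanism only, the snippet below runs a K-Means-style loop on 1-D data where minimizing absolute deviation makes each center a median rather than a mean, so a single outlier barely moves it. The paper's per-cluster ARMA model fitting is not reproduced here.

```python
# Least-absolute-deviations clustering in one dimension (1-D k-medians):
# same assign/update loop as K-Means, but the L1 criterion makes each
# center a median, which is robust to outliers. This illustrates only
# the robustness idea mentioned in the summary, not K-ARMA itself.
import statistics

def k_medians_1d(xs, centers, iters=20):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[i].append(x)
        centers = [statistics.median(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

data = [1.0, 1.1, 0.9, 100.0, 10.0, 10.2, 9.8]  # 100.0 is an outlier
cents = k_medians_1d(data, centers=[0.0, 5.0])
print(cents)  # second center stays near 10.1 despite the outlier
```

With a squared-error (mean) update the second center would be dragged toward 32 by the single value 100.0; the median update leaves it at the bulk of the cluster.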
arXiv Detail & Related papers (2022-06-30T18:16:11Z)
- KnAC: an approach for enhancing cluster analysis with background knowledge and explanations [0.20999222360659603]
We present Knowledge Augmented Clustering (KnAC), whose main goal is to confront expert-based labelling with automated clustering.
KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic.
arXiv Detail & Related papers (2021-12-16T10:13:47Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL).
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- A Novel Incremental Clustering Technique with Concept Drift Detection [2.790947019327459]
Traditional static clustering algorithms are not suitable for dynamic datasets.
We propose an efficient incremental clustering algorithm called UIClust.
We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm.
arXiv Detail & Related papers (2020-03-30T05:20:35Z)
- Autoencoder-based time series clustering with energy applications [0.0]
Time series clustering is a challenging task due to the specific nature of the data.
In this paper we investigate the combination of a convolutional autoencoder and a k-medoids algorithm to perform time series clustering.
arXiv Detail & Related papers (2020-02-10T10:04:29Z)
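The entry above pairs a convolutional autoencoder with k-medoids. As a hedged sketch of the clustering stage only, here is a minimal k-medoids (Voronoi-iteration variant) in pure Python; plain vectors stand in for the learned latent codes, and the autoencoder itself is omitted.

```python
# Minimal k-medoids (Voronoi iteration): assign points to the nearest
# medoid, then let each medoid become the cluster member minimizing
# total distance to its cluster. Initialization and data are assumptions
# for illustration.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_medoids(points, k, iters=10):
    medoids = points[:k]  # naive init: first k points
    for _ in range(iters):
        # assignment step: each point joins its nearest medoid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, medoids[j]))
            clusters[i].append(p)
        # update step: medoid = member with minimal total in-cluster distance
        new = [min(c, key=lambda m: sum(dist(m, p) for p in c)) if c else medoids[i]
               for i, c in enumerate(clusters)]
        if new == medoids:
            break
        medoids = new
    return medoids, clusters

latents = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]  # stand-in latent codes
medoids, clusters = k_medoids(latents, k=2)
print(sorted(len(c) for c in clusters))  # → [2, 2]
```

Unlike k-means, the centers here are always actual data points, which is why k-medoids works with any distance function over latent codes, not just Euclidean means.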
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.