A Novel Incremental Clustering Technique with Concept Drift Detection
- URL: http://arxiv.org/abs/2003.13225v1
- Date: Mon, 30 Mar 2020 05:20:35 GMT
- Title: A Novel Incremental Clustering Technique with Concept Drift Detection
- Authors: Mitchell D. Woodbright, Md Anisur Rahman, Md Zahidul Islam
- Abstract summary: Traditional static clustering algorithms are not suitable for dynamic datasets.
We propose an efficient incremental clustering algorithm called UIClust.
We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm.
- Score: 2.790947019327459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data are being collected from various aspects of life. These data can often
arrive in chunks/batches. Traditional static clustering algorithms are not
suitable for dynamic datasets, i.e., when data arrive in streams of
chunks/batches. Applying a conventional clustering technique to the combined
dataset every time a new batch arrives is slow and wasteful. Moreover, it can
be challenging to store the combined dataset in memory due to its
ever-increasing size. As a result, various incremental clustering techniques
have been proposed. These techniques need to efficiently update the current
clustering result whenever a new batch arrives, adapting the clustering
solution to the latest data. They also need the ability to detect a concept
drift, i.e., when the clustering pattern of a new batch differs significantly
from that of older batches. Sometimes, a clustering pattern may drift
temporarily in a single batch while the next batches do not exhibit the
drift. Incremental clustering techniques therefore need the ability to
distinguish a temporary drift from a sustained drift. In
this paper, we propose an efficient incremental clustering algorithm called
UIClust. It is designed to cluster streams of data chunks, even when there are
temporary or sustained concept drifts. We evaluate the performance of UIClust
by comparing it with a recently published, high-quality incremental clustering
algorithm on real and synthetic datasets. We compare the results using
well-known clustering evaluation criteria: entropy, sum of squared errors
(SSE), and execution time. Our results show that UIClust outperforms the
existing technique in all our experiments.
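The abstract names the core mechanics (per-batch updates, temporary versus sustained drift, and evaluation by entropy and SSE) but includes no pseudocode, so the following is only a minimal Python sketch of that batch-wise loop. It is not the UIClust algorithm itself: the k-means base clusterer, the SSE-ratio drift test, and the drift_factor and sustain thresholds are illustrative assumptions, and the entropy/SSE helpers follow common textbook definitions that may differ from the paper's.
```python
# Hypothetical sketch only: UIClust's actual update and drift rules are in
# the paper; every threshold below is an assumption for illustration.
import numpy as np
from sklearn.cluster import KMeans


def sse(X, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))


def clustering_entropy(labels, classes):
    """Weighted average, over clusters, of the entropy of the true class
    labels inside each cluster; lower means purer clusters."""
    total, ent = len(labels), 0.0
    for c in np.unique(labels):
        _, counts = np.unique(classes[labels == c], return_counts=True)
        p = counts / counts.sum()
        ent += (counts.sum() / total) * float(-np.sum(p * np.log2(p)))
    return ent


def cluster_batches(batches, k=3, drift_factor=2.0, sustain=2):
    """Cluster a stream of data chunks, re-clustering on sustained drift.

    A batch whose per-point SSE exceeds drift_factor times the running
    baseline is flagged as drifting; `sustain` consecutive drifting
    batches turn a temporary drift into a sustained one.
    """
    model, baseline, drift_run = None, None, 0
    for t, X in enumerate(batches):
        if model is None:  # first batch: build the initial clustering
            model = KMeans(n_clusters=k, n_init=10).fit(X)
        labels = model.predict(X)
        err = sse(X, model.cluster_centers_, labels) / len(X)
        drifting = baseline is not None and err > drift_factor * baseline
        drift_run = drift_run + 1 if drifting else 0
        if drift_run >= sustain:
            # Sustained drift: rebuild the clustering from the new batch.
            model = KMeans(n_clusters=k, n_init=10).fit(X)
            labels, baseline, drift_run = model.labels_, None, 0
        elif not drifting:
            # No drift, or the previous drift was only temporary: fold the
            # batch into the current centers with an incremental update.
            for j in range(k):
                pts = X[labels == j]
                if len(pts):
                    model.cluster_centers_[j] = (
                        0.9 * model.cluster_centers_[j]
                        + 0.1 * pts.mean(axis=0)
                    )
            baseline = err if baseline is None else 0.5 * (baseline + err)
        yield t, labels, drifting
```
Iterating over cluster_batches(stream, k=5) yields per-batch labels and a drift flag; a temporary drift leaves the existing clusters untouched, while a sustained one triggers a full re-clustering, mirroring the distinction the abstract draws.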
Related papers
- GBCT: An Efficient and Adaptive Granular-Ball Clustering Algorithm for Complex Data [49.56145012222276]
We propose a new clustering algorithm called granular-ball clustering (GBCT) via granular-ball computing.
GBCT forms clusters according to the relationships between granular-balls, instead of the traditional point-to-point relationships.
As granular-balls can fit various complex data, GBCT performs much better on non-spherical datasets than traditional clustering methods.
arXiv Detail & Related papers (2024-10-17T07:32:05Z)
- Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters [5.507296054825372]
Finding meaningful groups in high-dimensional data is an important challenge in data mining.
Deep clustering methods have achieved remarkable results in these tasks.
Most of these methods require the user to specify the number of clusters in advance.
This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable.
Moreover, most of these approaches estimate the number of clusters separately from the clustering process.
arXiv Detail & Related papers (2024-10-12T11:04:10Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance the cohesion within clusters and the separation between clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach a collapsed solution where the encoder maps all inputs to the same point and all points are assigned to a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster Evolution [8.220295070012977]
Clustering is important for many tasks such as anomaly detection, database sharding, record linkage, and others.
Some clustering methods are batch algorithms that incur a high overhead because they cluster all the objects in the database from scratch.
Running such batch algorithms continuously is infeasible in dynamic scenarios because of this overhead.
arXiv Detail & Related papers (2022-03-02T01:10:43Z)
- Improved Multi-objective Data Stream Clustering with Time and Memory Optimization [0.0]
This paper introduces a new data stream clustering method (IMOC-Stream).
It uses two different objective functions to capture different aspects of the data.
The experiments show the ability of our method to partition the data stream in arbitrarily shaped, compact, and well-separated clusters.
arXiv Detail & Related papers (2022-01-13T17:05:56Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- Variational Auto Encoder Gradient Clustering [0.0]
Clustering using deep neural network models has been extensively studied in recent years.
This article investigates how probability function gradient ascent can be used to process data in order to achieve better clustering.
We propose a simple yet effective method for investigating suitable number of clusters for data, based on the DBSCAN clustering algorithm.
arXiv Detail & Related papers (2021-05-11T08:00:36Z)
- Autoencoder-based time series clustering with energy applications [0.0]
Time series clustering is a challenging task due to the specific nature of the data.
In this paper, we investigate the combination of a convolutional autoencoder and a k-medoids algorithm to perform time series clustering.
arXiv Detail & Related papers (2020-02-10T10:04:29Z)