IPD: An Incremental Prototype-based DBSCAN for large-scale data with cluster representatives
- URL: http://arxiv.org/abs/2202.07870v2
- Date: Wed, 11 Oct 2023 03:11:14 GMT
- Title: IPD: An Incremental Prototype-based DBSCAN for large-scale data with cluster representatives
- Authors: Jayasree Saha, Jayanta Mukherjee
- Abstract summary: We propose an Incremental Prototype-based DBSCAN (IPD) algorithm designed to identify arbitrary-shaped clusters in large-scale data and to choose a set of representatives for each cluster.
- Score: 2.864550757598006
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: DBSCAN is a fundamental density-based clustering technique that identifies
clusters of arbitrary shape. However, it becomes infeasible when handling big data. On
the other hand, centroid-based clustering is important for detecting patterns in a
dataset, since unprocessed data points can be labeled with their nearest centroid;
however, it cannot detect non-spherical clusters. For large data, it is not feasible to
store and compute labels for every sample; these can be computed as and when the
information is required. This purpose is served when clustering acts as a tool to
identify cluster representatives and a query is answered by assigning the cluster label
of the nearest representative. In this paper, we propose an Incremental Prototype-based
DBSCAN (IPD) algorithm designed to identify arbitrary-shaped clusters for large-scale
data. Additionally, it chooses a set of representatives for each cluster.
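The query step described above (label an unseen sample with the cluster of its nearest
representative) can be made concrete with a short sketch. The function and variable
names below are illustrative assumptions, not the authors' implementation, and IPD's
incremental construction of the prototypes is not shown.

```python
import numpy as np

def label_query(queries, representatives, rep_labels):
    """Assign each query point the cluster label of its nearest representative.

    Minimal sketch of the query step only: `representatives` and `rep_labels`
    are assumed to come from a prior prototype-based clustering pass; the
    incremental clustering itself is not reproduced here.
    """
    q = np.asarray(queries, dtype=float)           # (n, d) unseen samples
    r = np.asarray(representatives, dtype=float)   # (m, d) cluster representatives
    # Pairwise Euclidean distances between queries and representatives.
    dists = np.linalg.norm(q[:, None, :] - r[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                 # closest representative per query
    return np.asarray(rep_labels)[nearest]         # inherit its cluster label

# Toy usage: two well-separated clusters, one representative each.
reps = [[0.0, 0.0], [10.0, 10.0]]
print(label_query([[0.4, -0.3], [9.7, 10.2]], reps, rep_labels=[0, 1]))  # -> [0 1]
```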
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Interpretable Deep Clustering for Tabular Data [7.972599673048582]
Clustering is a fundamental learning task widely used in data analysis.
We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels.
We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics datasets.
arXiv Detail & Related papers (2023-06-07T21:08:09Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- DRBM-ClustNet: A Deep Restricted Boltzmann-Kohonen Architecture for Data Clustering [0.0]
A Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clustering termed as DRBM-ClustNet is proposed.
The processing of unlabeled data is done in three stages for efficient clustering of the non-linearly separable datasets.
The framework is evaluated based on clustering accuracy and ranked against other state-of-the-art clustering methods.
arXiv Detail & Related papers (2022-05-13T15:12:18Z)
- Implicit Sample Extension for Unsupervised Person Re-Identification [97.46045935897608]
Clustering sometimes mixes different true identities together or splits the same identity into two or more sub-clusters.
We propose an Implicit Sample Extension method to generate what we call support samples around the cluster boundaries.
Experiments demonstrate that the proposed method is effective and achieves state-of-the-art performance for unsupervised person Re-ID.
arXiv Detail & Related papers (2022-04-14T11:41:48Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss is designed for node representation learning.
For out-of-sample (OOS) nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms (a toy rasterize-and-segment sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
- Cluster Representatives Selection in Non-Metric Spaces for Nearest Prototype Classification [4.176752121302988]
In this paper, we present CRS, a novel method for selecting a small yet representative subset of objects as a cluster prototype.
Memory- and computation-efficient selection of representatives is enabled by leveraging the similarity graph representation of each cluster created by the NN-Descent algorithm.
CRS can be used in an arbitrary metric or non-metric space because of the graph-based approach, which requires only a pairwise similarity measure (a simplified greedy illustration of representative selection appears after this list).
arXiv Detail & Related papers (2021-07-03T04:51:07Z)
- Swarm Intelligence for Self-Organized Clustering [6.85316573653194]
A swarm system called Databionic swarm (DBS) is introduced which is able to adapt itself to structures of high-dimensional data.
By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering.
arXiv Detail & Related papers (2021-06-10T06:21:48Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
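For the "Clustering Plotted Data by Image Segmentation" entry above, the sketch below
illustrates the general rasterize-then-segment idea: plot the 2-D points as a density
image, smooth it, threshold it, and label connected components. It is a toy illustration
under assumed grid and smoothing parameters, not the paper's Visual Clustering method.

```python
import numpy as np
from scipy import ndimage

def segment_plot_clusters(points, grid=256, sigma=2.0, threshold=0.05):
    """Toy 'cluster by segmenting the scatter plot' illustration (not Visual Clustering)."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    # Map the points onto a grid x grid canvas and accumulate a density image.
    pix = ((pts - lo) / np.maximum(hi - lo, 1e-12) * (grid - 1)).astype(int)
    img = np.zeros((grid, grid))
    np.add.at(img, (pix[:, 0], pix[:, 1]), 1.0)
    img = ndimage.gaussian_filter(img, sigma=sigma)   # smooth the "plot"
    mask = img > threshold * img.max()                # keep dense regions
    components, _ = ndimage.label(mask)               # each connected blob = one cluster
    return components[pix[:, 0], pix[:, 1]]           # per-point cluster id (0 = background)

# Toy usage: two separated Gaussian blobs should come back as two components.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal([0, 0], 0.3, size=(200, 2)),
                   rng.normal([5, 5], 0.3, size=(200, 2))])
print(np.unique(segment_plot_clusters(blobs)))
```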
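For the "Cluster Representatives Selection" (CRS) entry above, the following sketch shows
one simple way to pick a small representative subset of a single cluster using only
pairwise similarities. It is a greedy coverage heuristic written for illustration; the
actual CRS method operates on an NN-Descent similarity graph and is not reproduced here.

```python
import numpy as np

def greedy_representatives(similarity, coverage_threshold=0.8, max_reps=10):
    """Greedily pick members of one cluster as representatives (toy stand-in for CRS).

    similarity: (n, n) symmetric matrix of pairwise similarities within the cluster.
    A member counts as covered once a chosen representative is at least
    `coverage_threshold` similar to it.
    """
    sim = np.asarray(similarity, dtype=float)
    covered = np.zeros(sim.shape[0], dtype=bool)
    reps = []
    while not covered.all() and len(reps) < max_reps:
        # Pick the member that newly covers the most uncovered members.
        gains = ((sim >= coverage_threshold) & ~covered[None, :]).sum(axis=1)
        best = int(gains.argmax())
        if gains[best] == 0:      # nothing left to cover at this threshold
            break
        reps.append(best)
        covered |= sim[best] >= coverage_threshold
    return reps

# Toy usage: turn distances within one cluster into similarities and select.
rng = np.random.default_rng(1)
pts = rng.normal(size=(50, 2))
dists = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
print(greedy_representatives(np.exp(-dists), coverage_threshold=0.5))
```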
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.