A Hybrid Algorithm Based Robust Big Data Clustering for Solving
Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering
Problems with Analysis
- URL: http://arxiv.org/abs/2002.09380v1
- Date: Fri, 21 Feb 2020 16:09:19 GMT
- Authors: Y. A. Joarder (1) and Mosabbir Ahmed (2) ((1,2) Department of Computer
Science and Engineering, World University of Bangladesh (WUB), Dhaka,
Bangladesh)
- Abstract summary: Clustering algorithms have developed into a powerful learning tool that can analyze the volume of data produced by modern applications.
Our proposed algorithm, EG K-MEANS (Extended Generation K-MEANS), mainly solves three issues of K-MEANS.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Big Data is a massive volume of both structured and unstructured data that is
too large and too difficult to process using traditional techniques.
Clustering algorithms have developed into a powerful learning tool that can
accurately analyze the volume of data produced by modern applications.
Clustering in data mining is the grouping of a particular set of objects based
on their characteristics. The main aim of clustering is to classify data into
clusters such that objects are grouped in the same cluster when they are
similar, mainly in terms of their features. To date, K-MEANS
is the most widely used algorithm, applied in a wide range of areas to
identify groups where within-cluster distances are much smaller than
between-cluster distances. Our algorithm works with K-MEANS to achieve
high-quality clustering of big data. Our proposed algorithm, EG K-MEANS
(Extended Generation K-MEANS), addresses three main issues of K-MEANS: unhealthy
initialization, dynamic centroid selection and empty clustering. It aims to
prevent these three problems in order to obtain high-quality clustering.
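To make the initialization and empty-cluster problems named in the abstract concrete, here is a minimal Python sketch. It is not the authors' EG K-MEANS, whose details are not given on this page. It is plain Lloyd-style K-MEANS with two standard remedies swapped in as assumptions: k-means++-style seeding to reduce the chance of a poor initialization, and re-seeding any empty cluster at the point currently farthest from its assigned centre. The function name `kmeans_sketch` and its parameters are hypothetical.

```python
import numpy as np


def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++-style seeding and empty-cluster repair.

    Illustrative only: NOT the EG K-MEANS algorithm from the paper.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Seeding: the first centre is a uniformly random point; each later centre
    # is sampled with probability proportional to its squared distance from
    # the nearest centre chosen so far (k-means++-style).
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(-1).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        centers.append(X[rng.choice(n, p=probs)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        nearest = d2[np.arange(n), labels]

        # Update step: mean of members, or re-seed if the cluster is empty.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                far = nearest.argmax()
                new_centers[j] = X[far]
                nearest[far] = 0.0  # do not reuse this point for another empty cluster
            else:
                new_centers[j] = members.mean(axis=0)

        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment against the final centres.
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return centers, labels
```

Calling `kmeans_sketch(X, k=3)` on an `(n, d)` array returns a `(3, d)` array of centres and a length-`n` label vector. The pairwise distance computation uses O(n·k·d) memory, so this sketch suits small examples, not the big-data setting the paper targets.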
Related papers
- SHADE: Deep Density-based Clustering [13.629470968274]
SHADE is the first deep clustering algorithm that incorporates density-connectivity into its loss function.
It supports high-dimensional and large data sets with the expressive power of a deep autoencoder.
It outperforms existing methods in clustering quality, especially on data that contain non-Gaussian clusters.
arXiv Detail & Related papers (2024-10-08T18:03:35Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Very Compact Clusters with Structural Regularization via Similarity and Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC) for the general datasets.
Our proposed approach achieves better clustering performance over most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z)
- A Deep Learning Object Detection Method for an Efficient Clusters Initialization [6.365889364810239]
Clustering has been used in numerous applications such as banking customers profiling, document retrieval, image segmentation, and e-commerce recommendation engines.
Existing clustering techniques present significant limitations, among them the dependence of their stability on the initialization parameters.
This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resources overhead.
arXiv Detail & Related papers (2021-04-28T08:34:25Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems.
They converge to different settings with different initial conditions.
The number of clusters has to be arbitrarily decided beforehand.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.