Related papers: A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis

A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis

URL: http://arxiv.org/abs/2002.09380v1
Date: Fri, 21 Feb 2020 16:09:19 GMT
Title: A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis
Authors: Y. A. Joarder (1) and Mosabbir Ahmed (2) ((1,2) Department of Computer Science and Engineering, World University of Bangladesh (WUB), Dhaka, Bangladesh)
Abstract summary: Clustering algorithms have developed as a powerful learning tool that can analyze the volume of data that produced by modern applications. Our proposed algorithm EG K-MEANS : Extended Generation K-MEANS solves mainly three issues of K-MEANS.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Big Data is a massive volume of both structured and unstructured data that is too large and it also difficult to process using traditional techniques. Clustering algorithms have developed as a powerful learning tool that can exactly analyze the volume of data that produced by modern applications. Clustering in data mining is the grouping of a particular set of objects based on their characteristics. The main aim of clustering is to classified data into clusters such that objects are grouped in the same clusters when they are corresponding according to similarities and features mainly. Till now, K-MEANS is the best utilized calculation connected in a wide scope of zones to recognize gatherings where cluster separations are a lot than between gathering separations. Our developed algorithm works with K-MEANS for high quality clustering during clustering from big data. Our proposed algorithm EG K-MEANS : Extended Generation K-MEANS solves mainly three issues of K-MEANS: unhealthy initialization, dynamic centroid selection and empty clustering. It ensures the best way of preventing unhealthy initialization, dynamic centroid selection and empty clustering problems for getting high quality clustering.

Related papers

How to optimize K-means? [8.206124331448931]
Center-based clustering algorithms (e.g., K-means) are popular for clustering tasks, but they usually struggle to achieve high accuracy on complex datasets. We believe the main reason is that traditional center-based clustering algorithms identify only one clustering center in each cluster. We propose a general optimization method called ECAC, and it can optimize different center-based clustering algorithms.
arXiv Detail & Related papers (2025-03-25T03:37:52Z)
SHADE: Deep Density-based Clustering [13.629470968274]
SHADE is the first deep clustering algorithm that incorporates density-connectivity into its loss function. It supports high-dimensional and large data sets with the expressive power of a deep autoencoder. It outperforms existing methods in clustering quality, especially on data that contain non-Gaussian clusters.
arXiv Detail & Related papers (2024-10-08T18:03:35Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm. In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z)
Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. This paper proposes a novel Fuzzy textitK-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed. While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster. We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining. Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. We provide implementable differentially private clustering algorithms that provide utility when the data is "easy" We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
Very Compact Clusters with Structural Regularization via Similarity and Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC) for the general datasets. Our proposed approach achieves better clustering performance over most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z)
A Deep Learning Object Detection Method for an Efficient Clusters Initialization [6.365889364810239]
Clustering has been used in numerous applications such as banking customers profiling, document retrieval, image segmentation, and e-commerce recommendation engines. Existing clustering techniques present significant limitations, from which is the dependability of their stability on the initialization parameters. This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resources overhead.
arXiv Detail & Related papers (2021-04-28T08:34:25Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems. They converge to different settings with different initial conditions. The number of clusters has to be arbitrarily decided beforehand.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.