Related papers: DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm

URL: http://arxiv.org/abs/2307.14375v1
Date: Tue, 25 Jul 2023 16:37:09 GMT
Title: DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm
Authors: Ying Xiao, Hou-biao Li, Yu-pu Zhang
Abstract summary: We present a clustering algorithm that addresses the high sensitivity of traditional methods such as K-means to initial centroid selection and improves robustness on non-convex datasets. Extensive experiments are conducted on four simulated datasets and six real datasets. Results demonstrate that our algorithm improves the accuracy of various clustering algorithms by an average of 63.8%.
Score: 2.0232038310495435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the development of big data technology, data analysis has become increasingly important. Traditional clustering algorithms such as K-means are highly sensitive to the initial centroid selection and perform poorly on non-convex datasets. In this paper, we address these problems by proposing a data-driven Bregman divergence parameter optimization clustering algorithm (DBGSA), which incorporates the Universal Gravitational Algorithm to bring similar points in the dataset closer together. We construct a gravitational coefficient equation with a special property: it gradually reduces the influence factor as the iterations progress. Furthermore, we introduce Bregman divergence generalized power mean information loss minimization to identify cluster centers, and we build a hyperparameter identification optimization model, which effectively solves the problems of manual adjustment and uncertainty in the improved dataset. Extensive experiments are conducted on four simulated datasets and six real datasets. The results demonstrate that DBGSA significantly improves the accuracy of various clustering algorithms by an average of 63.8% compared with other similar approaches, such as enhanced clustering algorithms and improved datasets. Additionally, a three-dimensional grid search was conducted to compare the effects of different parameter values within threshold conditions, and the parameter set provided by our model was found to be optimal. This finding provides strong evidence of the high accuracy and robustness of the algorithm.
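The abstract describes two stages: a gravitation-inspired step that pulls similar points together under a coefficient that weakens over the iterations, followed by clustering with a Bregman divergence. The sketch below illustrates that shape under stated assumptions and is not the DBGSA implementation. The decay schedule G(t) = g0·exp(-alpha·t/T) is borrowed from a common Gravitational Search Algorithm schedule. Each point moves a fraction G(t) toward an inverse-distance-weighted mean of its nearest neighbours. The clustering stage is plain Bregman hard clustering with arithmetic-mean centroids rather than the paper's generalized power mean formulation. All function names and defaults (`g0`, `alpha`, `neighbours`, `T`) are hypothetical.

```python
import math
import random


def gravitational_coefficient(t, T, g0=0.5, alpha=5.0):
    # Hypothetical decay schedule G(t) = g0 * exp(-alpha * t / T): the pull weakens as t grows.
    return g0 * math.exp(-alpha * t / T)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gravity_step(points, G, neighbours=2, eps=1e-9):
    # Move each point a fraction G toward the inverse-distance-weighted mean of its
    # nearest neighbours (closer neighbours pull harder, as in a gravity analogy).
    moved = []
    for i, p in enumerate(points):
        nearest = sorted((j for j in range(len(points)) if j != i),
                         key=lambda j: sq_dist(p, points[j]))[:neighbours]
        weights = [1.0 / (math.sqrt(sq_dist(p, points[j])) + eps) for j in nearest]
        total = sum(weights)
        target = [sum(w * points[j][d] for w, j in zip(weights, nearest)) / total
                  for d in range(len(p))]
        moved.append([p[d] + G * (target[d] - p[d]) for d in range(len(p))])
    return moved


def squared_euclidean(x, y):
    # Bregman divergence generated by phi(x) = ||x||^2.
    return sq_dist(x, y)


def generalized_i_divergence(x, y):
    # Bregman divergence generated by phi(x) = sum x log x; needs strictly positive coordinates.
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


def bregman_kmeans(points, k, divergence, iters=20, seed=0):
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the centre with the smallest divergence D(x, centre).
        labels = [min(range(k), key=lambda c: divergence(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # For any Bregman divergence, the arithmetic mean minimises the summed D(x, centre).
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster(points, k, T=10, divergence=squared_euclidean):
    # Stage 1: gravity-style contraction with a decaying coefficient; Stage 2: Bregman clustering.
    pts = [list(p) for p in points]
    for t in range(T):
        pts = gravity_step(pts, gravitational_coefficient(t, T))
    return bregman_kmeans(pts, k, divergence)


data = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels, centers = cluster(data, k=2)
```

Passing `divergence=generalized_i_divergence` swaps in a different Bregman divergence without changing the centroid update. That property is what makes Bregman hard clustering a drop-in generalization of K-means. This variant requires strictly positive features.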

Related papers

Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning clustering framework, namely AR-DBSCAN. We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but is also capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z)
AdaptiveMDL-GenClust: A Robust Clustering Framework Integrating Normalized Mutual Information and Evolutionary Algorithms [0.0]
We introduce a robust clustering framework that integrates the Minimum Description Length (MDL) principle with a genetic optimization algorithm. The framework begins with an ensemble clustering approach to generate an initial clustering solution, which is refined using MDL-guided evaluation functions and optimized through a genetic algorithm. Experimental results demonstrate that our approach consistently outperforms traditional clustering methods, yielding higher accuracy, improved stability, and reduced bias.
arXiv Detail & Related papers (2024-11-26T20:26:14Z)
K-GBS3FCM -- KNN Graph-Based Safe Semi-Supervised Fuzzy C-Means [0.0]
This paper introduces the KNN graph-based safety-aware semi-supervised fuzzy c-means algorithm (K-GBS3FCM). It dynamically assesses neighborhood relationships between labeled and unlabeled data using the K-Nearest Neighbors (KNN) algorithm. It also proposes a mechanism that adjusts the influence of labeled data on unlabeled data through regularization parameters and the average safety degree.
arXiv Detail & Related papers (2024-11-22T04:48:58Z)
Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization [0.3069335774032178]
K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets. We propose a novel algorithm that leverages the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data.
arXiv Detail & Related papers (2024-10-18T15:43:34Z)
A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
Recovering Linear Causal Models with Latent Variables via Cholesky Factorization of Covariance Matrix [21.698480201955213]
We propose a DAG structure recovery algorithm based on the Cholesky factorization of the covariance matrix of the observed data. On synthetic and real-world datasets, the algorithm is significantly faster than previous methods and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-01T17:27:49Z)
Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point [5.825190876052149]
We introduce a stable equilibrium point (SEP)-based framework for improving the efficiency and solution quality of EDA. A distinctive property of the proposed method is that the SEPs directly encode the clustering properties of data sets.
arXiv Detail & Related papers (2023-06-07T13:31:57Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm that directly detects clusters of data without mean estimation. Specifically, we construct a distance matrix between data points using a Butterworth filter. To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Influence of Swarm Intelligence in Data Clustering Mechanisms [0.0]
Nature-inspired swarm-based algorithms are used for data clustering to cope with larger datasets that suffer from missing and inconsistent data. This paper reviews the performance of these new approaches and compares which is best suited to particular problem situations.
arXiv Detail & Related papers (2023-05-07T08:40:50Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms. The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm. As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) involves solving non-concave min-max optimization problems. Prior theory has shown the importance of overparameterization for gradient descent (GD) to reach globally optimal solutions. We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization. We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
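The AR-DBSCAN entry above reports its gains in the NMI and ARI metrics, which are commonly used to score a clustering against ground-truth labels. As an illustration, both can be computed from the contingency table of two labelings. This pure-Python sketch assumes NMI is normalized by the arithmetic mean of the two label entropies; other normalizations, such as the geometric mean, also appear in the literature. The code is not taken from any of the listed papers, and the function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    if n < 2:
        return 1.0  # trivial case: no pairs to compare
    joint = Counter(zip(labels_true, labels_pred))  # contingency table cells n_ij
    a = Counter(labels_true)   # row sums
    b = Counter(labels_pred)   # column sums
    index = sum(comb(v, 2) for v in joint.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings put everything in one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    # Mutual information from empirical joint and marginal frequencies.
    mi = sum(c / n * log(n * c / (a[u] * b[v])) for (u, v), c in joint.items())
    h_a = -sum(c / n * log(c / n) for c in a.values())
    h_b = -sum(c / n * log(c / n) for c in b.values())
    denom = (h_a + h_b) / 2  # arithmetic-mean normalization (an assumption)
    return 1.0 if denom == 0 else mi / denom
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0. Both metrics ignore how cluster IDs are named and only compare the partitions.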

This list is automatically generated from the titles and abstracts of the papers on this site.