DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm
- URL: http://arxiv.org/abs/2307.14375v1
- Date: Tue, 25 Jul 2023 16:37:09 GMT
- Title: DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm
- Authors: Ying Xiao, Hou-biao Li, Yu-pu Zhang
- Abstract summary: We present a clustering algorithm that addresses the sensitivity of traditional methods such as K-means to initial centroid selection and improves robustness on non-convex datasets.
Extensive experiments are conducted on four simulated datasets and six real datasets.
Results demonstrate that DBGSA improves the accuracy of various clustering algorithms by an average of 63.8%.
- Score: 2.0232038310495435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of big data technology, data analysis has become
increasingly important. Traditional clustering algorithms such as K-means are
highly sensitive to the initial centroid selection and perform poorly on
non-convex datasets. In this paper, we address these problems by proposing a
data-driven Bregman divergence parameter optimization clustering algorithm
(DBGSA), which incorporates the Universal Gravitational Algorithm to bring
similar points in the dataset closer together. We construct a gravitational
coefficient equation with a special property: the influence factor gradually
decreases as the iterations progress. Furthermore, we introduce information
loss minimization based on the Bregman divergence and a generalized power mean
to identify cluster centers, and build a hyperparameter identification
optimization model that addresses the problems of manual parameter adjustment
and uncertainty in the improved dataset. Extensive experiments are conducted
on four simulated datasets and six real datasets. The results demonstrate that
DBGSA significantly improves the accuracy of various clustering algorithms by
an average of 63.8% compared with similar approaches such as enhanced
clustering algorithms and improved datasets. Additionally, a three-dimensional
grid search was used to compare the effects of different parameter values
within threshold conditions, and it showed that the parameter set provided by
our model is optimal. This finding provides strong evidence of the high
accuracy and robustness of the algorithm.
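The abstract outlines two stages: a gravity-inspired step that pulls similar points closer with an influence factor that decays over iterations, followed by clustering under a Bregman divergence. The Python sketch below illustrates that pipeline only under stated assumptions. The decay schedule `g0 * exp(-alpha * t / iters)`, the inverse-square neighbour weighting within `radius`, the generalized I-divergence, and farthest-first seeding are all hypothetical choices for illustration; they are not DBGSA's equations, and neither the paper's power-mean formulation nor its hyperparameter optimization model is reproduced. The clustering stage is standard Bregman hard clustering, which updates each centre as the arithmetic mean of its members.

```python
import math

# Hypothetical sketch, not the DBGSA reference implementation.

def i_divergence(x, y):
    """Generalized I-divergence: the Bregman divergence generated by
    phi(v) = sum(v_k * log v_k - v_k). Requires strictly positive inputs."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))

def gravity_contract(points, g0=0.5, alpha=3.0, iters=10, radius=1.0, eps=1e-9):
    """Assumed gravity-style smoothing: each point moves toward the
    inverse-square-weighted mean of its neighbours within `radius`.
    The step size g(t) = g0 * exp(-alpha * t / iters) shrinks with t."""
    pts = [list(p) for p in points]
    for t in range(iters):
        g = g0 * math.exp(-alpha * t / iters)  # decaying influence factor
        moved = []
        for i, p in enumerate(pts):
            wsum, acc = 0.0, [0.0] * len(p)
            for j, q in enumerate(pts):
                d = math.dist(p, q)
                if j != i and d <= radius:
                    w = 1.0 / (d * d + eps)  # gravity-like inverse-square weight
                    wsum += w
                    acc = [a + w * qk for a, qk in zip(acc, q)]
            if wsum == 0.0:
                moved.append(p)  # isolated point: no neighbours, no movement
            else:
                centre = [a / wsum for a in acc]
                moved.append([pk + g * (ck - pk) for pk, ck in zip(p, centre)])
        pts = moved  # synchronous update based on the previous positions
    return pts

def bregman_kmeans(points, k, div=i_divergence, iters=50):
    """Bregman hard clustering: assign each point by D(x, mu) and update each
    centre as the arithmetic mean, which minimizes the within-cluster
    divergence for any Bregman divergence. Seeding is farthest-first."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda x: min(div(x, c) for c in centres))
        centres.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: div(x, centres[c])) for x in points]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            # keep the old centre if a cluster becomes empty
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels, centres

# Toy usage: two positive, well-separated groups.
data = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (8.0, 8.5), (8.2, 7.9), (7.9, 8.1)]
smoothed = gravity_contract(data)
labels, centres = bregman_kmeans(smoothed, k=2)
print(labels)
```

In this toy data the two groups are farther apart than `radius`, so the smoothing step only tightens each group and never pulls them together; each move is a convex combination with step at most `g0`, so contracted points stay inside their group's original bounding region.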
Related papers
- Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization [0.3069335774032178]
K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets.
We propose a novel algorithm that leverages the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data.
Date: 2024-10-18T15:43:34Z
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor-3 approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
Date: 2023-12-17T04:41:07Z
- Recovering Linear Causal Models with Latent Variables via Cholesky Factorization of Covariance Matrix [21.698480201955213]
We propose a DAG structure recovering algorithm, which is based on the Cholesky factorization of the covariance matrix of the observed data.
On synthetic and real-world datasets, the algorithm is significantly faster than previous methods and achieves state-of-the-art performance.
Date: 2023-11-01T17:27:49Z
- Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point [5.825190876052149]
We introduce a stable equilibrium point (SEP)-based framework for improving the efficiency and solution quality of EDA.
A distinctive property of the proposed method is that the SEPs directly encode the clustering properties of data sets.
Date: 2023-06-07T13:31:57Z
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct the distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
Date: 2023-05-12T03:01:41Z
- Influence of Swarm Intelligence in Data Clustering Mechanisms [0.0]
Nature-inspired swarm-based algorithms are used for data clustering to cope with larger datasets that contain missing and inconsistent data.
This paper reviews the performance of these new approaches and compares which one is best suited to particular problem settings.
Date: 2023-05-07T08:40:50Z
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
Date: 2022-12-06T12:42:11Z
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
Results show that the proposed strategies outperform classification based on the observed data alone and maintain high accuracy even as the missing-data ratio increases.
Date: 2021-10-19T14:24:50Z
- Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) involves solving non-concave min-max optimization problems.
Prior work has shown the importance of overparameterization for gradient descent (GD) to converge to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, gradient descent/ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
Date: 2021-04-12T16:23:37Z
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method that combines reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
Date: 2020-12-29T04:08:38Z
- Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm [16.0817847880416]
Mean shift is a simple iterative procedure that shifts data points towards the mode, i.e., the location of highest data density in the region.
We propose a simple yet elegant feature-weighted variant of mean shift to efficiently learn the feature importance.
The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity.
Date: 2020-12-20T14:00:40Z
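The last entry describes mean shift, which repeatedly moves a point to the kernel-weighted mean of the data until it settles on a density mode. The Python sketch below is plain (non-blurring) Gaussian-kernel mean shift. The fixed per-feature vector `weights` is a hypothetical stand-in for the feature importances that the cited paper learns, and `bandwidth`, `merge_dist`, and the helper `group_modes` are illustrative choices, not that paper's method.

```python
import math

def mean_shift(points, bandwidth=1.0, weights=None, iters=100, tol=1e-7):
    """Gaussian-kernel mean shift: each point is repeatedly moved to the
    kernel-weighted mean of the ORIGINAL data until it stops moving.
    `weights` scales each feature inside the squared distance (hypothetical
    fixed weights; the cited paper learns them instead)."""
    dim = len(points[0])
    w = weights or [1.0] * dim
    modes = []
    for x in points:
        y = list(x)
        for _ in range(iters):
            num, den = [0.0] * dim, 0.0
            for p in points:
                d2 = sum(wk * (yk - pk) ** 2 for wk, yk, pk in zip(w, y, p))
                k = math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel
                den += k
                num = [n + k * pk for n, pk in zip(num, p)]
            new_y = [n / den for n in num]
            shift = math.dist(new_y, y)
            y = new_y
            if shift < tol:  # converged to a mode
                break
        modes.append(y)
    return modes

def group_modes(modes, merge_dist=0.5):
    """Hypothetical helper: points whose modes lie within `merge_dist`
    of an existing mode share that mode's cluster label."""
    centres, labels = [], []
    for m in modes:
        for c, centre in enumerate(centres):
            if math.dist(m, centre) < merge_dist:
                labels.append(c)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels

data = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (8.0, 8.5), (8.2, 7.9), (7.9, 8.1)]
print(group_modes(mean_shift(data)))
```

Setting an entry of `weights` to 0 removes that feature from the kernel distance entirely, which is the effect a learned feature weight would have on an irrelevant dimension.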