A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous
Datasets
- URL: http://arxiv.org/abs/2102.08361v1
- Date: Fri, 1 Jan 2021 07:20:50 GMT
- Title: A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous
Datasets
- Authors: Bryar A. Hassan, Tarik A. Rashid
- Abstract summary: We propose a new evolutionary clustering algorithm (ECA*) based on social class ranking and meta-heuristic algorithms.
ECA* is integrated with recombinational evolutionary operators, Lévy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate ECA* against five conventional approaches.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Clustering is a commonly used method for exploring and analysing data where
the primary objective is to categorise observations into similar clusters. In
recent decades, several algorithms and methods have been developed for
analysing clustered data. We notice that most of these techniques
deterministically define a cluster based on attribute values, distance, and
density over homogeneous, single-featured datasets. However,
these definitions fail to attach clear semantic meaning to the
clusters produced. Evolutionary operators and statistical and
multi-disciplinary techniques may help in generating meaningful clusters. Based
on this premise, we propose a new evolutionary clustering algorithm (ECA*)
based on social class ranking and meta-heuristic algorithms for stochastically
analysing heterogeneous, multiple-featured datasets. ECA* integrates
recombinational evolutionary operators, Lévy flight optimisation, and
statistical techniques such as quartiles and percentiles, as well as the
Euclidean distance of the K-means algorithm.
Experiments are conducted to evaluate ECA* against five conventional
approaches: K-means (KM), K-means++ (KM++), expectation
maximisation (EM), learning vector quantisation (LVQ), and the genetic
algorithm for clustering++ (GENCLUST++).
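The components named in the abstract can be made concrete with a small sketch. The Python fragment below illustrates three of the listed ingredients, a Lévy flight perturbation (via Mantegna's algorithm), per-feature quartiles, and a K-means-style Euclidean assignment; all function names and parameters here are illustrative assumptions, not the authors' ECA* implementation.

```python
# Illustrative sketch of ingredients named in the abstract; NOT the authors'
# ECA* implementation.
import numpy as np
from math import gamma, sin, pi

def levy_step(shape, beta=1.5, rng=None):
    """Draw Levy-flight steps via Mantegna's algorithm (1 < beta <= 2)."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def levy_mutate(centroids, scale=0.1, rng=None):
    """One evolutionary move: perturb candidate centroids by a Levy flight."""
    return centroids + scale * levy_step(centroids.shape, rng=rng)

def assign_euclidean(X, centroids):
    """K-means-style hard assignment of each point to its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def quartiles(X):
    """Per-feature quartiles, one of the statistical summaries mentioned."""
    return np.percentile(X, [25, 50, 75], axis=0)
```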
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
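The summary above describes unifying manifold learning with K-means. As a generic illustration only, not the paper's self-supervised framework, one can embed the data with a spectral method and then cluster in the embedding; the helper below assumes scikit-learn is available.

```python
# Generic manifold-embedding-then-K-means pipeline; NOT the paper's framework.
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

def embed_then_cluster(X, n_clusters=3, n_components=2, random_state=0):
    # Embed into a low-dimensional manifold, then run K-means there.
    Z = SpectralEmbedding(n_components=n_components,
                          random_state=random_state).fit_transform(X)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=random_state).fit_predict(Z)
```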
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Superclustering by finding statistically significant separable groups of optimal gaussian clusters [0.0]
The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal from the point of view of the BIC criterion.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data using an already trained clusterer.
arXiv Detail & Related papers (2023-09-05T23:49:46Z)
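The superclustering entry above starts from Gaussian clusters that are optimal under the BIC criterion. A minimal sketch of that first stage, assuming scikit-learn, fits mixtures over a range of component counts and keeps the lowest-BIC model; the grouping of these clusters into statistically significant superclusters is the paper's contribution and is not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(X, k_range=range(1, 11), random_state=0):
    """Fit Gaussian mixtures and keep the one with the lowest BIC."""
    best, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=random_state).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best
```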
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm that directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To fully exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularisation.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
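One plausible reading of constructing a distance matrix "using a Butterworth filter" is to map pairwise Euclidean distances through a Butterworth-style low-pass response. The sketch below encodes that reading as an assumption, not the paper's actual construction, and omits the tensor Schatten p-norm regularisation entirely.

```python
import numpy as np

def butterworth_affinity(X, cutoff=1.0, order=2):
    """Map pairwise Euclidean distances through the Butterworth magnitude
    response 1 / (1 + (d / cutoff)^(2 * order)): nearby points get affinity
    close to 1, distant points decay smoothly toward 0."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return 1.0 / (1.0 + (d / cutoff) ** (2 * order))
```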
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
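The one-shot pattern above, local computation at the users followed by a clustering-based aggregation step at the server, can be sketched as follows; the choice of local estimator, cluster count, and aggregation rule are simplifying assumptions rather than the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def one_shot_clustered_aggregation(local_estimates, n_clusters, random_state=0):
    """Server-side step: cluster the users' one-shot local estimates, then
    average within each cluster to produce one model per cluster."""
    est = np.asarray(local_estimates)            # shape: (n_users, dim)
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(est)
    models = np.vstack([est[km.labels_ == c].mean(axis=0)
                        for c in range(n_clusters)])
    return models, km.labels_
```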
- Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM), which simultaneously identifies outliers and clusters points.
We show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z)
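A plain trimmed k-means loop conveys the core idea summarised above: in each round, the fraction of points farthest from their nearest centre is set aside as outliers before the centres are updated. This is a generic baseline sketch; RTKM's treatment of multi-membership data is not reproduced.

```python
import numpy as np

def trimmed_kmeans(X, k, trim_frac=0.1, n_iter=50, rng=None):
    """Generic trimmed k-means: exclude the trim_frac farthest points from
    each centre update, returning them as outliers."""
    rng = rng or np.random.default_rng()
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    n_keep = int((1 - trim_frac) * len(X))
    keep = np.arange(len(X))
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        nearest, dist = d.argmin(axis=1), d.min(axis=1)
        keep = np.argsort(dist)[:n_keep]         # trim the farthest points
        for c in range(k):
            members = keep[nearest[keep] == c]
            if len(members):
                centers[c] = X[members].mean(axis=0)
    outliers = np.setdiff1d(np.arange(len(X)), keep)
    return centers, nearest, outliers
```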
- Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets [15.154538450706474]
This article presents the data used to evaluate the performance of the evolutionary clustering algorithm star (ECA*).
Two experimental methods are employed to examine the performance of ECA* against five traditional and modern clustering algorithms.
arXiv Detail & Related papers (2021-04-30T08:17:19Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Too Much Information Kills Information: A Clustering Perspective [6.375668163098171]
We propose a simple but novel approach for variance-based k-clustering tasks, including the widely known k-means clustering.
The proposed approach picks a sampling subset from the given dataset and makes decisions based on the data information in the subset only.
With certain assumptions, the resulting clustering is provably good to estimate the optimum of the variance-based objective with high probability.
arXiv Detail & Related papers (2020-09-16T01:54:26Z)
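The sampling idea above can be illustrated generically: run K-means on a uniform subsample only, then extend the solution to every point by nearest-centre assignment. The paper's specific sampling scheme and its high-probability guarantees are not reproduced by this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def subsample_kmeans(X, k, sample_size, random_state=0):
    """Cluster a uniform subsample, then label the full dataset with the
    learned centres."""
    rng = np.random.default_rng(random_state)
    idx = rng.choice(len(X), size=sample_size, replace=False)
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X[idx])
    return km.predict(X)
```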
- A semi-supervised sparse K-Means algorithm [3.04585143845864]
An unsupervised sparse clustering method can be employed to detect the subset of features necessary for clustering.
A semi-supervised method can use the labelled data to create constraints and enhance the clustering solution.
We show that the algorithm maintains the high performance of other semi-supervised algorithms and, in addition, preserves the ability to distinguish informative from uninformative features.
arXiv Detail & Related papers (2020-03-16T02:05:23Z)
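A rough sketch combining the two ingredients named above, under simplifying assumptions: labelled rows seed the centroids and pin their assignments (the semi-supervision), while per-feature weights are soft-thresholded so that uninformative features receive zero weight (the sparsity). The weight update loosely follows Witten and Tibshirani's sparse K-means; it is not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(a, delta):
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def semi_supervised_sparse_kmeans(X, k, seed_labels, delta=0.0, n_iter=20):
    """seed_labels: class id (0..k-1) for the few labelled rows, -1 elsewhere;
    each class is assumed to have at least one labelled row. delta tunes how
    aggressively feature weights are zeroed out."""
    n, p = X.shape
    w = np.full(p, 1.0 / np.sqrt(p))                 # uniform feature weights
    centers = np.vstack([X[seed_labels == c].mean(axis=0) for c in range(k)])
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        assign = d.argmin(axis=1)
        assign[seed_labels >= 0] = seed_labels[seed_labels >= 0]  # constraints
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        # between-cluster dispersion per feature drives the sparse weights
        wcss = sum(((X[assign == c] - centers[c]) ** 2).sum(axis=0)
                   for c in range(k))
        bcss = ((X - X.mean(axis=0)) ** 2).sum(axis=0) - wcss
        u = soft_threshold(bcss, delta)
        w = u / (np.linalg.norm(u) + 1e-12)
    return assign, w
```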