The Three Ensemble Clustering (3EC) Algorithm for Pattern Discovery in
Unsupervised Learning
- URL: http://arxiv.org/abs/2107.03729v1
- Date: Thu, 8 Jul 2021 10:15:18 GMT
- Title: The Three Ensemble Clustering (3EC) Algorithm for Pattern Discovery in
Unsupervised Learning
- Authors: Kundu, Debasish
- Abstract summary: The 'Three Ensemble Clustering 3EC' algorithm classifies unlabeled data into quality clusters as part of unsupervised learning.
Each partitioned cluster is considered to be a new data set and is a candidate for exploring the optimal algorithm.
Users can experiment with different sets of stopping criteria and choose the most 'sensible group' of quality clusters.
- Score: 1.0465883970481493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a multiple learner algorithm called the 'Three Ensemble
Clustering 3EC' algorithm that classifies unlabeled data into quality clusters
as a part of unsupervised learning. It offers the flexibility to explore the
context of new clusters formed by an ensemble of algorithms based on internal
validation indices.
It is worth mentioning that the input data set is considered to be a cluster
of clusters. An anomaly may itself manifest as a cluster. Each
partitioned cluster is considered to be a new data set and is a candidate for
exploring the optimal algorithm and its number of partition splits until a
predefined stopping criterion is met. The algorithms independently partition the
data set into clusters, and the quality of the partitioning is assessed by an
ensemble of internal cluster validation indices. The 3EC algorithm presents the
validation index scores for each choice of algorithm and its configuration of
partitions in what is called the Tau Grid, from which 3EC chooses the best score.
The 3EC algorithm owes its name to the two input ensembles of algorithms and
internal validation indices and an output ensemble of final clusters.
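The Tau Grid idea above can be sketched as a table of internal validation scores indexed by (algorithm, number of splits), from which the best-scoring configuration is picked. The specific algorithms and indices below are illustrative assumptions, not the paper's exact ensembles:

```python
# Hypothetical sketch of a "Tau Grid": rows are (algorithm, k) configurations,
# columns are internal validation indices. The algorithm and index choices
# here are stand-ins, not the paper's prescribed ensembles.
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score, calinski_harabasz_score
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

algorithms = {
    "kmeans": lambda k: KMeans(n_clusters=k, n_init=10, random_state=0),
    "agglomerative": lambda k: AgglomerativeClustering(n_clusters=k),
}
indices = {
    "silhouette": silhouette_score,                 # higher is better
    "calinski_harabasz": calinski_harabasz_score,   # higher is better
}

# Build the grid: one row of index scores per (algorithm, k) configuration.
tau_grid = {}
for name, make_model in algorithms.items():
    for k in range(2, 6):
        labels = make_model(k).fit_predict(X)
        tau_grid[(name, k)] = {idx: f(X, labels) for idx, f in indices.items()}

# One possible way to choose "the best score" from the grid: rank
# configurations by a single index.
best = max(tau_grid, key=lambda cfg: tau_grid[cfg]["silhouette"])
```

In practice the paper combines an ensemble of such indices rather than a single one; the grid structure is the essential idea.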
Quality plays an important role in this clustering approach and it also acts
as a stopping criterion against further partitioning. Quality is determined based
on the quality of the clusters provided by an algorithm and its optimal number
of splits. The 3EC algorithm determines this from the score of the ensemble of
validation indices. The user can configure the stopping criteria by providing
quality thresholds for the score range of each of the validation indices and
the optimal size of the output cluster. Users can experiment with different
sets of stopping criteria and choose the most 'sensible group' of quality
clusters.
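The recursive "cluster of clusters" idea with quality and size stopping criteria can be illustrated with a minimal sketch. The single algorithm, single index, and threshold values below are assumptions standing in for the full ensembles and user-configured thresholds:

```python
# Minimal sketch of recursive partitioning with stopping criteria.
# MIN_CLUSTER_SIZE and QUALITY_THRESHOLD are assumed, user-style settings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

MIN_CLUSTER_SIZE = 30     # assumed stopping criterion: output cluster size
QUALITY_THRESHOLD = 0.5   # assumed threshold on the silhouette index

def partition_recursively(X, out):
    """Treat X as a cluster of clusters: split it, then recurse on each
    split until quality or size stops further partitioning."""
    if len(X) < 2 * MIN_CLUSTER_SIZE:
        out.append(X)
        return
    # Try a range of splits and keep the best-scoring one.
    best_labels, best_score = None, -1.0
    for k in range(2, 5):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    if best_score < QUALITY_THRESHOLD:
        out.append(X)   # quality too low: stop partitioning here
        return
    # Each partitioned cluster becomes a new data set.
    for c in np.unique(best_labels):
        partition_recursively(X[best_labels == c], out)

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=1.0, random_state=1)
final_clusters = []
partition_recursively(X, final_clusters)
```

In the full algorithm, each recursion step would consult the Tau Grid over an ensemble of algorithms and indices rather than a single KMeans/silhouette pair.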
Related papers
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z) - A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z) - From Large to Small Datasets: Size Generalization for Clustering
Algorithm Selection [12.993073967843292]
We study a problem in a semi-supervised setting, with an unknown ground-truth clustering.
We introduce a notion of size generalization for clustering algorithm accuracy.
We use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
arXiv Detail & Related papers (2024-02-22T06:53:35Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - SSDBCODI: Semi-Supervised Density-Based Clustering with Outliers
Detection Integrated [1.8444322599555096]
Clustering analysis is one of the critical tasks in machine learning.
Because the performance of clustering can be significantly eroded by outliers, algorithms try to incorporate the process of outlier detection.
We propose SSDBCODI, a semi-supervised density-based clustering algorithm with integrated outlier detection.
arXiv Detail & Related papers (2022-08-10T21:06:38Z) - Fair Labeled Clustering [28.297893914525517]
We consider the downstream application of clustering and how group fairness should be ensured for such a setting.
We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions.
We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space.
arXiv Detail & Related papers (2022-05-28T07:07:12Z) - Ensemble Method for Cluster Number Determination and Algorithm Selection
in Unsupervised Learning [0.0]
Unsupervised learning suffers from the need for domain expertise to be of use.
We propose an ensemble clustering framework which can be leveraged with minimal input.
arXiv Detail & Related papers (2021-12-23T04:59:10Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, uniform random sampling of center points fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.