Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked
Taxonomy and Future Research Directions
- URL: http://arxiv.org/abs/2303.00522v1
- Date: Tue, 28 Feb 2023 17:46:31 GMT
- Title: Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked
Taxonomy and Future Research Directions
- Authors: Germán González-Almagro, Daniel Peralta, Eli De Poorter,
José-Ramón Cano, Salvador García
- Abstract summary: The research area of constrained clustering has grown significantly over the years.
No unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks.
This study presents in-detail the background of constrained clustering and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering.
- Score: 2.5957372084704238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is a well-known unsupervised machine learning approach capable of
automatically grouping discrete sets of instances with similar characteristics.
Constrained clustering is a semi-supervised extension to this process that can
be used when expert knowledge is available to indicate constraints that can be
exploited. Well-known examples of such constraints are must-link (indicating
that two instances belong to the same group) and cannot-link (two instances
definitely do not belong together). The research area of constrained clustering
has grown significantly over the years with a large variety of new algorithms
and more advanced types of constraints being proposed. However, no unifying
overview is available to easily understand the wide variety of available
methods, constraints and benchmarks. To remedy this, this study presents in
detail the background of constrained clustering and provides a novel ranked
taxonomy of the types of constraints that can be used in constrained
clustering. In addition, it focuses on instance-level pairwise constraints and
gives an overview of their applications and historical context. Next,
it presents a statistical analysis covering 307 constrained clustering methods,
categorizes them according to their features, and provides a ranking score
indicating which methods have the most potential based on their popularity and
validation quality. Finally, based upon this analysis, potential pitfalls and
future research directions are provided.
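To make the must-link/cannot-link constraints from the abstract concrete, the sketch below implements a minimal constrained k-means in the spirit of COP-KMeans, a classic instance-level pairwise-constraint method: each point is assigned to the nearest centroid whose assignment violates no constraint. The dataset, constraint pairs, and parameters are illustrative, not taken from the survey.

```python
import numpy as np

def violates(point_idx, cluster, labels, must_link, cannot_link):
    """Return True if putting point_idx into cluster breaks a constraint."""
    for a, b in must_link:
        other = b if a == point_idx else a if b == point_idx else None
        # A must-link partner already placed in a different cluster is a violation.
        if other is not None and labels[other] != -1 and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point_idx else a if b == point_idx else None
        # A cannot-link partner already placed in this cluster is a violation.
        if other is not None and labels[other] == cluster:
            return True
    return False

def cop_kmeans(X, k, must_link, cannot_link, n_iter=20, seed=0):
    """COP-KMeans-style hard-constrained k-means (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels[:] = -1
        for i in range(len(X)):
            # Try clusters from nearest to farthest centroid.
            order = np.argsort(np.linalg.norm(X[i] - centroids, axis=1))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            else:
                raise ValueError("no constraint-satisfying assignment exists")
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels

# Two well-separated groups; constraints tie points 2 and 3 together
# and force points 0 and 2 into different clusters.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = cop_kmeans(X, k=2, must_link=[(2, 3)], cannot_link=[(0, 2)])
```

By construction, every returned assignment satisfies all constraints; the original COP-KMeans additionally fails when no satisfying assignment exists, which the `ValueError` branch mirrors here.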
Related papers
- Towards Explainable Clustering: A Constrained Declarative based Approach [0.294944680995069]
We aim at finding a clustering that has high quality in terms of classic clustering criteria and that is explainable.
A good global explanation of a clustering should give the characteristics of each cluster taking into account their abilities to describe its objects.
We propose a novel interpretable constrained method called ECS for declarative computation with Explainability-driven Selection.
arXiv Detail & Related papers (2024-03-26T21:00:06Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z) - Expert-driven Trace Clustering with Instance-level Constraints [3.075612718858591]
We present two constrained trace clustering techniques that are capable of leveraging expert knowledge in the form of instance-level constraints.
In an extensive experimental evaluation using two real-life datasets, we show that our novel techniques are indeed capable of producing clustering solutions that are more justifiable without a substantial negative impact on their quality.
arXiv Detail & Related papers (2021-10-13T13:18:58Z) - Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity
Constraints [0.0]
Equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold.
We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.
arXiv Detail & Related papers (2021-09-28T12:02:18Z) - Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL).
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z) - A Framework for Deep Constrained Clustering [19.07636653413663]
Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering but have several limitations.
Here we explore a deep learning framework for constrained clustering and in particular explore how it can extend the field of constrained clustering.
We show that our framework can handle standard together/apart constraints (without the well-documented negative effects reported earlier) generated from labeled side information.
We propose an efficient training paradigm that is generally applicable to these four types of constraints.
arXiv Detail & Related papers (2021-01-07T22:49:06Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from
Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.