Semi-Supervised Clustering via Markov Chain Aggregation
- URL: http://arxiv.org/abs/2112.09397v1
- Date: Fri, 17 Dec 2021 09:07:43 GMT
- Title: Semi-Supervised Clustering via Markov Chain Aggregation
- Authors: Sophie Steger and Bernhard C. Geiger and Marek Smieja
- Abstract summary: We introduce Constrained Markov Clustering (CoMaC) for semi-supervised clustering.
Our results indicate that CoMaC is competitive with the state-of-the-art.
- Score: 9.475039534437332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We connect the problem of semi-supervised clustering to constrained Markov
aggregation, i.e., the task of partitioning the state space of a Markov chain.
We achieve this connection by considering every data point in the dataset as an
element of the Markov chain's state space, by defining the transition
probabilities between states via similarities between corresponding data
points, and by incorporating semi-supervision information as hard constraints
in a Hartigan-style algorithm. The introduced Constrained Markov Clustering
(CoMaC) is an extension of a recent information-theoretic framework for
(unsupervised) Markov aggregation to the semi-supervised case. Instantiating
CoMaC for certain parameter settings further generalizes two previous
information-theoretic objectives for unsupervised clustering. Our results
indicate that CoMaC is competitive with the state-of-the-art. Furthermore, our
approach is less sensitive to hyperparameter settings than the unsupervised
counterpart, which is especially attractive in the semi-supervised setting
characterized by little labeled data.
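A minimal sketch of this construction is given below, assuming a simple "escaping transition mass" cost in place of the paper's information-theoretic objective; the function names, the greedy update, and the cost are illustrative and not the authors' implementation, and cannot-link constraints are omitted for brevity.
```python
import numpy as np

def transition_matrix(similarity):
    """Row-normalize a nonnegative similarity matrix into transition probabilities."""
    S = np.asarray(similarity, dtype=float)
    return S / S.sum(axis=1, keepdims=True)

def escape_cost(P, labels, k):
    """Proxy objective: total transition mass that leaves its own cluster."""
    cost = 0.0
    for c in range(k):
        inside = np.where(labels == c)[0]
        outside = np.where(labels != c)[0]
        if inside.size and outside.size:
            cost += P[np.ix_(inside, outside)].sum()
    return cost

def constrained_hartigan(P, k, must_link_groups=(), n_iter=50, seed=0):
    """Greedy Hartigan-style reassignment; must-link groups always move as one block."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    constrained = {i for g in must_link_groups for i in g}
    blocks = [list(g) for g in must_link_groups] + [[i] for i in range(n) if i not in constrained]
    labels = rng.integers(0, k, size=n)
    for g in must_link_groups:          # enforce the hard constraints at initialization
        labels[list(g)] = labels[list(g)[0]]
    for _ in range(n_iter):
        improved = False
        for block in blocks:
            best_c, best_cost = labels[block[0]], escape_cost(P, labels, k)
            for c in range(k):          # try every cluster for this block, keep the best
                labels[block] = c
                cost = escape_cost(P, labels, k)
                if cost < best_cost:
                    best_c, best_cost, improved = c, cost, True
            labels[block] = best_c
        if not improved:
            break
    return labels
```
A Gaussian kernel on pairwise distances is a typical choice for the similarity matrix, and the must-link groups can be read off the labeled subset (points sharing a label form one group).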
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
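As a point of reference only, the generic "manifold embedding followed by K-means" recipe can be sketched as follows; the spectral embedding here is a stand-in for the manifold-learning step and is not the proposed unified framework.
```python
from sklearn.cluster import KMeans
from sklearn.manifold import SpectralEmbedding

def embed_then_cluster(X, n_clusters, n_components=10):
    """Embed X via a neighborhood graph, then run K-means in the embedding space."""
    Z = SpectralEmbedding(n_components=n_components,
                          affinity="nearest_neighbors").fit_transform(X)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
```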
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer [57.37893387775829]
We introduce a fast and balanced clustering method, named Semantic Equitable Clustering (SEC).
SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.
We propose a versatile vision backbone, SecViT, which attains an impressive 84.2% image classification accuracy with only 27M parameters and 4.4G FLOPs.
arXiv Detail & Related papers (2024-05-22T04:49:00Z) - Memetic Differential Evolution Methods for Semi-Supervised Clustering [1.0256438517258686]
We deal with semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems where background knowledge is given in the form of instance-level constraints.
We propose a novel memetic strategy based on the Differential Evolution paradigm, directly extending a state-of-the-art framework recently proposed in the unsupervised clustering literature.
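For context, the underlying optimization problem is the standard MSSC objective with instance-level constraints; the formulation below is background, not the memetic algorithm itself.
```latex
% Semi-supervised MSSC: minimize within-cluster squared distances to the centroids
\min_{C_1,\dots,C_k}\; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i,
```
subject to every must-link pair sharing a cluster and every cannot-link pair being assigned to different clusters.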
arXiv Detail & Related papers (2024-03-07T08:37:36Z) - One-Step Multi-View Clustering Based on Transition Probability [61.841829428397034]
We introduce One-Step Multi-View Clustering Based on Transition Probability (OSMVC-TP).
Our method directly learns the transition probabilities from anchor points to categories, and calculates the transition probabilities from samples to categories.
To maintain consistency in labels across different views, we apply a Schatten p-norm constraint on the tensor composed of the soft labels.
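Below is a generic sketch of the transition-probability bookkeeping described above, composing row-stochastic sample-to-anchor and anchor-to-category matrices; the helper name and inputs are illustrative, not the authors' code.
```python
import numpy as np

def soft_labels(sample_to_anchor, anchor_to_category):
    """Chain sample->anchor and anchor->category transition probabilities.

    Rows are normalized so each is a probability distribution.  The product of
    two row-stochastic matrices is again row-stochastic, so each output row is
    a soft label over categories.
    """
    P_sa = sample_to_anchor / sample_to_anchor.sum(axis=1, keepdims=True)
    P_ac = anchor_to_category / anchor_to_category.sum(axis=1, keepdims=True)
    return P_sa @ P_ac
```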
arXiv Detail & Related papers (2024-03-03T09:43:23Z) - Hoeffding's Inequality for Markov Chains under Generalized Concentrability Condition [15.228649445346473]
This paper studies Hoeffding's inequality for Markov chains under a generalized concentrability condition defined via an integral probability metric (IPM).
The flexibility of our framework allows Hoeffding's inequality to be applied beyond the ergodic Markov chains in the traditional sense.
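For reference, the classical Hoeffding bound for i.i.d. variables is shown below; the paper's results extend bounds of this form to Markov chains whose concentrability is quantified through an IPM rather than classical ergodicity assumptions.
```latex
% Classical Hoeffding inequality for i.i.d. X_1,...,X_n and a <= f <= b:
\Pr\!\left( \left| \frac{1}{n} \sum_{i=1}^{n} f(X_i) - \mathbb{E}\,f(X_1) \right| \ge t \right)
\;\le\; 2 \exp\!\left( - \frac{2 n t^2}{(b-a)^2} \right).
```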
arXiv Detail & Related papers (2023-10-04T16:21:23Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity [8.732260277121547]
We develop sample complexity bounds for the estimator and establish conditions for minimaxity.
We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples.
arXiv Detail & Related papers (2022-11-14T03:39:59Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for its limitations is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data.
We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the resulting method, TCC, is trained end-to-end, requiring no alternating steps.
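The abstract does not specify the reparametrization; one standard choice for making categorical assignment variables differentiable is the Gumbel-softmax relaxation, sketched here purely as an illustration.
```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, seed=None):
    """Draw a differentiable (relaxed) sample from a categorical distribution."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise; small epsilons guard against log(0).
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)   # numerical stability before exp
    expy = np.exp(y)
    return expy / expy.sum(axis=-1, keepdims=True)
```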
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.