Constrained Clustering and Multiple Kernel Learning without Pairwise
Constraint Relaxation
- URL: http://arxiv.org/abs/2203.12546v1
- Date: Wed, 23 Mar 2022 17:07:53 GMT
- Title: Constrained Clustering and Multiple Kernel Learning without Pairwise
Constraint Relaxation
- Authors: Benedikt Boecking, Vincent Jeanselme, Artur Dubrawski
- Abstract summary: We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints.
We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets.
- Score: 15.232192645789485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering under pairwise constraints is an important knowledge discovery
tool that enables the learning of appropriate kernels or distance metrics to
improve clustering performance. These pairwise constraints, which come in the
form of must-link and cannot-link pairs, arise naturally in many applications
and are intuitive for users to provide. However, the common practice of
relaxing discrete constraints to a continuous domain to ease optimization when
learning kernels or metrics can harm generalization, as information that
encodes only linkage is converted into information about distances. We introduce a new
constrained clustering algorithm that jointly clusters data and learns a kernel
in accordance with the available pairwise constraints. To generalize well, our
method is designed to maximize constraint satisfaction without relaxing
pairwise constraints to a continuous domain where they inform distances. We
show that the proposed method outperforms existing approaches on a large number
of diverse publicly available datasets, and we discuss how our method can scale
to large datasets.
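To make the setting concrete, here is a minimal, self-contained Python sketch of constrained kernel k-means, not the authors' algorithm: the kernel is a fixed convex combination of hypothetical RBF base kernels (the paper learns this combination jointly from the constraints, which the sketch does not reproduce), must-link pairs are merged into chunklets by transitive closure, and cannot-link pairs are enforced as discrete constraints during assignment. The data, constraint pairs, bandwidths, and kernel weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))        # toy data (hypothetical)
must_link = [(0, 1), (1, 2)]        # hypothetical must-link pairs
cannot_link = [(0, 3)]              # hypothetical cannot-link pair

def rbf(X, gamma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Fixed convex combination of base kernels; the paper learns such a
# combination from the constraints, which this sketch does not reproduce.
bases = [rbf(X, g) for g in (0.1, 1.0, 10.0)]
weights = np.ones(len(bases)) / len(bases)
K = sum(w * Kb for w, Kb in zip(weights, bases))

def constrained_kernel_kmeans(K, k, must, cannot, iters=20):
    n = K.shape[0]
    # Transitive closure of must-links: merge linked points into chunklets.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for a, b in must:
        parent[find(a)] = find(b)
    chunklets = {}
    for i in range(n):
        chunklets.setdefault(find(i), []).append(i)
    groups = list(chunklets.values())

    labels = rng.integers(k, size=len(groups))   # chunklet-level labels
    point_label = np.empty(n, dtype=int)
    for _ in range(iters):
        for g, lab in zip(groups, labels):
            point_label[g] = lab
        for gi, g in enumerate(groups):
            # Kernel k-means distance from chunklet g to each cluster.
            d = np.full(k, np.inf)
            for c in range(k):
                m = np.where(point_label == c)[0]
                if len(m):
                    d[c] = (np.diag(K)[g].mean()
                            - 2 * K[np.ix_(g, m)].mean()
                            + K[np.ix_(m, m)].mean())
            # Greedy discrete cannot-link handling: take the nearest
            # cluster that does not violate any cannot-link pair.
            for c in np.argsort(d):
                violates = any((a in g and point_label[b] == c) or
                               (b in g and point_label[a] == c)
                               for a, b in cannot)
                if not violates:
                    labels[gi] = c
                    point_label[g] = c
                    break
    return point_label

labels = constrained_kernel_kmeans(K, k=3, must=must_link, cannot=cannot_link)
ml_ok = sum(labels[a] == labels[b] for a, b in must_link)
cl_ok = sum(labels[a] != labels[b] for a, b in cannot_link)
print(f"must-link satisfied: {ml_ok}/{len(must_link)}; "
      f"cannot-link satisfied: {cl_ok}/{len(cannot_link)}")
```

Keeping the constraints discrete in this way means a must-link pair can never be split across clusters and a cannot-link pair is never traded off against distances, which is exactly the failure mode of continuous relaxation that the abstract describes.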
Related papers
- Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering [21.685153346752124]
Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities.
We introduce a novel method that systematically integrates both kernel correlation and dissimilarity.
By emphasizing the coherence between kernel correlation and dissimilarity, our method offers a more objective and transparent strategy for extracting non-linear information.
arXiv Detail & Related papers (2024-03-06T04:24:43Z)
- Large Scale Constrained Clustering With Reinforcement Learning [1.3597551064547502]
Given a network, allocating resources at the cluster level, rather than at each node, improves the efficiency of resource allocation and usage.
We propose an approach to solve this constrained clustering problem via reinforcement learning.
In the results section, we show that our algorithm finds near-optimal solutions, even for large-scale instances.
arXiv Detail & Related papers (2024-02-15T18:27:18Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower upper-bound time complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n (a toy illustration of this sparse neighbor storage appears after this list).
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
A prominent method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for its limitations is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL)
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly reduces computational cost while achieving comparable or even better performance than prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Exploring dual information in distance metric learning for clustering [1.452875650827562]
We propose to exploit the dual information associated with the pairwise constraints of the semi-supervised clustering problem.
Experiments clearly show that distance metric learning algorithms benefit from integrating this dual information.
arXiv Detail & Related papers (2021-05-26T17:33:23Z)
- Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints [32.19047459493177]
We introduce a novel family of stochastic pairwise constraints, which we incorporate into several essential clustering objectives.
We show that these constraints can succinctly model an intriguing collection of applications, including Individual Fairness in clustering and Must-link constraints in semi-supervised learning.
arXiv Detail & Related papers (2021-03-02T20:27:58Z)
- A Framework for Deep Constrained Clustering [19.07636653413663]
Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering but have several limitations.
Here we explore a deep learning framework for constrained clustering and in particular explore how it can extend the field of constrained clustering.
We show that our framework can not only handle standard together/apart constraints (without the well-documented negative effects reported earlier) generated from labeled side information, but also more complex constraints generated from new types of side information.
We propose an efficient training paradigm that is generally applicable to these four types of constraints.
arXiv Detail & Related papers (2021-01-07T22:49:06Z)
- Learning, compression, and leakage: Minimising classification error via meta-universal compression principles [87.054014983402]
A promising group of compression techniques for learning scenarios is normalised maximum likelihood (NML) coding.
Here we consider an NML-based decision strategy for supervised classification problems, and show that it attains PAC learning when applied to a wide variety of models.
We show that the misclassification rate of our method is upper bounded by the maximal leakage, a recently proposed metric to quantify the potential of data leakage in privacy-sensitive scenarios.
arXiv Detail & Related papers (2020-10-14T20:03:58Z)
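As a toy illustration of the memory claim in the Large-scale Fully-Unsupervised Re-Identification entry above, the hypothetical sketch below (not that paper's Re-Ranking technique) keeps only each point's k nearest neighbors, computing distances in row blocks so the dense n x n matrix is never materialized; storage drops from O(n^2) to O(kn) with k ≪ n. The sizes n, k, and the feature dimension are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, k = 5000, 64, 10                       # hypothetical sizes
feats = rng.normal(size=(n, dim)).astype(np.float32)

nn_idx = np.empty((n, k), dtype=np.int32)      # O(kn) storage
nn_dist = np.empty((n, k), dtype=np.float32)
sq_norms = (feats ** 2).sum(axis=1)

block = 512
for start in range(0, n, block):
    rows = feats[start:start + block]
    # Squared Euclidean distances for this row block only:
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (sq_norms[start:start + block, None] + sq_norms[None, :]
          - 2.0 * rows @ feats.T)
    part = np.argpartition(d2, k, axis=1)[:, :k]   # k smallest (incl. self)
    nn_idx[start:start + block] = part
    nn_dist[start:start + block] = np.take_along_axis(d2, part, axis=1)

dense_mb = n * n * 4 / 1e6                     # float32 n x n matrix
sparse_mb = (nn_idx.nbytes + nn_dist.nbytes) / 1e6
print(f"dense O(n^2): {dense_mb:.0f} MB vs top-k O(kn): {sparse_mb:.2f} MB")
```

For the sizes above, the dense matrix would take about 100 MB while the top-k structure takes under half a megabyte.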