Fairness, Semi-Supervised Learning, and More: A General Framework for
Clustering with Stochastic Pairwise Constraints
- URL: http://arxiv.org/abs/2103.02013v1
- Date: Tue, 2 Mar 2021 20:27:58 GMT
- Title: Fairness, Semi-Supervised Learning, and More: A General Framework for
Clustering with Stochastic Pairwise Constraints
- Authors: Brian Brubach, Darshan Chakrabarti, John P. Dickerson, Aravind
Srinivasan, Leonidas Tsepenekas
- Abstract summary: We introduce a novel family of \emph{stochastic pairwise constraints}, which we incorporate into several essential clustering objectives.
We show that these constraints can succinctly model an intriguing collection of applications, including \emph{Individual Fairness} in clustering and \emph{Must-link} constraints in semi-supervised learning.
- Score: 32.19047459493177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metric clustering is fundamental in areas ranging from Combinatorial
Optimization and Data Mining, to Machine Learning and Operations Research.
However, in a variety of situations we may have additional requirements or
knowledge, distinct from the underlying metric, regarding which pairs of points
should be clustered together. To capture and analyze such scenarios, we
introduce a novel family of \emph{stochastic pairwise constraints}, which we
incorporate into several essential clustering objectives (radius/median/means).
Moreover, we demonstrate that these constraints can succinctly model an
intriguing collection of applications, including among others \emph{Individual
Fairness} in clustering and \emph{Must-link} constraints in semi-supervised
learning. Our main result consists of a general framework that yields
approximation algorithms with provable guarantees for important clustering
objectives, while at the same time producing solutions that respect the
stochastic pairwise constraints. Furthermore, for certain objectives we devise
improved results in the case of Must-link constraints, which are also the best
possible from a theoretical perspective. Finally, we present experimental
evidence that validates the effectiveness of our algorithms.
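To make the Must-link special case concrete, here is a minimal sketch, not the authors' algorithm: deterministic Must-link constraints, i.e. stochastic pairwise constraints with probability 1, enforced by contracting constrained points into weighted super-points before a standard k-means call. The function name `must_link_kmeans` and the use of scikit-learn are illustrative assumptions.
```python
# A minimal sketch (not the paper's algorithm): contract Must-link pairs
# into weighted super-points, then run ordinary k-means on them.
import numpy as np
from sklearn.cluster import KMeans

def must_link_kmeans(X, must_link, k):
    """Cluster X (n x d) so that every (i, j) in must_link shares a label."""
    n = len(X)
    parent = list(range(n))

    def find(i):                          # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in must_link:                # merge each must-link pair
        parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(n)])
    groups = [np.where(roots == r)[0] for r in np.unique(roots)]
    reps = np.array([X[g].mean(axis=0) for g in groups])       # group centroids
    weights = np.array([len(g) for g in groups], dtype=float)  # group sizes

    km = KMeans(n_clusters=k, n_init=10).fit(reps, sample_weight=weights)
    label_of_root = dict(zip(np.unique(roots), km.labels_))
    return np.array([label_of_root[r] for r in roots])

labels = must_link_kmeans(np.random.rand(50, 2), [(0, 1), (1, 2)], k=3)
assert labels[0] == labels[1] == labels[2]   # constrained points co-clustered
```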
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover intrinsic relationships between real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Memetic Differential Evolution Methods for Semi-Supervised Clustering [0.8681835475119588]
We propose an extension of MDEClust for semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems.
Our new framework, called S-MDEClust, represents the first memetic methodology designed to generate an optimal feasible solution.
arXiv Detail & Related papers (2024-03-07T08:37:36Z)
- Neural Capacitated Clustering [6.155158115218501]
We propose a new method for the Capacitated Clustering Problem (CCP) that learns a neural network to predict the assignment probabilities of points to cluster centers.
In our experiments on artificial data and two real-world datasets, our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature.
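As a hedged illustration of the assignment step such predicted probabilities could feed into, and not the paper's network or training procedure, the sketch below greedily assigns points to their best-scoring centers under a uniform capacity; `capacitated_assign` and the score matrix are hypothetical.
```python
# Hypothetical sketch of a capacity-respecting greedy assignment step.
import numpy as np

def capacitated_assign(scores, capacity):
    """scores: (n_points, n_centers) affinities; capacity: max points per center."""
    n, k = scores.shape
    assert n <= k * capacity, "total capacity must cover all points"
    load = np.zeros(k, dtype=int)
    labels = -np.ones(n, dtype=int)
    for p in np.argsort(-scores.max(axis=1)):    # most confident points first
        for c in np.argsort(-scores[p]):         # centers by decreasing score
            if load[c] < capacity:
                labels[p] = c
                load[c] += 1
                break
    return labels

labels = capacitated_assign(np.random.rand(20, 4), capacity=6)
```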
arXiv Detail & Related papers (2023-02-10T09:33:44Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for its limited performance is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
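A minimal sketch of the cluster-then-forecast idea, assuming the simple linear autoregressive instantiation mentioned above; the pooled AR(1) fit and the name `cluster_and_forecast` are illustrative assumptions, not the paper's exact three-stage pipeline.
```python
# Cluster the series, fit one simple AR(1) model per cluster, forecast.
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_forecast(Y, k):
    """Y: (n_series, T) histories; returns one-step-ahead forecasts (n_series,)."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(Y)   # stage 1: cluster
    preds = np.empty(len(Y))
    for c in range(k):
        Z = Y[labels == c]
        x, y = Z[:, :-1].ravel(), Z[:, 1:].ravel()  # pooled (y_t, y_{t+1}) pairs
        phi = (x @ y) / (x @ x)                     # stage 2: per-cluster AR(1)
        preds[labels == c] = phi * Z[:, -1]         # stage 3: forecast
    return preds

preds = cluster_and_forecast(np.random.rand(30, 50), k=3)
```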
arXiv Detail & Related papers (2021-10-26T20:41:19Z)
- Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity Constraints [0.0]
Equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold.
We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.
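For intuition, here is an assumption-laden greedy baseline, not one of the algorithms the paper evaluates: grow clusters while keeping every pairwise (Euclidean, here) dissimilarity within a cluster below the threshold; `greedy_equiwide` is a hypothetical name.
```python
# Greedy threshold-based clustering: a point joins the first cluster whose
# members are all within the dissimilarity threshold, else opens a new one.
import numpy as np

def greedy_equiwide(X, threshold):
    clusters = []                                  # lists of point indices
    for i, x in enumerate(X):
        for members in clusters:
            if all(np.linalg.norm(x - X[j]) <= threshold for j in members):
                members.append(i)                  # fits in an existing cluster
                break
        else:
            clusters.append([i])                   # open a new cluster
    return clusters

print(len(greedy_equiwide(np.random.rand(40, 2), threshold=0.5)))
```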
arXiv Detail & Related papers (2021-09-28T12:02:18Z)
- Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transductive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performance, in terms of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieves competitive performance in comparison to state-of-the-art methods.
arXiv Detail & Related papers (2021-06-16T16:14:01Z)
- HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis [2.5329716878122404]
Comprehensive benchmarking of clustering algorithms is difficult.
There is no consensus regarding the best practice for rigorous benchmarking.
We demonstrate the important role evolutionary algorithms can play in supporting the flexible generation of such benchmarks.
arXiv Detail & Related papers (2021-02-13T15:01:34Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Combining Task Predictors via Enhancing Joint Predictability [53.46348489300652]
We present a new predictor combination algorithm that improves the target by i) measuring the relevance of references based on their capabilities in predicting the target, and ii) strengthening such estimated relevance.
Our algorithm jointly assesses the relevance of all references by adopting a Bayesian framework.
Based on experiments on seven real-world datasets from visual attribute ranking and multi-class classification scenarios, we demonstrate that our algorithm offers a significant performance gain and broadens the application range of existing predictor combination approaches.
arXiv Detail & Related papers (2020-07-15T21:58:39Z)
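As a rough, non-authoritative stand-in for the paper's Bayesian relevance estimate, the sketch below weights each reference predictor by its clipped validation correlation with the target predictor; `combine` and the correlation-based weighting are assumptions, not the paper's method.
```python
# Relevance-weighted predictor combination (illustrative assumption only).
import numpy as np

def combine(target_val, refs_val, refs_test):
    """refs_val: (m, n_val) reference predictions aligned with target_val (n_val,);
    refs_test: (m, n_test). Returns the weighted combination on test points."""
    rel = np.array([np.corrcoef(r, target_val)[0, 1] for r in refs_val])
    rel = np.clip(np.nan_to_num(rel), 0.0, None)   # keep only positive relevance
    if rel.sum() == 0:
        rel = np.ones(len(refs_val))               # fall back to uniform weights
    return (rel / rel.sum()) @ refs_test

pred = combine(np.random.rand(100), np.random.rand(5, 100), np.random.rand(5, 20))
```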