Optimization of Inter-group Criteria for Clustering with Minimum Size
Constraints
- URL: http://arxiv.org/abs/2401.07091v1
- Date: Sat, 13 Jan 2024 14:59:12 GMT
- Title: Optimization of Inter-group Criteria for Clustering with Minimum Size
Constraints
- Authors: Eduardo S. Laber and Lucas Murtinho
- Abstract summary: Internal measures that are used to assess the quality of a clustering usually take into account intra-group and/or inter-group criteria.
We propose algorithms with provable approximation guarantees for optimizing the former.
We present an empirical study with 10 real datasets, providing evidence that our methods work very well in practical settings.
- Score: 2.7195102129095003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Internal measures that are used to assess the quality of a clustering usually
take into account intra-group and/or inter-group criteria. There are many
papers in the literature that propose algorithms with provable approximation
guarantees for optimizing the former. However, the optimization of inter-group
criteria is much less understood.
Here, we contribute to the state-of-the-art of this literature by devising
algorithms with provable guarantees for the maximization of two natural
inter-group criteria, namely the minimum spacing and the minimum spanning tree
spacing. The former is the minimum distance between points in different groups
while the latter captures separability through the cost of the minimum spanning
tree that connects all groups. We obtain results for both the unrestricted
case, in which no constraint on the clusters is imposed, and for the
constrained case where each group is required to have a minimum number of
points. Our constraint is motivated by the fact that the popular Single
Linkage, which optimizes both criteria in the unrestricted case, produces
clusterings with many tiny groups.
To complement our work, we present an empirical study with 10 real datasets,
providing evidence that our methods work very well in practical settings.
Related papers
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Group conditional validity via multi-group learning [5.797821810358083]
We consider the problem of distribution-free conformal prediction and the criterion of group conditional validity.
Existing methods achieve such guarantees under either restrictive grouping structure or distributional assumptions.
We propose a simple reduction to the problem of achieving validity guarantees for individual populations by leveraging algorithms for a problem called multi-group learning.
arXiv Detail & Related papers (2023-03-07T15:51:03Z) - Fair Minimum Representation Clustering [0.0]
Clustering is an unsupervised learning task that aims to partition data into a set of clusters.
We show that the popular $k$-means algorithm, Lloyd's algorithm, can result in unfair outcomes.
We present a variant of Lloyd's algorithm, called MiniReL, that directly incorporates the fairness constraints.
arXiv Detail & Related papers (2023-02-06T23:16:38Z) - An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z) - Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
arXiv Detail & Related papers (2021-11-17T01:59:15Z) - Feature-based Individual Fairness in k-Clustering [14.847868952138795]
We consider the problem of clustering a set of points while ensuring fairness constraints.
We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering.
arXiv Detail & Related papers (2021-09-09T20:42:02Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Fairness, Semi-Supervised Learning, and More: A General Framework for
Clustering with Stochastic Pairwise Constraints [32.19047459493177]
We introduce a novel family of emphstochastic pairwise constraints, which we incorporate into several essential clustering objectives.
We show that these constraints can succinctly model an intriguing collection of applications, including emphIndividual Fairness in clustering and emphMust-link constraints in semi-supervised learning.
arXiv Detail & Related papers (2021-03-02T20:27:58Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Robust Optimization for Fairness with Noisy Protected Groups [85.13255550021495]
We study the consequences of naively relying on noisy protected group labels.
We introduce two new approaches using robust optimization.
We show that the robust approaches achieve better true group fairness guarantees than the naive approach.
arXiv Detail & Related papers (2020-02-21T14:58:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.