KFC: A Scalable Approximation Algorithm for $k$-center Fair Clustering
- URL: http://arxiv.org/abs/2010.13949v2
- Date: Sat, 7 Nov 2020 08:33:30 GMT
- Title: KFC: A Scalable Approximation Algorithm for $k$-center Fair Clustering
- Authors: Elfarouk Harb and Ho Shan Lam
- Abstract summary: We study the problem of fair clustering on the $k$-center objective.
In fair clustering, the input is $N$ points, each belonging to at least one of $l$ protected groups.
Our algorithm is effective in finding good clusters without over-representation or under-representation.
- Score: 6.09170287691728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of fair clustering on the $k$-center
objective. In fair clustering, the input is $N$ points, each belonging to at
least one of $l$ protected groups, e.g. male, female, Asian, Hispanic. The
objective is to cluster the $N$ points into $k$ clusters to minimize a
classical clustering objective function. However, there is an additional
constraint that each cluster needs to be fair, under some notion of fairness.
This ensures that no group is either "over-represented" or "under-represented"
in any cluster. Our work builds on the work of Chierichetti et al. (NIPS 2017),
Bera et al. (NeurIPS 2019), Ahmadian et al. (KDD 2019), and Bercea et al.
(APPROX 2019). We obtain a randomized $3$-approximation algorithm for the
$k$-center objective function, beating the previous state of the art
($4$-approximation). We test our algorithm on real datasets, and show that our
algorithm is effective in finding good clusters without over-representation or
under-representation, surpassing the current state of the art in runtime speed
and clustering cost, while achieving similar fairness violations.
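The unconstrained $k$-center objective underlying this line of work admits a classic greedy $2$-approximation (Gonzalez's farthest-point heuristic). A minimal sketch of that unconstrained baseline follows; it is not the paper's randomized $3$-approximation for *fair* $k$-center, whose fairness-constrained assignment step is omitted here:

```python
import math

def greedy_k_center(points, k):
    """Gonzalez's farthest-point heuristic: a 2-approximation for the
    unconstrained k-center objective (no fairness constraints)."""
    centers = [points[0]]  # arbitrary first center
    # distance from each point to its nearest chosen center so far
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # pick the point farthest from all current centers as the next center
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        # update each point's distance to its nearest center
        dist = [min(d, math.dist(p, points[i])) for p, d in zip(points, dist)]
    return centers, max(dist)  # chosen centers and the resulting radius
```

A fair variant would additionally constrain, for each protected group, the fraction of each cluster drawn from that group; KFC enforces this via a separate assignment step after centers are chosen.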
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - A Unified Framework for Gradient-based Clustering of Distributed Data [51.904327888475606]
We develop a family of distributed clustering algorithms that work over networks of users.
DGC-$\mathcal{F}_\rho$ is specialized to popular clustering losses like $K$-means and Huber loss.
We show that consensus fixed points of DGC-$\mathcal{F}_\rho$ are equivalent to fixed points of gradient clustering over the full data.
arXiv Detail & Related papers (2024-02-02T10:44:42Z) - Simple, Scalable and Effective Clustering via One-Dimensional
Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n \log n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $\widetilde{O}(k^4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Approximating Fair $k$-Min-Sum-Radii in $\mathbb{R}^d$ [1.7561849652813544]
We study the $k$-min-sum-radii problem in Euclidean spaces of arbitrary dimension for the case of constant $k$.
We propose a PTAS for the fair $k$-min-sum-radii problem in Euclidean spaces of arbitrary dimension for the case of constant $k$.
arXiv Detail & Related papers (2023-09-02T06:01:59Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Explainable k-means. Don't be greedy, plant bigger trees! [12.68470213641421]
We provide a new bi-criteria $\widetilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering.
Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020)
arXiv Detail & Related papers (2021-11-04T23:15:17Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles.
We suggest two schemes: fixed $\tau$ clustering, and adaptive $\tau$ clustering.
arXiv Detail & Related papers (2021-03-16T21:14:56Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.