Better Algorithms for Individually Fair $k$-Clustering
- URL: http://arxiv.org/abs/2106.12150v1
- Date: Wed, 23 Jun 2021 04:16:46 GMT
- Title: Better Algorithms for Individually Fair $k$-Clustering
- Authors: Deeparnab Chakrabarty and Maryam Negahbani
- Abstract summary: Jung, Kannan, and Lutz [FORC 2020] introduced a clustering algorithm with provable (approximate) fairness and objective guarantees for the $\ell_\infty$ or $k$-Center objective. Mahabadi and Vakilian [ICML 2020] revisited this problem to give a local-search algorithm for all $\ell_p$-norms.
We prove that by modifying known LP rounding techniques, one gets a worst-case guarantee on the objective which is much better than in MV20, and empirically, this objective is extremely close to the optimal.
- Score: 4.936817539835045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study data clustering problems with $\ell_p$-norm objectives (e.g.
$k$-Median and $k$-Means) in the context of individual fairness. The dataset
consists of $n$ points, and we want to find $k$ centers such that (a) the
objective is minimized, while (b) respecting the individual fairness constraint
that every point $v$ has a center within a distance at most $r(v)$, where
$r(v)$ is $v$'s distance to its $(n/k)$th nearest point. Jung, Kannan, and Lutz
[FORC 2020] introduced this concept and designed a clustering algorithm with
provable (approximate) fairness and objective guarantees for the $\ell_\infty$
or $k$-Center objective. Mahabadi and Vakilian [ICML 2020] revisited this
problem to give a local-search algorithm for all $\ell_p$-norms. Empirically,
their algorithms outperform Jung et al.'s by a large margin in terms of cost
(for $k$-Median and $k$-Means), but they incur a reasonable loss in fairness.
In this paper, our main contribution is to use Linear Programming (LP)
techniques to obtain better algorithms for this problem, both in theory and in
practice. We prove that by modifying known LP rounding techniques, one gets a
worst-case guarantee on the objective which is much better than in MV20, and
empirically, this objective is extremely close to the optimal. Furthermore, our
theoretical fairness guarantees are comparable with MV20 in theory, and
empirically, we obtain noticeably fairer solutions. Although solving the LP
{\em exactly} might be prohibitive, we demonstrate that in practice, a simple
sparsification technique drastically improves the run-time of our algorithm.
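The individual fairness constraint above is concrete enough to sketch in code. The following is a minimal illustration (not the paper's algorithm) of computing each point's fairness radius $r(v)$, the distance to its $(n/k)$th nearest point, and checking whether a given set of centers satisfies the constraint up to an approximation factor $\alpha$; it assumes Euclidean points, and the function names are illustrative.

```python
import numpy as np

def fairness_radii(points, k):
    """Compute r(v) for each point v: the distance from v to its
    ceil(n/k)-th nearest point (a sketch of the Jung-Kannan-Lutz radius)."""
    n = len(points)
    # pairwise Euclidean distance matrix
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)  # row v now holds v's distances in increasing order
    # column 0 is v's distance to itself (zero); index ceil(n/k) is the
    # ceil(n/k)-th nearest *other* point, clamped to the last column
    idx = min(int(np.ceil(n / k)), n - 1)
    return d[:, idx]

def is_individually_fair(points, centers, k, alpha=1.0):
    """Check that every point has some center within alpha * r(v)."""
    r = fairness_radii(points, k)
    dist_to_center = np.linalg.norm(
        points[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    return bool(np.all(dist_to_center <= alpha * r))
```

A bicriteria $(\rho, \alpha)$-approximation in this setting returns centers whose cost is within $\rho$ of optimal while each point is served within $\alpha \cdot r(v)$; the check above corresponds to the fairness side of that guarantee.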
Related papers
- Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems [8.74967598360817]
Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group.
We propose a novel "Relax and Merge" framework, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm.
If equipped with a PTAS of $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints.
arXiv Detail & Related papers (2024-11-02T02:50:12Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Mind the gap: Achieving a super-Grover quantum speedup by jumping to the end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z) - Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Improved Approximation Algorithms for Individually Fair Clustering [9.914246432182873]
We present an improved $(16^p + \varepsilon, 3)$-bicriteria approximation for the fair $k$-clustering with $\ell_p$-norm cost.
Our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al.
arXiv Detail & Related papers (2021-06-26T15:22:52Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - A New Notion of Individually Fair Clustering: $\alpha$-Equitable $k$-Center [31.936733991407134]
We introduce a novel definition of fairness for clustering problems.
In our model, each point $j$ has a set of other points $\mathcal{S}_j$ that it perceives as similar to itself.
We provide efficient and easily implementable approximation algorithms for the $k$-center objective.
arXiv Detail & Related papers (2021-06-09T22:52:00Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S} v_i v_i^\top)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z) - Individual Fairness for $k$-Clustering [16.988236398003068]
A local-search-based algorithm for $k$-Median and $k$-Means.
We show how to get a bicriteria approximation for fair $k$-clustering.
arXiv Detail & Related papers (2020-02-17T02:31:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.