Fair Representation Clustering with Several Protected Classes
- URL: http://arxiv.org/abs/2202.01391v1
- Date: Thu, 3 Feb 2022 03:45:45 GMT
- Title: Fair Representation Clustering with Several Protected Classes
- Authors: Zhen Dai, Yury Makarychev, Ali Vakilian
- Abstract summary: We study the problem of fair $k$-median where each cluster is required to have a fair representation of individuals from different groups.
We present an $O(\log k)$-approximation algorithm that runs in time $n^{O(\ell)}$.
- Score: 13.53362222844008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   We study the problem of fair $k$-median where each cluster is required to
have a fair representation of individuals from different groups. In the fair
representation $k$-median problem, we are given a set of points $X$ in a metric
space. Each point $x\in X$ belongs to one of $\ell$ groups. Further, we are
given fair representation parameters $\alpha_j$ and $\beta_j$ for each group
$j\in [\ell]$. We say that a $k$-clustering $C_1, \cdots, C_k$ fairly
represents all groups if the number of points from group $j$ in cluster $C_i$
is between $\alpha_j |C_i|$ and $\beta_j |C_i|$ for every $j\in[\ell]$ and
$i\in [k]$. The goal is to find a set $\mathcal{C}$ of $k$ centers and an
assignment $\phi: X\rightarrow \mathcal{C}$ such that the clustering defined by
$(\mathcal{C}, \phi)$ fairly represents all groups and minimizes the
$\ell_1$-objective $\sum_{x\in X} d(x, \phi(x))$.
  We present an $O(\log k)$-approximation algorithm that runs in time
$n^{O(\ell)}$. Note that the known algorithms for the problem either (i)
violate the fairness constraints by an additive term or (ii) run in time that
is exponential in both $k$ and $\ell$. We also consider an important special
case of the problem where $\alpha_j = \beta_j = \frac{f_j}{f}$ and $f_j, f \in
\mathbb{N}$ for all $j\in [\ell]$. For this special case, we present an $O(\log
k)$-approximation algorithm that runs in $(kf)^{O(\ell)}\log n + poly(n)$ time.
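
The fairness condition and the $\ell_1$-objective defined in the abstract can be checked directly for any given clustering. The following is a minimal sketch of such a checker, not the paper's approximation algorithm. The function names `clustering_cost` and `is_fair`, the list-based encoding of the assignment $\phi$, and the toy points on a line are assumptions made for illustration only. Exact fractions are used so that the special case $\alpha_j = \beta_j = f_j/f$ is tested without rounding error.

```python
from collections import Counter
from fractions import Fraction


def clustering_cost(points, assignment, dist):
    """l1 (k-median) objective: sum over x of d(x, phi(x)).

    assignment[i] is the center that points[i] is assigned to (hypothetical encoding).
    """
    return sum(dist(x, c) for x, c in zip(points, assignment))


def is_fair(groups, assignment, alpha, beta):
    """Return True if every cluster C_i satisfies
    alpha[j] * |C_i| <= (number of group-j points in C_i) <= beta[j] * |C_i|
    for every group j listed in alpha.
    """
    sizes = Counter(assignment)                 # |C_i|, keyed by center
    counts = Counter(zip(assignment, groups))   # group-j points in cluster i
    for center, size in sizes.items():
        for j in alpha:
            c = counts[(center, j)]
            if not (alpha[j] * size <= c <= beta[j] * size):
                return False
    return True


# Toy instance (assumed data): four points on a line, two groups, two centers.
points = [0, 1, 10, 11]
groups = ["a", "b", "a", "b"]
assignment = [0, 0, 10, 10]                     # phi(x) for each point
half = Fraction(1, 2)                           # alpha_j = beta_j = f_j / f = 1/2
alpha = {"a": half, "b": half}
beta = {"a": half, "b": half}

cost = clustering_cost(points, assignment, lambda x, c: abs(x - c))
fair = is_fair(groups, assignment, alpha, beta)
```

In this toy instance each cluster contains one point from each group, so the exact-proportion constraint holds. Relabeling the groups as `["a", "a", "b", "b"]` with the same assignment would put both group-`a` points in one cluster and violate it.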
 
      
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
 We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
 arXiv  Detail & Related papers  (2024-10-26T06:21:42Z)
- LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
 We show that for any $K$, there is a "universal set" $U \subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_{i,j}$ in row $i$ of $A$ all have $j \in U$.
We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set while training as well.
 arXiv  Detail & Related papers  (2024-10-07T19:47:13Z)
- A Polynomial-Time Approximation for Pairwise Fair $k$-Median Clustering [10.697784653113095]
 We study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$.
We show that our problem even when $\ell=2$ is almost as hard as the popular uniform capacitated $k$-median, for which no-time algorithm with an approximation factor of $o
 arXiv  Detail & Related papers  (2024-05-16T18:17:44Z)
- $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
 We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
 arXiv  Detail & Related papers  (2023-07-11T08:51:53Z)
- Approximation Algorithms for Fair Range Clustering [14.380145034918158]
 This paper studies the fair range clustering problem in which the data points are from different demographic groups.
The goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set.
In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases.
 arXiv  Detail & Related papers  (2023-06-11T21:18:40Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
 We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
 arXiv  Detail & Related papers  (2023-06-02T18:55:27Z)
- Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
 We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
 arXiv  Detail & Related papers  (2022-02-10T16:10:41Z)
- Approximating Fair Clustering with Cascaded Norm Objectives [10.69111036810888]
 We find a clustering which minimizes the $\ell_q$-norm of the vector over $W$ of the $\ell_p$-norms of the weighted distances of points in $P$ from the centers.
This generalizes various clustering problems, including Socially Fair $k$-Median and $k$-Means.
 arXiv  Detail & Related papers  (2021-11-08T20:18:10Z)
- Local Correlation Clustering with Asymmetric Classification Errors [12.277755088736864]
 In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as "similar" and "dissimilar".
For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to distinct clusters, and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster.
We produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p \geq 1$.
 arXiv  Detail & Related papers  (2021-08-11T12:31:48Z)
- FPT Approximation for Socially Fair Clustering [0.38073142980733]
 We are given a set of points $P$ in a metric space $\mathcal{X}$ with a distance function $d(.,.)$.
The goal of the socially fair $k$-median problem is to find a set $C \subseteq F$ of $k$ centers that minimizes the maximum average cost over all the groups.
In this work, we design $(5+\varepsilon)$ and $(33 + \varepsilon)$ approximation algorithms for the socially fair $k$-median and $k$-means
 arXiv  Detail & Related papers  (2021-06-12T11:53:18Z)
- Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
 We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^{d\times n}$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
 arXiv  Detail & Related papers  (2021-05-17T16:40:48Z)
- Sets Clustering [25.358415142404752]
 We prove that a core-set of $O(\log n)$ sets always exists, and can be computed in $O(n\log n)$ time.
Applying an inefficient but optimal algorithm on this coreset allows us to obtain the first PTAS ($1+\varepsilon$ approximation) for the sets-$k$-means problem.
Open source code and experimental results for document classification and facility locations are also provided.
 arXiv  Detail & Related papers  (2020-03-09T13:30:30Z)
- Tight Quantum Lower Bound for Approximate Counting with Quantum States [49.6558487240078]
 We prove tight lower bounds for the following variant of the counting problem considered by Aaronson, Kothari, Kretschmer, and Thaler (2020).
The task is to distinguish whether an input set $x\subseteq [n]$ has size either $k$ or $k'=(1+\varepsilon)k$.
 arXiv  Detail & Related papers  (2020-02-17T10:53:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     