A Stochastic Alternating Balance $k$-Means Algorithm for Fair Clustering
- URL: http://arxiv.org/abs/2105.14172v1
- Date: Sat, 29 May 2021 01:47:15 GMT
- Title: A Stochastic Alternating Balance $k$-Means Algorithm for Fair Clustering
- Authors: Suyun Liu, Luis Nunes Vicente
- Abstract summary: In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement, the clustering outcome might discriminate against people across different demographic groups.
We propose a novel alternating balance mini-batch $k$-means (SAKM) algorithm, which consists of $k$-means updates and group swap updates.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the application of data clustering to human-centric decision-making
systems, such as loan applications and advertisement recommendations, the
clustering outcome might discriminate against people across different
demographic groups, leading to unfairness. A natural conflict occurs between
the cost of clustering (in terms of distance to cluster centers) and the
balance representation of all demographic groups across the clusters, leading
to a bi-objective optimization problem that is nonconvex and nonsmooth. To
determine the complete trade-off between these two competing goals, we design a
novel stochastic alternating balance fair $k$-means (SAfairKM) algorithm, which
consists of alternating classical mini-batch $k$-means updates and group swap
updates. The number of $k$-means updates and the number of swap updates
essentially parameterize the weight put on optimizing each objective function.
Our numerical experiments show that the proposed SAfairKM algorithm is robust
and computationally efficient in constructing well-spread and high-quality
Pareto fronts both on synthetic and real datasets. Moreover, we propose a novel
companion algorithm, the stochastic alternating bi-objective gradient descent
(SA2GD) algorithm, which can handle a smooth version of the considered
bi-objective fair $k$-means problem, more amenable for analysis. A sublinear
convergence rate of $\mathcal{O}(1/T)$ is established under strong convexity
for the determination of a stationary point of a weighted sum of the two
functions parameterized by the number of steps or updates on each function.
Related papers
- Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights [16.120911591795295]
In some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a subset of points, referred as facilities or suppliers.
In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group.
arXiv Detail & Related papers (2024-10-16T18:00:19Z) - SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering [14.880015659013681]
We introduce a novel optimal transport-based pseudo-label learning framework.
Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport problem.
We employ the strategy of majorization to reformulate the SP$2$OT problem into a Progressive Partial Optimal Transport problem.
arXiv Detail & Related papers (2024-04-04T13:46:52Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - Local Stochastic Bilevel Optimization with Momentum-Based Variance
Reduction [104.41634756395545]
We study Federated Bilevel Optimization problems. Specifically, we first propose the FedBiO, a deterministic gradient-based algorithm.
We show FedBiO has complexity of $O(epsilon-1.5)$.
Our algorithms show superior performances compared to other baselines in numerical experiments.
arXiv Detail & Related papers (2022-05-03T16:40:22Z) - A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(frac1T)$ convergence rate, and that it achieves linear convergence under Polyak-Lojasciewicz assumption.
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used estimator based on moving average (SEMA) problems.
For all these-the-art results, we also present the results for all these-the-art problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum- Single-timescale Approximation (MSTSA) -- for tackling problems.
MSTSA allows us to control the error in iterations due to inaccurate solution to the lower level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Efficient Clustering for Stretched Mixtures: Landscape and Optimality [4.2111286819721485]
This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions.
We show that the non-optimal clustering function exhibits desirable geometric properties when the sample size exceeds some constant statistical objectives.
arXiv Detail & Related papers (2020-03-22T17:57:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.