Related papers: A Stochastic Alternating Balance $k$-Means Algorithm for Fair Clustering

A Stochastic Alternating Balance $k$-Means Algorithm for Fair Clustering

URL: http://arxiv.org/abs/2105.14172v1
Date: Sat, 29 May 2021 01:47:15 GMT
Title: A Stochastic Alternating Balance $k$-Means Algorithm for Fair Clustering
Authors: Suyun Liu, Luis Nunes Vicente
Abstract summary: In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement, the clustering outcome might discriminate against people across different demographic groups. We propose a novel alternating balance mini-batch $k$-means (SAKM) algorithm, which consists of $k$-means updates and group swap updates.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome might discriminate against people across different demographic groups, leading to unfairness. A natural conflict occurs between the cost of clustering (in terms of distance to cluster centers) and the balance representation of all demographic groups across the clusters, leading to a bi-objective optimization problem that is nonconvex and nonsmooth. To determine the complete trade-off between these two competing goals, we design a novel stochastic alternating balance fair $k$-means (SAfairKM) algorithm, which consists of alternating classical mini-batch $k$-means updates and group swap updates. The number of $k$-means updates and the number of swap updates essentially parameterize the weight put on optimizing each objective function. Our numerical experiments show that the proposed SAfairKM algorithm is robust and computationally efficient in constructing well-spread and high-quality Pareto fronts both on synthetic and real datasets. Moreover, we propose a novel companion algorithm, the stochastic alternating bi-objective gradient descent (SA2GD) algorithm, which can handle a smooth version of the considered bi-objective fair $k$-means problem, more amenable for analysis. A sublinear convergence rate of $\mathcal{O}(1/T)$ is established under strong convexity for the determination of a stationary point of a weighted sum of the two functions parameterized by the number of steps or updates on each function.

Related papers

Dynamic Consistent $k$-Center Clustering with Optimal Recourse [0.6077284832583713]
We prove that the $k$-center clustering problem admits optimal recourse bounds by developing a deterministic constant-factor approximation with worst-case recourse of $1$ per update. Our incremental algorithm improves over the $8$-approximation algorithm by Charikar, Chekuri, Feder, and Motwani.
arXiv Detail & Related papers (2024-12-04T11:39:03Z)
Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights [16.120911591795295]
In some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a subset of points, referred as facilities or suppliers. In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group.
arXiv Detail & Related papers (2024-10-16T18:00:19Z)
SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering [14.880015659013681]
We introduce a novel optimal transport-based pseudo-label learning framework. Our framework formulates pseudo-label generation as a Semantic-regularized Progressive Partial Optimal Transport problem. We employ the strategy of majorization to reformulate the SP$2$OT problem into a Progressive Partial Optimal Transport problem.
arXiv Detail & Related papers (2024-04-04T13:46:52Z)
Diversity-aware clustering: Computational Complexity and Approximation Algorithms [18.009333081689498]
We study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups.<n>A clustering solution needs to ensure that the number of chosen cluster centers from each group should be within the range defined by a lower and upper bound threshold for each group.<n>We present parameterized approximation algorithms with approximation ratios $1+ frac2e + epsilon approx 1.736$+frac8e + epsilon approx 3.943$, and $5$ for diversity-aware $k$-median, diversity-aware $
arXiv Detail & Related papers (2024-01-10T19:01:05Z)
Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms. Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z)
Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [69.15976031704687]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability. IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Local Stochastic Bilevel Optimization with Momentum-Based Variance Reduction [104.41634756395545]
We study Federated Bilevel Optimization problems. Specifically, we first propose the FedBiO, a deterministic gradient-based algorithm. We show FedBiO has complexity of $O(epsilon-1.5)$. Our algorithms show superior performances compared to other baselines in numerical experiments.
arXiv Detail & Related papers (2022-05-03T16:40:22Z)
A framework for bilevel optimization that enables stochastic and global variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function. We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(frac1T)$ convergence rate, and that it achieves linear convergence under Polyak-Lojasciewicz assumption.
arXiv Detail & Related papers (2022-01-31T18:17:25Z)
Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks. We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution. We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum- Single-timescale Approximation (MSTSA) -- for tackling problems. MSTSA allows us to control the error in iterations due to inaccurate solution to the lower level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization [93.78811018928583]
This paper provides a framework to analyze the convergence of federated heterogeneous optimization algorithms. We propose FedNova, a normalized averaging method that eliminates objective inconsistency while preserving fast error convergence.
arXiv Detail & Related papers (2020-07-15T05:01:23Z)
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure. We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem. We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
Efficient Clustering for Stretched Mixtures: Landscape and Optimality [4.2111286819721485]
This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions. We show that the non-optimal clustering function exhibits desirable geometric properties when the sample size exceeds some constant statistical objectives.
arXiv Detail & Related papers (2020-03-22T17:57:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.