Efficient k-means with Individual Fairness via Exponential Tilting
- URL: http://arxiv.org/abs/2406.16557v1
- Date: Mon, 24 Jun 2024 11:50:31 GMT
- Title: Efficient k-means with Individual Fairness via Exponential Tilting
- Authors: Shengkun Zhu, Jinshan Zeng, Yuan Sun, Sheng Wang, Xiaodong Li, Zhiyong Peng
- Abstract summary: In location-based resource allocation scenarios, individually fair clustering is often employed to achieve the principle of treating all points equally.
This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve individual fairness in clustering.
- Score: 13.72284501663574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In location-based resource allocation scenarios, the distances between each individual and the facility are desired to be approximately equal, thereby ensuring fairness. Individually fair clustering is often employed to achieve the principle of treating all points equally, which applies naturally in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve individual fairness in clustering. We integrate exponential tilting into the sum of squared errors (SSE) to formulate a novel objective function called the tilted SSE. We show that the tilted SSE reduces to the standard SSE as a special case, and we employ coordinate descent and a first-order gradient method for optimization. We propose a novel fairness metric, the variance of the distances within each cluster, which alleviates the Matthew effect typically caused by existing fairness metrics. Our theoretical analysis shows that the well-known k-means++ incurs a multiplicative error of O(k log k), and we establish the convergence of TKM under mild conditions. In terms of fairness, we prove that the variance generated by TKM decreases with a scaled hyperparameter. In terms of efficiency, we show that the time complexity of TKM is linear in the dataset size. Our experiments demonstrate that TKM outperforms state-of-the-art methods in effectiveness, fairness, and efficiency.
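The abstract's two ingredients, a tilted SSE objective and an alternating assignment/gradient update, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the per-cluster mean inside the log-sum-exp, the tilt parameter `t`, and the learning rate `lr` are assumptions made here for the sketch.

```python
import numpy as np

def tilted_sse(X, centers, t):
    """Exponential tilting of the per-point squared errors.

    Per cluster: (1/t) * log(mean_i exp(t * ||x_i - c||^2)).
    As t -> 0 this approaches the cluster's mean squared error;
    larger t up-weights points far from their center.
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k)
    labels = d2.argmin(1)                  # nearest-center assignment
    total = 0.0
    for j in range(centers.shape[0]):
        e = d2[labels == j, j]
        if e.size:
            m = (t * e).max()              # shift to stabilize the log-sum-exp
            total += (m + np.log(np.mean(np.exp(t * e - m)))) / t
    return total, labels

def tkm_step(X, centers, t, lr):
    """One alternating step: nearest-center assignment (the coordinate-
    descent step), then a first-order gradient update of each center
    on the tilted objective."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(1)
    new = centers.copy()
    for j in range(centers.shape[0]):
        pts = X[labels == j]
        if pts.size == 0:
            continue
        e = ((pts - centers[j]) ** 2).sum(1)
        w = np.exp(t * e - (t * e).max())
        w /= w.sum()                       # softmax weights: far points weigh more
        grad = -2.0 * (w[:, None] * (pts - centers[j])).sum(0)
        new[j] = centers[j] - lr * grad
    return new, labels
```

The softmax weights make the gradient pull each center toward its farthest points, which is the mechanism by which tilting shrinks the within-cluster distance variance that the paper uses as its fairness metric.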
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality-reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Dataless Quadratic Neural Networks for the Maximum Independent Set Problem [23.643727259409744]
We show that pCQO-MIS scales with only the number of nodes in the graph, not the number of edges.
Our method avoids out-of-distribution, sampling, and data-centric approaches.
arXiv Detail & Related papers (2024-06-27T21:12:48Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Imbalanced Data Clustering using Equilibrium K-Means [1.0878040851638]
Centroid-based clustering algorithms have suffered from learning bias towards large clusters.
We propose a new clustering objective function based on the Boltzmann operator, which introduces a novel centroid repulsion mechanism.
The proposed new algorithm, called equilibrium K-means (EKM), is simple, alternating between two steps.
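The Boltzmann operator that EKM builds on replaces the hard minimum in the k-means assignment with a smooth, weighted one. A minimal sketch of the operator itself (not the full EKM update, which additionally includes the centroid-repulsion mechanism mentioned above; the parameter name `beta` is an assumption here) might look like:

```python
import numpy as np

def boltzmann_min(d, beta):
    """Boltzmann operator as a smooth minimum over distances d:
    sum_j d_j * exp(-beta * d_j) / sum_j exp(-beta * d_j).
    beta = 0 gives the plain mean; beta -> inf recovers min(d)."""
    w = np.exp(-beta * (d - d.min()))  # shift for numerical stability
    w /= w.sum()
    return float((w * d).sum())
```

Because the operator degrades gracefully between mean and min, alternating assignment and centroid updates built on it can soften the winner-take-all behavior that biases centroid-based clustering toward large clusters.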
arXiv Detail & Related papers (2024-02-22T12:27:38Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Push--Pull with Device Sampling [8.344476599818826]
We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging over an underlying communication graph.
We propose an algorithm that combines gradient tracking and variance reduction over the entire network.
Our theoretical analysis shows that the algorithm converges linearly, when the local objective functions are strongly convex.
arXiv Detail & Related papers (2022-06-08T18:18:18Z)
- Metrizing Fairness [5.323439381187456]
We study supervised learning problems that have significant effects on individuals from two demographic groups.
We seek predictors that are fair with respect to a group fairness criterion such as statistical parity (SP).
In this paper, we identify conditions under which hard SP constraints are guaranteed to improve predictive accuracy.
arXiv Detail & Related papers (2022-05-30T12:28:10Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Manifold-adaptive dimension estimation revisited [0.0]
We revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert dimension estimator.
We compute the probability density function of local FSA estimates.
We derive the maximum likelihood formula for global intrinsic dimensionality.
arXiv Detail & Related papers (2020-08-07T15:27:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.