Algorithmic Bias and Data Bias: Understanding the Relation between
Distributionally Robust Optimization and Data Curation
- URL: http://arxiv.org/abs/2106.09467v1
- Date: Thu, 17 Jun 2021 13:18:03 GMT
- Title: Algorithmic Bias and Data Bias: Understanding the Relation between
Distributionally Robust Optimization and Data Curation
- Authors: Agnieszka Słowik, Léon Bottou
- Abstract summary: Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data.
In social and economic applications, where data represent people, this can lead to discrimination against underrepresented gender and ethnic groups.
- Score: 1.370633147306388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning systems based on minimizing average error have been
shown to perform inconsistently across notable subsets of the data, a failure
that a low average error over the entire dataset does not expose. In
consequential social and economic applications, where data represent people,
this can lead to discrimination against underrepresented gender and ethnic
groups. Given the
importance of bias mitigation in machine learning, the topic leads to
contentious debates on how to ensure fairness in practice (data bias versus
algorithmic bias). Distributionally Robust Optimization (DRO) seemingly
addresses this problem by minimizing the worst expected risk across
subpopulations. We establish theoretical results that clarify the relation
between DRO and the optimization of the same loss averaged on an adequately
weighted training dataset. The results cover finite and infinite number of
training distributions, as well as convex and non-convex loss functions. We
show that neither DRO nor curating the training set should be construed as a
complete solution for bias mitigation: in the same way that there is no
universally robust training set, there is no universal way to set up a DRO
problem and ensure a socially acceptable set of results. We then leverage these
insights to provide a minimal set of practical recommendations for addressing
bias with DRO. Finally, we discuss ramifications of our results in other
related applications of DRO, using an example of adversarial robustness. Our
results show that there is merit to both the algorithm-focused and the
data-focused side of the bias debate, as long as arguments in favor of these
positions are precisely qualified and backed by relevant mathematics known
today.
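To make the objective concrete: for a finite set of training distributions, DRO minimizes an adversarially weighted average of the per-group risks, and the same update can be read as ordinary average-loss training on a reweighted dataset. Below is a minimal PyTorch sketch in the style of online group DRO; the function, its arguments, and the exponentiated-gradient weight update are illustrative assumptions, not code from the paper.

```python
# Minimal group-DRO sketch (hypothetical setup, not the authors' code).
# For a finite set of K training distributions (groups), DRO solves
#   min_theta  max_{q in simplex}  sum_k q_k * R_k(theta),
# which, for fixed q, is average-loss training on a q-reweighted dataset.
import torch
import torch.nn.functional as F

def group_dro_step(model, optimizer, batches, eta=0.01, q=None):
    """One update on the adversarially weighted group risk.

    batches: list of (inputs, targets) tensors, one per group.
    q:       mixture weights over groups; starts uniform.
    """
    k = len(batches)
    if q is None:
        q = torch.ones(k) / k
    # Per-group empirical risks R_k(theta).
    risks = torch.stack([
        F.cross_entropy(model(x), y) for x, y in batches
    ])
    # Exponentiated-gradient ascent on q: up-weight high-risk groups.
    with torch.no_grad():
        q = q * torch.exp(eta * risks)
        q = q / q.sum()
    # Descent on theta for the q-weighted average risk.
    loss = (q * risks).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return q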
Related papers
- DRAUC: An Instance-wise Distributionally Robust AUC Optimization
Framework [133.26230331320963]
Area Under the ROC Curve (AUC) is a widely employed metric in long-tailed classification scenarios.
We propose an instance-wise surrogate loss of Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it.
arXiv Detail & Related papers (2023-11-06T12:15:57Z)
- Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z)
- Learning Antidote Data to Individual Unfairness [23.119278763970037]
Individual fairness is a vital notion to describe fair treatment for individual cases.
Previous studies characterize individual fairness as a prediction-invariant problem.
We show our method resists individual unfairness at a minimal or zero cost to predictive utility.
arXiv Detail & Related papers (2022-11-29T03:32:39Z)
- Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases.
We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data.
We derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples.
Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
arXiv Detail & Related papers (2022-11-10T13:44:57Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when only the input distribution is biased, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- DORO: Distributional and Outlier Robust Optimization [98.44757325531631]
We propose the framework of DORO, for Distributional and Outlier Robust Optimization.
At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.
We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets.
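A minimal code sketch of this refined risk appears after this list.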
arXiv Detail & Related papers (2021-06-11T02:59:54Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
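As a follow-up to the DORO entry above: the refined risk it mentions can be sketched as trimming the largest losses (suspected outliers) before taking a CVaR-style worst-case average over the rest. The function below is an illustrative reading of that idea, with eps and alpha as assumed parameters; consult the DORO paper for the exact formulation.

```python
# Sketch of an outlier-robust DRO risk in the spirit of DORO
# (illustrative only; not the paper's verbatim formulation).
import torch

def doro_cvar_risk(losses: torch.Tensor, eps: float = 0.1,
                   alpha: float = 0.5) -> torch.Tensor:
    """Trim the eps-fraction of largest losses as suspected outliers,
    then average the worst alpha-fraction of the rest (CVaR)."""
    n = losses.numel()
    sorted_losses, _ = torch.sort(losses, descending=True)
    # Drop the top eps*n losses: plain DRO would chase exactly these
    # points, overfitting to potential outliers.
    kept = sorted_losses[int(eps * n):]
    # CVaR over the remainder: average of the worst alpha-fraction.
    m = max(1, int(alpha * kept.numel()))
    return kept[:m].mean()
```

Usage is per batch: compute per-example losses with reduction disabled, pass them through doro_cvar_risk, and backpropagate the returned scalar.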
This list is automatically generated from the titles and abstracts of the papers in this site.