RobustFair: Adversarial Evaluation through Fairness Confusion Directed
Gradient Search
- URL: http://arxiv.org/abs/2305.10906v2
- Date: Sun, 8 Oct 2023 08:39:56 GMT
- Title: RobustFair: Adversarial Evaluation through Fairness Confusion Directed
Gradient Search
- Authors: Xuran Li, Peng Wu, Kaixiang Dong, Zhen Zhang, Yanting Chen
- Abstract summary: Deep neural networks (DNNs) often face challenges due to their vulnerability to various adversarial perturbations.
This paper introduces a novel approach, RobustFair, to evaluate the accurate fairness of DNNs when subjected to false or biased perturbations.
- Score: 8.278129731168127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) often face challenges due to their vulnerability
to various adversarial perturbations, including false perturbations that
undermine prediction accuracy and biased perturbations that cause biased
predictions for similar inputs. This paper introduces a novel approach,
RobustFair, to evaluate the accurate fairness of DNNs when subjected to these
false or biased perturbations. RobustFair employs the notion of the fairness
confusion matrix induced in accurate fairness to identify the crucial input
features for perturbations. This matrix categorizes predictions as true fair,
true biased, false fair, and false biased, and the perturbations guided by it
can produce a dual impact on instances and their similar counterparts to either
undermine prediction accuracy (robustness) or cause biased predictions
(individual fairness). RobustFair then infers the ground truth of these
generated adversarial instances based on their loss function values
approximated by the total derivative. To leverage the generated instances for
trustworthiness improvement, RobustFair further proposes a data augmentation
strategy that prioritizes adversarial instances resembling the original
training set for model retraining. Notably, RobustFair excels at
detecting intertwined issues of robustness and individual fairness, which are
frequently overlooked in standard robustness and individual fairness
evaluations. This capability empowers RobustFair to enhance both robustness and
individual fairness evaluations by concurrently identifying defects in either
domain. Empirical case studies and quantile regression analyses on benchmark
datasets demonstrate the effectiveness of the fairness confusion matrix guided
perturbation for false or biased adversarial instance generation.
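The abstract is dense; the following is a minimal, self-contained Python sketch (not the authors' implementation) of the two ingredients it names: the fairness confusion matrix that labels a prediction as true fair, true biased, false fair, or false biased, and a loss-gradient-guided perturbation step together with the first-order (total-derivative) loss approximation. The toy logistic model, the choice of protected feature index, and every helper name below are assumptions made purely for illustration.

```python
# Illustrative sketch only; helper names and the toy model are assumptions.
import numpy as np

def fairness_confusion_cell(y_true, y_pred, y_pred_similar):
    """Place one prediction in the fairness confusion matrix: accuracy is
    agreement with the ground truth, fairness is consistency with the
    prediction for a similar counterpart (individual fairness)."""
    accurate = (y_pred == y_true)
    consistent = (y_pred == y_pred_similar)
    if accurate and consistent:
        return "true fair"
    if accurate and not consistent:
        return "true biased"
    if not accurate and consistent:
        return "false fair"
    return "false biased"

def numerical_gradient(loss_fn, x, eps=1e-4):
    """Finite-difference gradient of the loss w.r.t. the input features."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * eps)
    return grad

def confusion_directed_step(loss_fn, x, step=0.05, protected_idx=()):
    """One loss-increasing step along the most influential input features,
    keeping protected attributes fixed; a rough analogue of the
    fairness-confusion-directed gradient search."""
    g = numerical_gradient(loss_fn, x)
    g[list(protected_idx)] = 0.0
    return x + step * np.sign(g)

def first_order_loss_estimate(loss_fn, x, x_adv):
    """Total-derivative (first-order Taylor) approximation of the loss at the
    perturbed instance; the paper uses this kind of approximation to infer
    the ground truth of generated adversarial instances."""
    g = numerical_gradient(loss_fn, x)
    return loss_fn(x) + g @ (x_adv - x)

# Toy usage with an arbitrary logistic model; feature index 2 plays the role
# of a protected attribute.
w, b = np.array([0.8, -0.5, 0.3]), 0.1
predict = lambda x: 1.0 / (1.0 + np.exp(-(w @ x + b)))
y = 1.0  # assumed ground truth for the seed instance
loss = lambda x: -(y * np.log(predict(x)) + (1.0 - y) * np.log(1.0 - predict(x)))

x0 = np.array([0.2, 0.4, 0.9])
x_adv = confusion_directed_step(loss, x0, protected_idx=(2,))
x_sim = x_adv.copy()
x_sim[2] = 1.0 - x_sim[2]  # similar counterpart: flip only the protected feature
print(fairness_confusion_cell(y_true=1,
                              y_pred=int(predict(x_adv) > 0.5),
                              y_pred_similar=int(predict(x_sim) > 0.5)))
print(first_order_loss_estimate(loss, x0, x_adv))
```

In the paper, the search alternates between an instance and its similar counterparts, and the ground truth of each generated instance is inferred from the total-derivative approximation rather than assumed; the sketch above only makes the individual building blocks concrete.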
Related papers
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness [23.927644024788563]
Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations.
Informally, robust accurate fairness requires that predictions for an instance consistently align with the ground truth when subjected to input perturbations.
We show that such adversarial instances can be effectively addressed by carefully designed benign perturbations.
arXiv Detail & Related papers (2024-04-01T09:29:16Z)
- Counterfactual Fairness for Predictions using Generative Adversarial Networks [28.65556399421874]
We develop a novel deep neural network called Generative Counterfactual Fairness Network (GCFN) for making predictions under counterfactual fairness.
Our method is mathematically guaranteed to ensure the notion of counterfactual fairness.
arXiv Detail & Related papers (2023-10-26T17:58:39Z)
- Understanding Fairness Surrogate Functions in Algorithmic Fairness [21.555040357521907]
We show that there is a surrogate-fairness gap between the fairness definition and the fairness surrogate function.
We elaborate a novel and general algorithm called Balanced Surrogate, which iteratively reduces the gap to mitigate unfairness.
arXiv Detail & Related papers (2023-10-17T12:40:53Z)
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Increasing Fairness in Predictions Using Bias Parity Score Based Loss Function Regularization [0.8594140167290099]
We introduce a family of fairness enhancing regularization components that we use in conjunction with the traditional binary cross-entropy based accuracy loss.
We deploy them in the context of a recidivism prediction task as well as on a census-based adult income dataset (a minimal sketch of such a regularizer appears after this list).
arXiv Detail & Related papers (2021-11-05T17:42:33Z)
- Learning to Predict Trustworthiness with Steep Slope Loss [69.40817968905495]
We study the problem of predicting trustworthiness on real-world large-scale datasets.
We observe that trustworthiness predictors trained with prior-art loss functions are prone to view both correct and incorrect predictions as trustworthy.
We propose a novel steep slope loss to separate the features w.r.t. correct predictions from the ones w.r.t. incorrect predictions by two slide-like curves that oppose each other.
arXiv Detail & Related papers (2021-09-30T19:19:09Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
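As flagged in the Bias Parity Score entry above, a regularized loss of that kind is easy to sketch. The snippet below is a hypothetical illustration, not the cited paper's formulation: it assumes the bias parity term compares the mean predicted positive rate between two demographic groups, and the function and argument names (bias_parity_regularized_loss, group, lam) are invented for the example.

```python
# Hypothetical sketch of a bias-parity-style regularized training loss.
import torch
import torch.nn.functional as F

def bias_parity_regularized_loss(logits, labels, group, lam=1.0):
    """Binary cross-entropy accuracy loss plus a penalty on the gap in mean
    predicted positive rate between the two groups encoded in `group` (0/1)."""
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    probs = torch.sigmoid(logits)
    gap = torch.abs(probs[group == 0].mean() - probs[group == 1].mean())
    return bce + lam * gap

# Toy usage on random data; `group` stands in for an assumed sensitive attribute.
logits = torch.randn(64, requires_grad=True)
labels = torch.randint(0, 2, (64,))
group = torch.randint(0, 2, (64,))
loss = bias_parity_regularized_loss(logits, labels, group, lam=0.5)
loss.backward()  # the penalty is differentiable, so it trains end to end
```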
This list is automatically generated from the titles and abstracts of the papers on this site.