The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness
- URL: http://arxiv.org/abs/2404.01356v1
- Date: Mon, 1 Apr 2024 09:29:16 GMT
- Title: The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness
- Authors: Xuran Li, Peng Wu, Yanting Chen, Xingjun Ma, Zhen Zhang, Kaixiang Dong
- Abstract summary: Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations.
Informally, robust accurate fairness requires that predictions for an instance consistently align with the ground truth when subjected to input perturbations.
We show that such adversarial instances can be effectively addressed by carefully designed benign perturbations.
- Score: 23.927644024788563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations, leading to a reduction in either prediction accuracy or individual fairness. To jointly characterize the susceptibility of prediction accuracy and individual fairness to adversarial perturbations, we introduce a novel robustness definition termed robust accurate fairness. Informally, robust accurate fairness requires that predictions for an instance and its similar counterparts consistently align with the ground truth when subjected to input perturbations. We propose an adversarial attack approach dubbed RAFair to expose false or biased adversarial defects in DNNs, which either deceive accuracy or compromise individual fairness. Then, we show that such adversarial instances can be effectively addressed by carefully designed benign perturbations, correcting their predictions to be accurate and fair. Our work explores the double-edged sword of input perturbations to robust accurate fairness in DNNs and the potential of using benign perturbations to correct adversarial instances.
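The "double-edged sword" idea above can be illustrated with a minimal sketch (not the paper's RAFair method): for a linear classifier, a small gradient-sign perturbation flips an initially correct prediction, and a benign perturbation of the same budget restores it. All weights, inputs, and the perturbation budget below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear model and an instance it correctly classifies as 1.
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.3, 0.1, 0.4])

p_clean = sigmoid(w @ x + b)  # probability of class 1 (> 0.5 here)

# FGSM-style adversarial step: for a linear score w.x + b, the gradient
# w.r.t. x is just w, so stepping against sign(w) lowers the score.
eps = 0.2
x_adv = x - eps * np.sign(w)
p_adv = sigmoid(w @ x_adv + b)  # pushed below 0.5: prediction flipped

# A "benign" perturbation with the same budget corrects the prediction.
x_fix = x_adv + eps * np.sign(w)
p_fix = sigmoid(w @ x_fix + b)  # back above 0.5
```

The same perturbation machinery thus serves both as an attack and, with the sign reversed, as a correction; the paper's contribution is doing this while keeping predictions consistent across similar individuals.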
Related papers
- Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off [7.202931445597171]
Adversarial training has been the state-of-the-art approach to defend against adversarial examples (AEs).
It suffers from a robustness-accuracy trade-off, where high robustness is achieved at the cost of clean accuracy.
Our method significantly improves the robustness-accuracy trade-off by learning adversarially invariant representations without sacrificing discriminative ability.
arXiv Detail & Related papers (2024-02-22T15:53:46Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Counterfactual Fairness for Predictions using Generative Adversarial Networks [28.65556399421874]
We develop a novel deep neural network called Generative Counterfactual Fairness Network (GCFN) for making predictions under counterfactual fairness.
Our method is mathematically guaranteed to ensure the notion of counterfactual fairness.
arXiv Detail & Related papers (2023-10-26T17:58:39Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework in which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search [8.278129731168127]
Deep neural networks (DNNs) often face challenges due to their vulnerability to various adversarial perturbations.
This paper introduces a novel approach, RobustFair, to evaluate the accurate fairness of DNNs when subjected to false or biased perturbations.
arXiv Detail & Related papers (2023-05-18T12:07:29Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, their self-consistent robust error (SCORE) facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Learning to Predict Trustworthiness with Steep Slope Loss [69.40817968905495]
We study the problem of predicting trustworthiness on real-world large-scale datasets.
We observe that trustworthiness predictors trained with prior-art loss functions are prone to view both correct and incorrect predictions as trustworthy.
We propose a novel steep slope loss to separate the features w.r.t. correct predictions from the ones w.r.t. incorrect predictions by two slide-like curves that oppose each other.
arXiv Detail & Related papers (2021-09-30T19:19:09Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Improving Calibration through the Relationship with Adversarial Robustness [19.384119330332446]
We study the connection between adversarial robustness and calibration.
We propose Adversarial Robustness based Adaptive Labeling (AR-AdaLS).
We find that our method, taking the adversarial robustness of the in-distribution data into consideration, leads to better calibration over the model even under distributional shifts.
arXiv Detail & Related papers (2020-06-29T20:56:33Z)
- Proper Network Interpretability Helps Adversarial Robustness in Classification [91.39031895064223]
We show that with a proper measurement of interpretation, it is difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy.
We develop an interpretability-aware defensive scheme built only on promoting robust interpretation.
We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks of large perturbation.
arXiv Detail & Related papers (2020-06-26T01:31:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.