To be Robust or to be Fair: Towards Fairness in Adversarial Training
- URL: http://arxiv.org/abs/2010.06121v2
- Date: Tue, 18 May 2021 23:32:55 GMT
- Title: To be Robust or to be Fair: Towards Fairness in Adversarial Training
- Authors: Han Xu, Xiaorui Liu, Yaxin Li, Anil K. Jain, Jiliang Tang
- Abstract summary: We find that adversarial training algorithms tend to introduce severe disparity of accuracy and robustness between different groups of data.
We propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem when doing adversarial defenses.
- Score: 83.42241071662897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training algorithms have proven reliable for improving
machine learning models' robustness against adversarial examples. However, we
find that adversarial training algorithms tend to introduce a severe disparity
in accuracy and robustness between different groups of data. For instance, a
PGD-adversarially trained ResNet18 model on CIFAR-10 has 93% clean accuracy and 67%
PGD l-infty-8 robust accuracy on the class "automobile" but only 65% and 17% on
the class "cat". This phenomenon occurs even on balanced datasets and does not
appear in naturally trained models trained only on clean samples. In this work,
we empirically and theoretically show that this phenomenon can happen under
general adversarial training algorithms which minimize DNN models' robust
errors. Motivated by these findings, we propose a Fair-Robust-Learning (FRL)
framework to mitigate this unfairness problem when doing adversarial defenses.
Experimental results validate the effectiveness of FRL.
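As an illustration of the per-class gap described in the abstract, below is a minimal, hypothetical PyTorch sketch (not the authors' code) for measuring clean and robust accuracy separately for each CIFAR-10 class. It assumes a trained classifier `model` and a test `loader` yielding inputs in [0, 1], and uses a standard 10-step PGD l-infinity attack with eps = 8/255; the helper names `pgd_linf` and `per_class_accuracy` are illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard l_inf PGD: random start, signed-gradient steps, projection."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def per_class_accuracy(model, loader, num_classes=10, device="cuda"):
    """Clean and PGD-robust accuracy broken down by class label."""
    model.eval()
    clean = torch.zeros(num_classes)
    robust = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_linf(model, x, y)              # crafting the attack needs gradients
        with torch.no_grad():
            clean_pred = model(x).argmax(1)
            robust_pred = model(x_adv).argmax(1)
        for c in range(num_classes):
            mask = (y == c)
            total[c] += mask.sum().item()
            clean[c] += (clean_pred[mask] == c).sum().item()
            robust[c] += (robust_pred[mask] == c).sum().item()
    return clean / total, robust / total           # per-class accuracy tensors
```

Comparing the two returned per-class vectors (e.g., "automobile" vs. "cat") exposes exactly the kind of clean/robust disparity quoted in the abstract.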
Related papers
- MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification [32.70084821901212]
MeanSparse is a method to improve the robustness of convolutional and attention-based neural networks against adversarial examples.
Our experiments show that MeanSparse achieves a new robustness record of 75.28%.
arXiv Detail & Related papers (2024-06-09T22:14:55Z) - Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z) - Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples against neural ranking models (NRMs) can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z) - Improving Robust Fairness via Balance Adversarial Training [51.67643171193376]
Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce severe disparity of accuracy and robustness between different classes.
We propose Balance Adversarial Training (BAT) to address the robust fairness problem.
arXiv Detail & Related papers (2022-09-15T14:44:48Z) - Two Heads are Better than One: Robust Learning Meets Multi-branch Models [14.72099568017039]
We propose Branch Orthogonality adveRsarial Training (BORT) to obtain state-of-the-art performance with solely the original dataset for adversarial training.
We evaluate our approach on CIFAR-10, CIFAR-100, and SVHN against ell_infty norm-bounded perturbations of size epsilon = 8/255.
arXiv Detail & Related papers (2022-08-17T05:42:59Z) - Efficient Adversarial Training With Data Pruning [26.842714298874192]
We show that data pruning leads to improvements in the convergence and reliability of adversarial training.
In some settings, data pruning offers the best of both worlds: it improves adversarial accuracy while also reducing training time.
arXiv Detail & Related papers (2022-07-01T23:54:46Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed via influence functions using a validation set with sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - Deep Repulsive Prototypes for Adversarial Robustness [3.351714665243138]
We propose to train models on output spaces with large class separation in order to gain robustness without adversarial training.
We introduce a method to partition the output space into class prototypes with large separation and train models to preserve it.
Experimental results show that models trained with these prototypes gain competitive robustness with adversarial training.
arXiv Detail & Related papers (2021-05-26T09:30:28Z) - Adversarial Feature Stacking for Accurate and Robust Predictions [4.208059346198116]
The Adversarial Feature Stacking (AFS) model can jointly take advantage of features with varied levels of robustness and accuracy.
We evaluate the AFS model on CIFAR-10 and CIFAR-100 datasets with strong adaptive attack methods.
arXiv Detail & Related papers (2021-03-24T12:01:24Z) - Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
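The overfitting result above motivates selecting model checkpoints by robust accuracy on a held-out split rather than taking the final-epoch weights. Below is a hypothetical sketch of such robust early stopping (a generic recipe, not the exact protocol of any listed paper); `train_step` is an assumed user-supplied callable that runs one epoch of adversarial training, and `attack` can be the `pgd_linf` sketch shown earlier.

```python
import copy
import torch

def train_with_robust_checkpointing(model, train_step, val_loader, attack,
                                     epochs, device="cuda"):
    """Keep the checkpoint with the best robust validation accuracy."""
    best_acc, best_state = 0.0, copy.deepcopy(model.state_dict())
    for epoch in range(epochs):
        train_step(model, epoch)                  # one epoch of adversarial training
        model.eval()
        correct, total = 0, 0
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            x_adv = attack(model, x, y)           # crafting the attack needs gradients
            with torch.no_grad():
                correct += (model(x_adv).argmax(1) == y).sum().item()
            total += y.numel()
        acc = correct / total
        if acc > best_acc:                        # robust accuracy often peaks before the end
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_acc
```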
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.