Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation
- URL: http://arxiv.org/abs/2506.08611v1
- Date: Tue, 10 Jun 2025 09:20:34 GMT
- Title: Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation
- Authors: Shiji Zhao, Chi Chen, Ranjie Duan, Xizhe Wang, Xingxing Wei
- Abstract summary: Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks. This paper explores the underlying factors of the robust fairness problem and points out that the smoothness degree of soft labels for different classes significantly impacts robust fairness. We propose Anti-Bias Soft Label Distillation (ABSLD) within the Knowledge Distillation framework to enhance adversarial robust fairness.
- Score: 16.35331551561344
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks. As a variant of AT, Adversarial Robustness Distillation (ARD) has shown outstanding performance in enhancing the robustness of small models. However, both AT and ARD face a robust fairness issue: these models tend to display strong adversarial robustness against some classes (easy classes) while demonstrating weak adversarial robustness against others (hard classes). This paper explores the underlying factors of this problem and shows, through both empirical observation and theoretical analysis, that the smoothness degree of soft labels for different classes significantly impacts robust fairness. Based on this exploration, we propose Anti-Bias Soft Label Distillation (ABSLD) within the Knowledge Distillation framework to enhance adversarial robust fairness. Specifically, ABSLD adaptively reduces the student's error-risk gap between different classes by adjusting the class-wise smoothness degree of the teacher's soft labels during training; the adjustment is managed by assigning varying temperatures to different classes. Additionally, as a label-based approach, ABSLD is highly adaptable and can be integrated with sample-based methods. Extensive experiments demonstrate that ABSLD outperforms state-of-the-art methods on the comprehensive performance of robustness and fairness.
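The abstract specifies the mechanism only at a high level: the teacher's soft labels are made smoother or sharper per class via class-wise temperatures so that the student's error risk is rebalanced across classes. Below is a minimal PyTorch-style sketch of that idea; the function names, the update rule, and the direction of the temperature adjustment (sharper labels for hard classes, smoother for easy ones) are illustrative assumptions, not the authors' exact algorithm.

```python
import torch
import torch.nn.functional as F

def classwise_temp_kd_loss(student_logits, teacher_logits, targets, class_temps):
    # Per-sample temperature chosen by the sample's class; a larger
    # temperature yields smoother (softer) teacher labels for that class.
    t = class_temps[targets].unsqueeze(1)                 # [batch, 1]
    teacher_soft = F.softmax(teacher_logits / t, dim=1)
    student_log = F.log_softmax(student_logits, dim=1)
    return F.kl_div(student_log, teacher_soft, reduction="batchmean")

def update_class_temps(class_temps, per_class_error, lr=0.1):
    # Hypothetical re-balancing rule: classes with above-average error
    # (hard classes) get lower temperatures (sharper labels), shrinking
    # the error-risk gap between classes over training.
    gap = per_class_error - per_class_error.mean()
    return (class_temps - lr * gap).clamp(min=0.5, max=5.0)
```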
Related papers
- Towards Robust Federated Learning via Logits Calibration on Non-IID Data [49.286558007937856]
Federated learning (FL) is a privacy-preserving distributed framework based on collaborative model training across distributed devices in edge networks.
Recent studies have shown that FL is vulnerable to adversarial examples, leading to a significant drop in its performance.
In this work, we adopt the adversarial training (AT) framework to improve the robustness of FL models against adversarial example (AE) attacks.
arXiv Detail & Related papers (2024-03-05T09:18:29Z)
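The entry above adopts the standard adversarial training (AT) framework inside federated training. For reference, a minimal sketch of the L-infinity PGD inner step that AT methods of this kind typically build on; the epsilon, step size, and step count shown are common CIFAR-10 defaults, not values from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Inner maximization of adversarial training: find a bounded
    # perturbation delta that maximizes the classification loss.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()   # keep images in valid range
```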
- Towards Fairness-Aware Adversarial Learning [13.932705960012846]
We propose a novel learning paradigm named Fairness-Aware Adversarial Learning (FAAL).
Our method finds the worst-case distribution among different categories, and the solution is guaranteed to attain the upper-bound performance with high probability.
In particular, FAAL can fine-tune an unfair robust model to be fair within only two epochs, without compromising the overall clean and robust accuracies.
arXiv Detail & Related papers (2024-02-27T18:01:59Z)
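FAAL above is described as finding the worst-case distribution over categories. A rough sketch of a class-level distributionally robust loss in that spirit follows; the softmax relaxation and the tau knob are illustrative assumptions, not the paper's guaranteed formulation.

```python
import torch

def worst_case_class_loss(losses, targets, num_classes, tau=1.0):
    # Average the per-sample losses within each class.
    per_class = torch.zeros(num_classes, device=losses.device)
    counts = torch.zeros(num_classes, device=losses.device)
    per_class.scatter_add_(0, targets, losses)
    counts.scatter_add_(0, targets, torch.ones_like(losses))
    per_class = per_class / counts.clamp(min=1)
    # Softmax relaxation of the worst-case distribution over classes:
    # classes with higher adversarial loss receive higher weight.
    weights = torch.softmax(per_class / tau, dim=0)
    return (weights[targets] * losses).mean() * num_classes
```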
- DAFA: Distance-Aware Fair Adversarial Training [34.94780532071229]
Under adversarial attacks, the majority of the model's predictions for samples from the worst class are biased towards classes similar to the worst class.
We introduce the Distance-Aware Fair Adversarial training (DAFA) methodology, which addresses robust fairness by taking into account the similarities between classes.
arXiv Detail & Related papers (2024-01-23T07:15:47Z)
- Improving Adversarial Robust Fairness via Anti-Bias Soft Label Distillation [14.163463596459064]
Adversarial Training (AT) has been widely proven to be an effective method for improving the adversarial robustness of Deep Neural Networks (DNNs) against adversarial examples.
As a variant of AT, Adversarial Robustness Distillation (ARD) has demonstrated its superior performance in improving the robustness of small student models.
We propose an Anti-Bias Soft Label Distillation (ABSLD) method to mitigate the adversarial robust fairness problem within the framework of Knowledge Distillation (KD).
arXiv Detail & Related papers (2023-12-09T09:08:03Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
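The KL-divergence-regularized reweighting mentioned above has a well-known closed form: maximizing the weighted loss minus a KL penalty to the uniform distribution over the simplex yields softmax weights. A sketch under that reading; tau is an assumed regularization strength, and the paper's exact objective may differ.

```python
import torch

def kl_regularized_instance_weights(sample_losses, tau=1.0):
    # argmax_w  sum_i w_i * loss_i - tau * KL(w || uniform),  w on the simplex,
    # has the closed-form solution  w_i proportional to exp(loss_i / tau).
    return torch.softmax(sample_losses.detach() / tau, dim=0)
```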
- Fair Robust Active Learning by Joint Inconsistency [22.150782414035422]
We introduce a novel task, Fair Robust Active Learning (FRAL), integrating conventional fair active learning (FAL) and adversarial robustness.
We develop a simple yet effective FRAL strategy based on Joint INconsistency (JIN).
Our method exploits the prediction inconsistency between benign and adversarial samples as well as between standard and robust models.
arXiv Detail & Related papers (2022-09-22T01:56:41Z)
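JIN selects samples by prediction inconsistency between benign and adversarial inputs and between standard and robust models. A hypothetical acquisition score combining the two signals; the symmetric-KL choice and the equal weighting are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def joint_inconsistency_score(std_model, robust_model, x_clean, x_adv):
    def sym_kl(p_logits, q_logits):
        # Symmetric KL between two predictive distributions, per sample.
        p = F.softmax(p_logits, dim=1)
        q = F.softmax(q_logits, dim=1)
        return ((p * (p / q.clamp_min(1e-8)).log()).sum(1)
                + (q * (q / p.clamp_min(1e-8)).log()).sum(1))
    with torch.no_grad():
        adv_gap = sym_kl(robust_model(x_clean), robust_model(x_adv))   # benign vs. adversarial
        model_gap = sym_kl(std_model(x_clean), robust_model(x_clean))  # standard vs. robust
    return adv_gap + model_gap   # higher = more informative to label
```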
- Improving Robust Fairness via Balance Adversarial Training [51.67643171193376]
Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce severe disparity of accuracy and robustness between different classes.
We propose Balance Adversarial Training (BAT) to address the robust fairness problem.
arXiv Detail & Related papers (2022-09-15T14:44:48Z)
- Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better [66.69777970159558]
We propose a novel adversarial robustness distillation method called Robust Soft Label Adversarial Distillation (RSLAD).
RSLAD fully exploits the robust soft labels produced by a robust (adversarially-trained) large teacher model to guide the student's learning.
We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods.
arXiv Detail & Related papers (2021-08-18T04:32:35Z)
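RSLAD supervises the student with the robust teacher's soft labels rather than hard one-hot labels. A minimal sketch of that supervision; the equal weighting of the clean and adversarial terms here is an illustrative assumption rather than the published loss.

```python
import torch.nn.functional as F

def rslad_style_loss(student_clean_logits, student_adv_logits, teacher_clean_logits):
    # The adversarially-trained teacher's soft labels on *clean* inputs
    # supervise the student on both clean and adversarial inputs.
    teacher_soft = F.softmax(teacher_clean_logits.detach(), dim=1)
    def kd(logits):
        return F.kl_div(F.log_softmax(logits, dim=1), teacher_soft,
                        reduction="batchmean")
    return kd(student_clean_logits) + kd(student_adv_logits)
```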
- Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, while an in-depth analysis of the role each class plays in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve their performance improvement mainly from more successful attacks on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z)
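The class-wise diagnosis above amounts to measuring robust accuracy per class rather than in aggregate. A small evaluation sketch; the `attack` callable and the loop structure are generic assumptions.

```python
import torch

def per_class_robust_accuracy(model, loader, attack, num_classes, device="cuda"):
    # Robust accuracy broken down by class, exposing the easy/hard
    # class disparity studied in the papers above.
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)           # e.g. a PGD routine
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        for c in range(num_classes):
            mask = y == c
            total[c] += mask.sum().item()
            correct[c] += (pred[mask] == c).sum().item()
    return correct / total.clamp(min=1)
```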
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
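Consistency "under both data augmentations and adversarial perturbations" suggests a contrastive objective whose two views of an image are an augmented copy and an adversarially perturbed copy. An NT-Xent-style sketch of that pairing; the single-direction loss and all names here are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def adv_contrastive_loss(encoder, x, augment, attack_fn, temperature=0.5):
    # Two views per image: one standard augmentation, one adversarial
    # perturbation; matching views form the positive pairs.
    z1 = F.normalize(encoder(augment(x)), dim=1)
    z2 = F.normalize(encoder(attack_fn(x)), dim=1)
    logits = z1 @ z2.t() / temperature                  # [batch, batch] similarities
    labels = torch.arange(x.size(0), device=x.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```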
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.