TRIX- Trading Adversarial Fairness via Mixed Adversarial Training
- URL: http://arxiv.org/abs/2507.07768v1
- Date: Thu, 10 Jul 2025 13:53:52 GMT
- Title: TRIX- Trading Adversarial Fairness via Mixed Adversarial Training
- Authors: Tejaswini Medi, Steffen Jung, Margret Keuper,
- Abstract summary: Adversarial Training (AT) is a widely adopted defense against adversarial examples.<n>Existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability.<n>We introduce TRIX, a feature-aware adversarial training framework that adaptively assigns weaker targeted adversaries to strong classes.
- Score: 16.10247754923311
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Adversarial Training (AT) is a widely adopted defense against adversarial examples. However, existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability. This results in adversarial unfairness: classes with well distinguishable features (strong classes) tend to become more robust, while classes with overlapping or shared features(weak classes) remain disproportionately susceptible to adversarial attacks. We observe that strong classes do not require strong adversaries during training, as their non-robust features are quickly suppressed. In contrast, weak classes benefit from stronger adversaries to effectively reduce their vulnerabilities. Motivated by this, we introduce TRIX, a feature-aware adversarial training framework that adaptively assigns weaker targeted adversaries to strong classes, promoting feature diversity via uniformly sampled targets, and stronger untargeted adversaries to weak classes, enhancing their focused robustness. TRIX further incorporates per-class loss weighting and perturbation strength adjustments, building on prior work, to emphasize weak classes during the optimization. Comprehensive experiments on standard image classification benchmarks, including evaluations under strong attacks such as PGD and AutoAttack, demonstrate that TRIX significantly improves worst-case class accuracy on both clean and adversarial data, reducing inter-class robustness disparities, and preserves overall accuracy. Our results highlight TRIX as a practical step toward fair and effective adversarial defense.
Related papers
- FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training [16.10247754923311]
We introduce a novel approach called Fair Targeted Adversarial Training (FAIR-TAT)<n>We show that using targeted adversarial attacks for adversarial training (instead of untargeted attacks) can allow for more favorable trade-offs with respect to adversarial fairness.
arXiv Detail & Related papers (2024-10-30T15:58:03Z) - DAFA: Distance-Aware Fair Adversarial Training [34.94780532071229]
Under adversarial attacks, the majority of the model's predictions for samples from the worst class are biased towards classes similar to the worst class.
We introduce the Distance-Aware Fair Adversarial training (DAFA) methodology, which addresses robust fairness by taking into account the similarities between classes.
arXiv Detail & Related papers (2024-01-23T07:15:47Z) - Improving Adversarial Robustness with Self-Paced Hard-Class Pair
Reweighting [5.084323778393556]
adversarial training with untargeted attacks is one of the most recognized methods.
We find that the naturally imbalanced inter-class semantic similarity makes those hard-class pairs to become the virtual targets of each other.
We propose to upweight hard-class pair loss in model optimization, which prompts learning discriminative features from hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z) - Improving Robust Fairness via Balance Adversarial Training [51.67643171193376]
Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce severe disparity of accuracy and robustness between different classes.
We propose Adversarial Training (BAT) to address the robust fairness problem.
arXiv Detail & Related papers (2022-09-15T14:44:48Z) - Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce a new concept of adversarial training graph (ATG) with which the proposed adversarial training with feature separability (ATFS) enables to boost the intra-class feature similarity and increase inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z) - Analysis and Applications of Class-wise Robustness in Adversarial
Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples.
Previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing.
We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet.
We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z) - Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a UAP does not attack all classes equally.
We improve the SOTA UAT by proposing to utilize class-wise UAPs during adversarial training.
arXiv Detail & Related papers (2021-04-07T09:05:49Z) - Robustness May Be at Odds with Fairness: An Empirical Study on
Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z) - Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.