Related papers: Sy-FAR: Symmetry-based Fair Adversarial Robustness

Sy-FAR: Symmetry-based Fair Adversarial Robustness

URL: http://arxiv.org/abs/2509.12939v1
Date: Tue, 16 Sep 2025 10:39:42 GMT
Title: Sy-FAR: Symmetry-based Fair Adversarial Robustness
Authors: Haneen Najjar, Eyal Ronen, Mahmood Sharif,
Abstract summary: Security-critical machine-learning (ML) systems are susceptible to adversarial examples.<n>It is often easier to attack from certain classes or groups than from others.<n>We develop Sy-FAR, a technique to encourage symmetry while also optimizing adversarial robustness.
Score: 12.306208202393092
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Security-critical machine-learning (ML) systems, such as face-recognition systems, are susceptible to adversarial examples, including real-world physically realizable attacks. Various means to boost ML's adversarial robustness have been proposed; however, they typically induce unfair robustness: It is often easier to attack from certain classes or groups than from others. Several techniques have been developed to improve adversarial robustness while seeking perfect fairness between classes. Yet, prior work has focused on settings where security and fairness are less critical. Our insight is that achieving perfect parity in realistic fairness-critical tasks, such as face recognition, is often infeasible -- some classes may be highly similar, leading to more misclassifications between them. Instead, we suggest that seeking symmetry -- i.e., attacks from class $i$ to $j$ would be as successful as from $j$ to $i$ -- is more tractable. Intuitively, symmetry is a desirable because class resemblance is a symmetric relation in most domains. Additionally, as we prove theoretically, symmetry between individuals induces symmetry between any set of sub-groups, in contrast to other fairness notions where group-fairness is often elusive. We develop Sy-FAR, a technique to encourage symmetry while also optimizing adversarial robustness and extensively evaluate it using five datasets, with three model architectures, including against targeted and untargeted realistic attacks. The results show Sy-FAR significantly improves fair adversarial robustness compared to state-of-the-art methods. Moreover, we find that Sy-FAR is faster and more consistent across runs. Notably, Sy-FAR also ameliorates another type of unfairness we discover in this work -- target classes that adversarial examples are likely to be classified into become significantly less vulnerable after inducing symmetry.

Related papers

Adversarial Bias: Data Poisoning Attacks on Fairness [48.17618627431355]
There is relatively little research on how an AI system's fairness can be intentionally compromised.<n>In this work, we provide a theoretical analysis demonstrating that a simple adversarial poisoning strategy is sufficient to induce maximally unfair behavior.<n>Our attack significantly outperforms existing methods in degrading fairness metrics across multiple models and datasets.
arXiv Detail & Related papers (2025-11-11T15:09:53Z)
Preference Poisoning Attacks on Reward Model Learning [47.00395978031771]
We investigate the nature and extent of a vulnerability in learning reward models from pairwise comparisons. We propose two classes of algorithmic approaches for these attacks: a gradient-based framework, and several variants of rank-by-distance methods. We find that the best attacks are often highly successful, achieving in the most extreme case 100% success rate with only 0.3% of the data poisoned.
arXiv Detail & Related papers (2024-02-02T21:45:24Z)
DAFA: Distance-Aware Fair Adversarial Training [34.94780532071229]
Under adversarial attacks, the majority of the model's predictions for samples from the worst class are biased towards classes similar to the worst class. We introduce the Distance-Aware Fair Adversarial training (DAFA) methodology, which addresses robust fairness by taking into account the similarities between classes.
arXiv Detail & Related papers (2024-01-23T07:15:47Z)
Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria [61.048842737581865]
We show that Adversarial Training (AT) omits to learning robust features, resulting in poor performance of adversarial robustness. We propose a generic framework of AT to gain robust representation, by the asymmetric negative contrast and reverse attention. Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-10-05T07:29:29Z)
Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework. Our importance weights are obtained by optimizing the KL-divergence regularized loss function. Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [5.084323778393556]
adversarial training with untargeted attacks is one of the most recognized methods. We find that the naturally imbalanced inter-class semantic similarity makes those hard-class pairs to become the virtual targets of each other. We propose to upweight hard-class pair loss in model optimization, which prompts learning discriminative features from hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z)
Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries [12.312877365123267]
Deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify. We develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model. We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks.
arXiv Detail & Related papers (2022-08-18T08:19:26Z)
Adversarial Contrastive Learning via Asymmetric InfoNCE [64.42740292752069]
We propose to treat adversarial samples unequally when contrasted with an asymmetric InfoNCE objective. In the asymmetric fashion, the adverse impacts of conflicting objectives between CL and adversarial learning can be effectively mitigated. Experiments show that our approach consistently outperforms existing Adversarial CL methods.
arXiv Detail & Related papers (2022-07-18T04:14:36Z)
Analysis and Applications of Class-wise Robustness in Adversarial Training [92.08430396614273]
Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples. Previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing. We provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. We observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes.
arXiv Detail & Related papers (2021-05-29T07:28:35Z)
Perceptually Constrained Adversarial Attacks [2.0305676256390934]
We replace the usually applied $L_p$ norms with the structural similarity index (SSIM) measure. Our SSIM-constrained adversarial attacks can break state-of-the-art adversarially trained classifiers and achieve similar or larger success rate than the elastic net attack. We evaluate the performance of several defense schemes in a perceptually much more meaningful way than was done previously in the literature.
arXiv Detail & Related papers (2021-02-14T12:28:51Z)
Block Switching: A Stochastic Approach for Deep Learning Security [75.92824098268471]
Recent study of adversarial attacks has revealed the vulnerability of modern deep learning models. In this paper, we introduce Block Switching (BS), a defense strategy against adversarial attacks based on onity.
arXiv Detail & Related papers (2020-02-18T23:14:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.