Hard Adversarial Example Mining for Improving Robust Fairness
- URL: http://arxiv.org/abs/2308.01823v1
- Date: Thu, 3 Aug 2023 15:33:24 GMT
- Title: Hard Adversarial Example Mining for Improving Robust Fairness
- Authors: Chenhao Lin, Xiang Ji, Yulong Yang, Qian Li, Chao Shen, Run Wang,
Liming Fang
- Abstract summary: Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE).
Recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability.
To alleviate this problem, we propose HAM, a straightforward yet effective framework based on adaptive Hard Adversarial example Mining.
- Score: 18.02943802341582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) is widely considered the state-of-the-art technique
for improving the robustness of deep neural networks (DNNs) against adversarial
examples (AE). Nevertheless, recent studies have revealed that adversarially
trained models are prone to unfairness problems, restricting their
applicability. In this paper, we empirically observe that this limitation may
be attributed to severe adversarial confidence overfitting, i.e., certain
adversarial examples being fit with overconfidence. To alleviate this problem,
we propose HAM, a straightforward yet effective framework based on adaptive
Hard Adversarial example Mining. HAM concentrates on mining hard adversarial
examples while discarding easy ones in an adaptive fashion. Specifically, HAM
identifies hard AEs in terms of the step size needed to cross the decision
boundary when the loss is computed. In addition, an early-dropping mechanism
discards easy examples in the initial stages of AE generation, resulting in
efficient AT. Extensive experimental results on
CIFAR-10, SVHN, and Imagenette demonstrate that HAM achieves significant
improvement in robust fairness while reducing computational cost compared to
several state-of-the-art adversarial training methods. The code will be made
publicly available.
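To make the mining and early-dropping ideas concrete, here is a minimal PyTorch-style sketch. The specific hardness test used below (an example counts as easy if the attack has not pushed it across the decision boundary after a few steps and the model remains confident on it), together with the `drop_step` and `conf_thresh` values, are illustrative assumptions rather than the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def ham_attack(model, x, y, eps=8/255, alpha=2/255, steps=10,
               drop_step=3, conf_thresh=0.9):
    # Standard PGD with random initialization inside the eps-ball.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for step in range(steps):
        delta = delta.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = (x + delta).clamp(0, 1) - x  # keep the image valid
        if step + 1 == drop_step:
            with torch.no_grad():
                conf, pred = F.softmax(model(x + delta), dim=1).max(dim=1)
            # Assumed 'easy' test: still confidently correct after a few
            # attack steps. Drop these from AE generation and training.
            hard = ~((pred == y) & (conf > conf_thresh))
            if hard.any():  # keep the batch non-empty
                x, y, delta = x[hard], y[hard], delta[hard]
    return (x + delta).detach(), y
```

Training then proceeds only on the surviving hard AEs, which is where both the robust-fairness and efficiency gains are expected to come from.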
Related papers
- Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness.
However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
arXiv Detail & Related papers (2024-04-11T22:43:44Z)
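For context, the SSAT baseline that catastrophic overfitting afflicts can be sketched as a single random-init FGSM update per batch, in the style of "Fast is better than free"; the abnormal-adversarial-example regularization proposed in the paper above is not reproduced here.

```python
import torch
import torch.nn.functional as F

def ssat_step(model, optimizer, x, y, eps=8/255, alpha=10/255):
    # Random start inside the eps-ball, then one FGSM step.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    with torch.no_grad():
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        x_adv = (x + delta).clamp(0, 1)
    # The single attack step is what makes SSAT cheap -- and what
    # leaves it prone to catastrophic overfitting late in training.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```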
- Improving Adversarial Training using Vulnerability-Aware Perturbation Budget [7.430861908931903]
Adversarial Training (AT) effectively improves the robustness of Deep Neural Networks (DNNs) to adversarial attacks.
We propose two simple, computationally cheap vulnerability-aware reweighting functions for assigning perturbation bounds to adversarial examples used for AT.
Experimental results show that the proposed methods yield genuine improvements in the robustness of AT algorithms against various adversarial attacks.
arXiv Detail & Related papers (2024-03-06T21:50:52Z)
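A hedged sketch of the per-example budget idea: use the cheap logit margin as a vulnerability proxy and map it to a perturbation bound. The proxy, the batch-wise normalization, and the direction of the mapping (larger budgets for less vulnerable examples) are assumptions for illustration; the paper defines its own reweighting functions.

```python
import torch

def margin_eps(model, x, y, eps_base=8/255, lo=0.5, hi=1.5):
    # Per-example perturbation bounds derived from the logit margin.
    with torch.no_grad():
        logits = model(x)
        true = logits.gather(1, y[:, None]).squeeze(1)
        other = logits.scatter(1, y[:, None], float('-inf')).max(dim=1).values
        margin = true - other  # > 0 iff correctly classified
        # Normalize margins to [0, 1] within the batch, then map to
        # [lo * eps_base, hi * eps_base].
        m = (margin - margin.min()) / (margin.max() - margin.min() + 1e-12)
        return eps_base * (lo + (hi - lo) * m)  # shape: (batch,)
```

The returned per-example bounds would then replace the fixed eps inside whatever attack generates the training AEs.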
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- AFLOW: Developing Adversarial Examples under Extremely Noise-limited Settings [7.828994881163805]
Deep neural networks (DNNs) are vulnerable to adversarial attacks.
We propose a novel Normalizing Flow-based end-to-end attack framework, called AFLOW, to synthesize imperceptible adversarial examples.
Compared with existing methods, AFLOW exhibits superiority in imperceptibility, image quality, and attack capability.
arXiv Detail & Related papers (2023-10-15T10:54:07Z)
- Reducing Adversarial Training Cost with Gradient Approximation [0.3916094706589679]
We propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models.
Our proposed method saves up to 60% of the training time with comparable model test accuracy across datasets.
arXiv Detail & Related papers (2023-09-18T03:55:41Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Revisiting and Advancing Adversarial Training Through A Simple Baseline [7.226961695849204]
We introduce a simple baseline approach, termed SimpleAT, that performs competitively with recent methods and mitigates robust overfitting.
We conduct extensive experiments on CIFAR-10/100 and Tiny-ImageNet, which validate the robustness of SimpleAT against state-of-the-art adversarial attackers.
Our results also reveal the connections between SimpleAT and many advanced state-of-the-art adversarial defense methods.
arXiv Detail & Related papers (2023-06-13T08:12:52Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training has proven to be the most effective strategy, injecting adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Towards Speeding up Adversarial Training in Latent Spaces [8.054201249492582]
We propose a novel adversarial training method that does not need to generate real adversarial examples.
We gain insight into the existence of Endogenous Adversarial Examples (EAEs) through the theory of manifolds.
Our EAE adversarial training not only shortens the training time, but also enhances the robustness of the model.
arXiv Detail & Related papers (2021-02-01T06:30:32Z)
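To make the latent-space setting concrete, below is a generic sketch of adversarial training on intermediate features rather than input pixels. It is not the paper's EAE construction, which obtains endogenous examples from manifold structure instead of running an input-space attack; `encoder` and `head` are assumed to be two halves of one network.

```python
import torch
import torch.nn.functional as F

def latent_at_step(encoder, head, optimizer, x, y, eps=0.1):
    # Forward to an intermediate representation, then take one
    # FGSM-like step on the features instead of the pixels.
    z = encoder(x)
    z_pert = z.detach().requires_grad_(True)
    loss = F.cross_entropy(head(z_pert), y)
    grad = torch.autograd.grad(loss, z_pert)[0]
    delta = (eps * grad.sign()).detach()
    # Train on the perturbed latent; gradients reach the encoder
    # through z and the classifier head through the loss.
    optimizer.zero_grad()
    F.cross_entropy(head(z + delta), y).backward()
    optimizer.step()
```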
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other, unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
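To make the distributional idea concrete, here is a minimal sketch: the inner loop fits a per-example Gaussian over perturbations (reparameterized and squashed into the eps-ball) to maximize expected loss plus an entropy bonus, and the outer step trains on a sample from it. The tanh squashing, the entropy term, and all hyperparameters are illustrative choices, not the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

def adt_step(model, optimizer, x, y, eps=8/255,
             inner_steps=7, inner_lr=0.1, lam=0.01):
    # Per-example Gaussian over perturbations: delta = eps * tanh(mu + sigma * z).
    mu = torch.zeros_like(x, requires_grad=True)
    log_sigma = torch.full_like(x, -3.0).requires_grad_(True)
    inner_opt = torch.optim.Adam([mu, log_sigma], lr=inner_lr)
    for _ in range(inner_steps):
        noise = torch.randn_like(x)
        delta = eps * torch.tanh(mu + log_sigma.exp() * noise)
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        entropy = log_sigma.mean()  # Gaussian entropy up to constants
        inner_opt.zero_grad()
        (-(loss + lam * entropy)).backward()  # ascend loss + entropy
        inner_opt.step()
    # Outer minimization: train on a fresh sample from the learned distribution.
    with torch.no_grad():
        delta = eps * torch.tanh(mu + log_sigma.exp() * torch.randn_like(x))
    optimizer.zero_grad()
    F.cross_entropy(model((x + delta).clamp(0, 1)), y).backward()
    optimizer.step()
```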