Outlier Robust Adversarial Training
- URL: http://arxiv.org/abs/2309.05145v1
- Date: Sun, 10 Sep 2023 21:36:38 GMT
- Title: Outlier Robust Adversarial Training
- Authors: Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu
- Abstract summary: We introduce Outlier Robust Adversarial Training (ORAT) in this work.
ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function.
We show that the learning objective of ORAT satisfies $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate for the adversarial 0/1 loss.
- Score: 57.06824365801612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised learning models are challenged both by intrinsic complexities
of the training data, such as outliers and minority subpopulations, and by
intentional attacks at inference time with adversarial samples. While traditional
robust learning methods and recent adversarial training approaches each handle one
of these two challenges, to date no work develops models that are simultaneously
robust to low-quality training data and to potential adversarial attacks at
inference time. For this reason, we introduce Outlier Robust Adversarial Training
(ORAT) in this work. ORAT is based on a bi-level optimization formulation of
adversarial training with a robust rank-based loss function. Theoretically, we show
that the learning objective of ORAT satisfies $\mathcal{H}$-consistency in binary
classification, which establishes it as a proper surrogate for the adversarial 0/1
loss. Furthermore, we analyze its generalization ability and
provide uniform convergence rates in high probability. ORAT can be optimized
with a simple algorithm. Experimental evaluations on three benchmark datasets
demonstrate the effectiveness and robustness of ORAT in handling outliers and
adversarial attacks. Our code is available at
https://github.com/discovershu/ORAT.
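The abstract above describes ORAT only at a high level. As a rough illustration, below is a minimal sketch of a rank-based adversarial training step, under the assumption that the robust loss averages a ranked range of per-sample adversarial losses (dropping the largest m as presumed outliers and keeping ranks m+1 through k). The PGD attack, hyperparameters, and helper names are illustrative, not the paper's exact algorithm; see the linked repository for the authors' implementation.

```python
# Hedged sketch of a rank-based adversarial training step. The ranked-range
# reduction is an assumption about ORAT's robust loss; consult
# https://github.com/discovershu/ORAT for the actual algorithm.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-inf PGD (the inner maximization of adversarial training)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def ranked_range_loss(per_sample_losses, k, m):
    """Average the losses ranked m+1..k in descending order: the m largest
    are treated as outliers and dropped; very small losses are ignored."""
    sorted_losses, _ = torch.sort(per_sample_losses, descending=True)
    return sorted_losses[m:k].mean()

def orat_style_step(model, optimizer, x, y, k=48, m=4):
    x_adv = pgd_attack(model, x, y)                       # inner problem
    losses = F.cross_entropy(model(x_adv), y, reduction="none")
    loss = ranked_range_loss(losses, k, m)                # robust outer loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With m = 0 this reduces to average top-k adversarial training, and with m = 0 and k equal to the batch size it recovers standard adversarial training, which is consistent with the bi-level, rank-based description in the abstract.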
Related papers
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
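The "continuous attacks" in the entry above refer to perturbing an LLM in embedding space rather than searching over discrete token substitutions. Below is a minimal sketch of that idea; `embed_tokens` and `loss_from_embeds` are hypothetical stand-ins for the model's embedding layer and loss, and the actual C-AdvUL algorithm additionally combines this with a utility loss on clean data.

```python
# Hedged sketch of a continuous (embedding-space) attack for a language model.
# `embed_tokens` / `loss_from_embeds` are illustrative stand-ins, not the
# paper's API; C-AdvUL also mixes in a utility loss on clean data.
import torch

def continuous_attack(embed_tokens, loss_from_embeds, input_ids, labels,
                      eps=0.1, alpha=0.02, steps=5):
    embeds = embed_tokens(input_ids).detach()
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(steps):
        loss = loss_from_embeds(embeds + delta, labels)   # attacker maximizes
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (embeds + delta).detach()                      # attacked embeddings
```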
- Class Incremental Learning for Adversarial Robustness [17.06592851567578]
Adversarial training integrates adversarial examples during model training to enhance robustness.
We observe that combining incremental learning with naive adversarial training easily leads to a loss of robustness.
We propose the Flatness Preserving Distillation (FPD) loss that leverages the output difference between adversarial and clean examples.
arXiv Detail & Related papers (2023-12-06T04:38:02Z)
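The FPD loss in the entry above is only named here; one plausible form of an "output difference" penalty is a divergence between the model's clean and adversarial predictions added to the usual adversarial loss. This is a generic sketch, not the paper's exact FPD definition.

```python
# Generic sketch of penalizing the clean/adversarial output difference;
# the paper's actual FPD loss may differ in form and weighting.
import torch.nn.functional as F

def output_gap_loss(model, x_clean, x_adv, y, lam=1.0):
    logp_clean = F.log_softmax(model(x_clean), dim=1).detach()
    logp_adv = F.log_softmax(model(x_adv), dim=1)
    gap = F.kl_div(logp_adv, logp_clean, log_target=True,
                   reduction="batchmean")                 # output difference
    return F.cross_entropy(model(x_adv), y) + lam * gap
```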
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
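For the KL-divergence-regularized reweighting mentioned above, one standard fact is useful: maximizing a weighted sum of losses minus a KL penalty toward the uniform distribution, over the probability simplex, has a softmax closed form. The sketch below shows that generic construction; the paper's actual doubly robust weights are more involved.

```python
# Generic KL-regularized instance reweighting: solving
#   max_{w in simplex}  <w, losses> - tau * KL(w || uniform)
# gives w_i proportional to exp(loss_i / tau). This is a textbook closed
# form, not the paper's exact doubly robust estimator.
import torch

def kl_regularized_weights(per_sample_losses, tau=1.0):
    return torch.softmax(per_sample_losses / tau, dim=0)

def reweighted_loss(per_sample_losses, tau=1.0):
    weights = kl_regularized_weights(per_sample_losses.detach(), tau)
    return (weights * per_sample_losses).sum()
```

As tau grows the weights become uniform (standard averaging); as tau shrinks the loss concentrates on the hardest examples.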
- Adversarial Training Should Be Cast as a Non-Zero-Sum Game [121.95628660889628]
The two-player zero-sum paradigm of adversarial training has not engendered sufficient levels of robustness.
We show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on robustness.
A novel non-zero-sum bilevel formulation of adversarial training yields a framework that matches and in some cases outperforms state-of-the-art attacks.
arXiv Detail & Related papers (2023-06-19T16:00:48Z)
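A non-zero-sum formulation, as in the entry above, means the attacker and defender optimize different objectives. One plausible instantiation (an assumption here, not necessarily the paper's exact choice) has the attacker maximize a misclassification margin while the defender minimizes cross-entropy on the attacked inputs.

```python
# Hedged sketch of a non-zero-sum setup: the attacker maximizes a
# misclassification margin; the defender minimizes cross-entropy on the
# attacked inputs. The margin objective is an illustrative choice.
import torch
import torch.nn.functional as F

def margin_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = model(x_adv)
        true_logit = logits.gather(1, y[:, None]).squeeze(1)
        best_wrong = logits.scatter(1, y[:, None], float("-inf")).max(1).values
        margin = (best_wrong - true_logit).sum()          # attacker objective
        grad, = torch.autograd.grad(margin, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def defender_loss(model, x, y):
    return F.cross_entropy(model(margin_attack(model, x, y)), y)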
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce the concept of an adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
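As a rough illustration of "boost intra-class similarity and increase inter-class variance" from the entry above, the sketch below pulls normalized features toward their class centers and pushes class centers apart. The paper's ATG construction is not modeled; this is only the generic separability idea.

```python
# Generic feature separability sketch (not the paper's ATG-based loss):
# compact features within each class, spread class centers apart.
# Assumes every class is represented in the batch.
import torch
import torch.nn.functional as F

def feature_separability_loss(features, y, num_classes):
    f = F.normalize(features, dim=1)
    centers = torch.stack([f[y == c].mean(dim=0) for c in range(num_classes)])
    intra = ((f - centers[y]) ** 2).sum(dim=1).mean()     # pull to own center
    pairwise = torch.cdist(centers, centers)
    inter = pairwise.sum() / (num_classes * (num_classes - 1))
    return intra - inter                                  # minimize
```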
- On the Generalization Properties of Adversarial Training [21.79888306754263]
This paper studies the generalization performance of a generic adversarial training algorithm.
A series of numerical studies demonstrates how smoothness and L1 penalization help improve the adversarial robustness of models.
arXiv Detail & Related papers (2020-08-15T02:32:09Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
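The instance-level attack in the entry above needs no labels. A minimal sketch is to perturb an image so that its embedding moves away from the embedding of its own augmented view, then train the encoder contrastively on the attacked view; the full method uses a proper contrastive (NT-Xent-style) loss with negatives, omitted here for brevity.

```python
# Hedged sketch of a label-free, instance-level attack: push the embedding
# of x away from that of its own augmentation. Stands in for the paper's
# contrastive-loss-maximizing attack, which also uses negative pairs.
import torch
import torch.nn.functional as F

def instance_attack(encoder, x, x_aug, eps=8/255, alpha=2/255, steps=5):
    with torch.no_grad():
        z_ref = F.normalize(encoder(x_aug), dim=1)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        z = F.normalize(encoder(x_adv), dim=1)
        loss = -(z * z_ref).sum(dim=1).mean()             # push apart own views
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```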