Achieving Model Robustness through Discrete Adversarial Training
- URL: http://arxiv.org/abs/2104.05062v1
- Date: Sun, 11 Apr 2021 17:49:21 GMT
- Title: Achieving Model Robustness through Discrete Adversarial Training
- Authors: Maor Ivgi and Jonathan Berant
- Abstract summary: We leverage discrete adversarial attacks for online augmentation, where adversarial examples are generated at every step.
We find that random sampling leads to impressive gains in robustness, outperforming the commonly-used offline augmentation.
Online augmentation with search-based attacks justifies the higher training cost, significantly improving robustness on three datasets.
- Score: 30.845326360305677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete adversarial attacks are symbolic perturbations to a language input
that preserve the output label but lead to a prediction error. While such
attacks have been extensively explored for the purpose of evaluating model
robustness, their utility for improving robustness has been limited to offline
augmentation only, i.e., given a trained model, attacks are used to generate
perturbed (adversarial) examples, and the model is re-trained exactly once. In
this work, we address this gap and leverage discrete attacks for online
augmentation, where adversarial examples are generated at every step, adapting
to the changing nature of the model. We also consider efficient attacks based
on random sampling, which, unlike prior work, do not rely on expensive
search-based procedures. As a second contribution, we provide a general
formulation for multiple search-based attacks from past work, and propose a new
attack based on best-first search. Surprisingly, we find that random sampling
leads to impressive gains in robustness, outperforming the commonly-used
offline augmentation, while providing a ~10x speedup at training time.
Furthermore, online augmentation with search-based attacks justifies the higher
training cost, significantly improving robustness on three datasets. Last, we
show that our proposed algorithm substantially improves robustness compared to
prior methods.
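To make the online-augmentation idea concrete, here is a minimal PyTorch-style sketch, assuming a word-level synonym-substitution perturbation space and a classifier that maps a list of tokens to a logits vector. All names (`random_sampling_attack`, `online_adversarial_step`, `synonyms`) are illustrative and are not taken from the paper's implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

import torch
import torch.nn.functional as F


def example_loss(model: torch.nn.Module, tokens: List[str], label: int) -> torch.Tensor:
    """Cross-entropy of the current model on one example. Assumes (for this
    sketch) that the model maps a list of token strings to a logits vector."""
    logits = model(tokens)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([label]))


def random_sampling_attack(
    model: torch.nn.Module,
    tokens: List[str],
    label: int,
    synonyms: Dict[str, List[str]],
    loss_fn: Callable[[torch.nn.Module, List[str], int], torch.Tensor] = example_loss,
    num_samples: int = 16,
    max_edits: int = 3,
) -> List[str]:
    """Draw random label-preserving substitutions and keep the candidate with
    the highest loss, i.e. the most adversarial sample found by chance."""
    best_tokens, best_loss = tokens, loss_fn(model, tokens, label).item()
    for _ in range(num_samples):
        candidate = list(tokens)
        positions = [i for i, t in enumerate(candidate) if t in synonyms]
        for i in random.sample(positions, min(max_edits, len(positions))):
            candidate[i] = random.choice(synonyms[candidate[i]])
        loss = loss_fn(model, candidate, label).item()
        if loss > best_loss:
            best_tokens, best_loss = candidate, loss
    return best_tokens


def online_adversarial_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    batch: List[Tuple[List[str], int]],
    synonyms: Dict[str, List[str]],
) -> float:
    """One training step of online augmentation: attack the *current* model,
    then update on the perturbed batch, so the attack adapts as the model
    changes (unlike one-shot offline augmentation)."""
    model.eval()
    with torch.no_grad():
        adv_batch = [
            (random_sampling_attack(model, toks, y, synonyms), y)
            for toks, y in batch
        ]
    model.train()
    optimizer.zero_grad()
    loss = torch.stack([example_loss(model, toks, y) for toks, y in adv_batch]).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At every step the attack is run against the current model parameters, which is what distinguishes online augmentation from re-training once on a fixed set of adversarial examples.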
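The new attack based on best-first search can likewise be sketched generically: keep a frontier of perturbed inputs ordered by model loss and always expand the most promising one. The expansion budget and scoring below are assumptions for illustration, not the paper's exact procedure.

```python
import heapq
from typing import Callable, Dict, List

import torch


def best_first_search_attack(
    model: torch.nn.Module,
    tokens: List[str],
    label: int,
    synonyms: Dict[str, List[str]],
    loss_fn: Callable[[torch.nn.Module, List[str], int], torch.Tensor],
    max_expansions: int = 50,
) -> List[str]:
    """Best-first search over single-token substitutions: repeatedly expand the
    highest-loss perturbation found so far (generic sketch of the search
    strategy named in the abstract)."""
    start_loss = loss_fn(model, tokens, label).item()
    frontier = [(-start_loss, 0, tokens)]  # max-heap via negated loss
    best_tokens, best_loss = tokens, start_loss
    tiebreak = 1
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, current = heapq.heappop(frontier)
        for i, tok in enumerate(current):
            for sub in synonyms.get(tok, []):
                candidate = current[:i] + [sub] + current[i + 1:]
                loss = loss_fn(model, candidate, label).item()
                if loss > best_loss:
                    best_tokens, best_loss = candidate, loss
                heapq.heappush(frontier, (-loss, tiebreak, candidate))
                tiebreak += 1
    return best_tokens
```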
Related papers
- Improving Adversarial Robustness for 3D Point Cloud Recognition at Test-Time through Purified Self-Training [9.072521170921712]
3D point cloud deep learning models are vulnerable to adversarial attacks.
Adversarial purification employs a generative model to mitigate the impact of adversarial attacks.
We propose a test-time purified self-training strategy to achieve this objective.
arXiv Detail & Related papers (2024-09-23T11:46:38Z) - Fast Propagation is Better: Accelerating Single-Step Adversarial
Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Group-based Robustness: A General Framework for Customized Robustness in
the Real World [16.376584375681812]
We find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes.
We propose a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios.
We show that, with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes (an illustrative sketch of this style of metric appears after this list).
arXiv Detail & Related papers (2023-06-29T01:07:12Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Adversarial Fine-tune with Dynamically Regulated Adversary [27.034257769448914]
In many real-world applications, such as health diagnosis and autonomous surgical robotics, standard performance is valued more highly than robustness against such extremely malicious attacks.
This work proposes a simple yet effective transfer learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on model's standard performance.
In addition, we introduce a training-friendly adversarial attack algorithm, which facilitates the boost of adversarial robustness without introducing significant training complexity.
arXiv Detail & Related papers (2022-04-28T00:07:15Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z) - Provably robust deep generative models [1.52292571922932]
We propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE).
We show that it is able to produce generative models that are substantially more robust to adversarial attacks.
arXiv Detail & Related papers (2020-04-22T14:47:41Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
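As referenced in the Group-based Robustness entry above, a metric of that flavor can be illustrated as the fraction of source-class examples that an attack pushes into a chosen set of target classes. The sketch below is only an illustration under assumed interfaces (a generic `attack` callable and image-like tensor inputs), not that paper's exact definition.

```python
from typing import Callable, Iterable, Set, Tuple

import torch


def group_based_success_rate(
    model: torch.nn.Module,
    attack: Callable[[torch.nn.Module, torch.Tensor, int], torch.Tensor],
    examples: Iterable[Tuple[torch.Tensor, int]],
    source_classes: Set[int],
    target_classes: Set[int],
) -> float:
    """Attack every example whose label is in `source_classes` and report how
    often the perturbed prediction lands in `target_classes` (illustrative
    group-based success rate, not the paper's exact metric)."""
    attempted, succeeded = 0, 0
    for x, y in examples:
        if y not in source_classes:
            continue
        attempted += 1
        x_adv = attack(model, x, y)                        # perturbed input
        pred = model(x_adv.unsqueeze(0)).argmax(dim=-1).item()
        if pred in target_classes:
            succeeded += 1
    return succeeded / attempted if attempted else 0.0
```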