Achieving Model Robustness through Discrete Adversarial Training
- URL: http://arxiv.org/abs/2104.05062v1
- Date: Sun, 11 Apr 2021 17:49:21 GMT
- Title: Achieving Model Robustness through Discrete Adversarial Training
- Authors: Maor Ivgi and Jonathan Berant
- Abstract summary: We leverage discrete adversarial attacks for online augmentation, where adversarial examples are generated at every step.
We find that random sampling leads to impressive gains in robustness, outperforming the commonly-used offline augmentation.
Online augmentation with search-based attacks justifies the higher training cost, significantly improving robustness on three datasets.
- Score: 30.845326360305677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete adversarial attacks are symbolic perturbations to a language input
that preserve the output label but lead to a prediction error. While such
attacks have been extensively explored for the purpose of evaluating model
robustness, their utility for improving robustness has been limited to offline
augmentation only, i.e., given a trained model, attacks are used to generate
perturbed (adversarial) examples, and the model is re-trained exactly once. In
this work, we address this gap and leverage discrete attacks for online
augmentation, where adversarial examples are generated at every step, adapting
to the changing nature of the model. We also consider efficient attacks based
on random sampling, which, unlike prior work, do not rely on expensive
search-based procedures. As a second contribution, we provide a general
formulation for multiple search-based attacks from past work, and propose a new
attack based on best-first search. Surprisingly, we find that random sampling
leads to impressive gains in robustness, outperforming the commonly-used
offline augmentation, while providing a ~10x speedup at training time.
Furthermore, online augmentation with search-based attacks justifies the higher
training cost, significantly improving robustness on three datasets. Last, we
show that our proposed algorithm substantially improves robustness compared to
prior methods.
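To make the online-augmentation idea concrete, here is a minimal PyTorch-style sketch, assuming a word-level synonym-substitution perturbation space and a classifier that maps a list of tokens to a logits vector. All names (`random_sampling_attack`, `online_adversarial_step`, `synonyms`) are illustrative and are not taken from the paper's implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

import torch
import torch.nn.functional as F


def example_loss(model: torch.nn.Module, tokens: List[str], label: int) -> torch.Tensor:
    """Cross-entropy of the current model on one example. Assumes (for this
    sketch) that the model maps a list of token strings to a logits vector."""
    logits = model(tokens)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([label]))


def random_sampling_attack(
    model: torch.nn.Module,
    tokens: List[str],
    label: int,
    synonyms: Dict[str, List[str]],
    loss_fn: Callable[[torch.nn.Module, List[str], int], torch.Tensor] = example_loss,
    num_samples: int = 16,
    max_edits: int = 3,
) -> List[str]:
    """Draw random label-preserving substitutions and keep the candidate with
    the highest loss, i.e. the most adversarial sample found by chance."""
    best_tokens, best_loss = tokens, loss_fn(model, tokens, label).item()
    for _ in range(num_samples):
        candidate = list(tokens)
        positions = [i for i, t in enumerate(candidate) if t in synonyms]
        for i in random.sample(positions, min(max_edits, len(positions))):
            candidate[i] = random.choice(synonyms[candidate[i]])
        loss = loss_fn(model, candidate, label).item()
        if loss > best_loss:
            best_tokens, best_loss = candidate, loss
    return best_tokens


def online_adversarial_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    batch: List[Tuple[List[str], int]],
    synonyms: Dict[str, List[str]],
) -> float:
    """One training step of online augmentation: attack the *current* model,
    then update on the perturbed batch, so the attack adapts as the model
    changes (unlike one-shot offline augmentation)."""
    model.eval()
    with torch.no_grad():
        adv_batch = [
            (random_sampling_attack(model, toks, y, synonyms), y)
            for toks, y in batch
        ]
    model.train()
    optimizer.zero_grad()
    loss = torch.stack([example_loss(model, toks, y) for toks, y in adv_batch]).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At every step the attack is run against the current model parameters, which is what distinguishes online augmentation from re-training once on a fixed set of adversarial examples.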
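The new attack based on best-first search can likewise be sketched generically: keep a frontier of perturbed inputs ordered by model loss and always expand the most promising one. The expansion budget and scoring below are assumptions for illustration, not the paper's exact procedure.

```python
import heapq
from typing import Callable, Dict, List

import torch


def best_first_search_attack(
    model: torch.nn.Module,
    tokens: List[str],
    label: int,
    synonyms: Dict[str, List[str]],
    loss_fn: Callable[[torch.nn.Module, List[str], int], torch.Tensor],
    max_expansions: int = 50,
) -> List[str]:
    """Best-first search over single-token substitutions: repeatedly expand the
    highest-loss perturbation found so far (generic sketch of the search
    strategy named in the abstract)."""
    start_loss = loss_fn(model, tokens, label).item()
    frontier = [(-start_loss, 0, tokens)]  # max-heap via negated loss
    best_tokens, best_loss = tokens, start_loss
    tiebreak = 1
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, current = heapq.heappop(frontier)
        for i, tok in enumerate(current):
            for sub in synonyms.get(tok, []):
                candidate = current[:i] + [sub] + current[i + 1:]
                loss = loss_fn(model, candidate, label).item()
                if loss > best_loss:
                    best_tokens, best_loss = candidate, loss
                heapq.heappush(frontier, (-loss, tiebreak, candidate))
                tiebreak += 1
    return best_tokens
```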
Related papers
- Improving Adversarial Robustness for 3D Point Cloud Recognition at Test-Time through Purified Self-Training [9.072521170921712]
3D point cloud deep learning models are vulnerable to adversarial attacks.
Adversarial purification employs a generative model to mitigate the impact of adversarial attacks.
We propose a test-time purified self-training strategy to achieve this objective.
arXiv Detail & Related papers (2024-09-23T11:46:38Z) - Fast Propagation is Better: Accelerating Single-Step Adversarial
Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Group-based Robustness: A General Framework for Customized Robustness in
the Real World [16.376584375681812]
We find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes.
We propose a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios.
We show that, with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes (an illustrative sketch of this style of metric appears after this list).
arXiv Detail & Related papers (2023-06-29T01:07:12Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Adversarial Fine-tune with Dynamically Regulated Adversary [27.034257769448914]
In many real-world applications, such as health diagnosis and autonomous surgical robotics, standard performance is valued more highly than robustness against such extremely malicious attacks.
This work proposes a simple yet effective transfer learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on model's standard performance.
In addition, we introduce a training-friendly adversarial attack algorithm, which facilitates the boost of adversarial robustness without introducing significant training complexity.
arXiv Detail & Related papers (2022-04-28T00:07:15Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z) - Provably robust deep generative models [1.52292571922932]
We propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE).
We show that it is able to produce generative models that are substantially more robust to adversarial attacks.
arXiv Detail & Related papers (2020-04-22T14:47:41Z) - Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
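As referenced in the Group-based Robustness entry above, a metric of that flavor can be illustrated as the fraction of source-class examples that an attack pushes into a chosen set of target classes. The sketch below is only an illustration under assumed interfaces (a generic `attack` callable and image-like tensor inputs), not that paper's exact definition.

```python
from typing import Callable, Iterable, Set, Tuple

import torch


def group_based_success_rate(
    model: torch.nn.Module,
    attack: Callable[[torch.nn.Module, torch.Tensor, int], torch.Tensor],
    examples: Iterable[Tuple[torch.Tensor, int]],
    source_classes: Set[int],
    target_classes: Set[int],
) -> float:
    """Attack every example whose label is in `source_classes` and report how
    often the perturbed prediction lands in `target_classes` (illustrative
    group-based success rate, not the paper's exact metric)."""
    attempted, succeeded = 0, 0
    for x, y in examples:
        if y not in source_classes:
            continue
        attempted += 1
        x_adv = attack(model, x, y)                        # perturbed input
        pred = model(x_adv.unsqueeze(0)).argmax(dim=-1).item()
        if pred in target_classes:
            succeeded += 1
    return succeeded / attempted if attempted else 0.0
```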