A Prompting-based Approach for Adversarial Example Generation and
Robustness Enhancement
- URL: http://arxiv.org/abs/2203.10714v1
- Date: Mon, 21 Mar 2022 03:21:32 GMT
- Title: A Prompting-based Approach for Adversarial Example Generation and
Robustness Enhancement
- Authors: Yuting Yang, Pei Huang, Juan Cao, Jintao Li, Yun Lin, Jin Song Dong,
Feifei Ma, Jian Zhang
- Abstract summary: We propose a novel prompt-based adversarial attack to compromise NLP models.
We generate adversarial examples via mask-and-filling under the effect of a malicious purpose.
Because our training method does not actually generate adversarial samples, it can be applied to large-scale training sets efficiently.
- Score: 18.532308729844598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen the wide application of NLP models in crucial areas
such as finance, medical treatment, and news media, raising concerns about
model robustness and vulnerabilities. In this paper, we propose a novel
prompt-based adversarial attack to compromise NLP models and a robustness
enhancement technique. We first construct malicious prompts for each instance
and generate adversarial examples via mask-and-filling under the effect of a
malicious purpose. Our attack technique targets the inherent vulnerabilities of
NLP models, allowing us to generate samples even without interacting with the
victim NLP model, as long as it is based on pre-trained language models (PLMs).
Furthermore, we design a prompt-based adversarial training method to improve
the robustness of PLMs. As our training method does not actually generate
adversarial samples, it can be applied to large-scale training sets
efficiently. The experimental results show that our attack method can achieve a
high attack success rate with more diverse, fluent and natural adversarial
examples. In addition, our robustness enhancement method can significantly
improve the robustness of models to resist adversarial attacks. Our work
indicates that the prompting paradigm has great potential in probing some
fundamental flaws of PLMs and fine-tuning them for downstream tasks.
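The mask-and-filling step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the prompt template, the `toy_fill` stand-in for a masked language model, and all function names are assumptions made for illustration. A real system would query a pre-trained language model for fillers and would likely filter the candidates for fluency and meaning preservation.

```python
from typing import Callable, List

MASK = "[MASK]"


def build_malicious_prompt(tokens: List[str], position: int, target_label: str) -> str:
    """Mask one token and prepend a template asserting the (wrong) target label.

    The template wording is hypothetical, not taken from the paper.
    """
    masked = tokens[:position] + [MASK] + tokens[position + 1:]
    return f"It is a {target_label} review. " + " ".join(masked)


def toy_fill(prompt: str) -> List[str]:
    """Hypothetical stand-in for a masked language model.

    It returns fixed filler words that match the label named in the prompt.
    A real attack would ask a PLM to fill the mask instead.
    """
    lexicon = {"negative": ["dull", "boring"], "positive": ["great", "fun"]}
    for label, words in lexicon.items():
        if f"a {label} review" in prompt:
            return words
    return []


def generate_candidates(
    sentence: str,
    target_label: str,
    fill: Callable[[str], List[str]] = toy_fill,
) -> List[str]:
    """Mask each position in turn and collect sentences with the mask refilled."""
    tokens = sentence.split()
    candidates = []
    for i, original in enumerate(tokens):
        prompt = build_malicious_prompt(tokens, i, target_label)
        for word in fill(prompt):
            if word != original:  # skip fillers that leave the input unchanged
                candidates.append(" ".join(tokens[:i] + [word] + tokens[i + 1:]))
    return candidates


if __name__ == "__main__":
    for candidate in generate_candidates("a great movie", "negative"):
        print(candidate)
```

Note that the sketch never calls a victim classifier, which mirrors the abstract's claim that samples can be generated without interacting with the victim model. Whether a given candidate actually flips a prediction would have to be checked separately.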
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
- Adversarial Attacks and Defense for Conversation Entailment Task [0.49157446832511503]
Large language models are vulnerable to low-cost adversarial attacks.
We fine-tune a transformer model to accurately discern the truthfulness of hypotheses.
We introduce an embedding perturbation loss method to bolster the model's robustness.
arXiv Detail & Related papers (2024-05-01T02:49:18Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models [4.776465250559034]
We propose a prompt-based adversarial attack on manual templates in black box scenarios.
First, we design character-level and word-level approaches to break manual templates separately.
We then present a greedy algorithm for the attack based on these destructive approaches.
arXiv Detail & Related papers (2023-06-09T03:53:42Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder [18.375585982984845]
We focus on enhancing the model's ability to defend against gradient-based adversarial attacks during the model's training process.
We propose two novel adversarial training approaches: CARL and RAR.
Experiments show that the proposed two approaches outperform strong baselines on various text classification datasets.
arXiv Detail & Related papers (2021-09-14T09:08:58Z)
- Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
- Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples.
We propose a new framework called SPROUT, self-progressing robust training.
Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
- Provably robust deep generative models [1.52292571922932]
We propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE).
We show that it is able to produce generative models that are substantially more robust to adversarial attacks.
arXiv Detail & Related papers (2020-04-22T14:47:41Z)