MockingBERT: A Method for Retroactively Adding Resilience to NLP Models
- URL: http://arxiv.org/abs/2208.09915v1
- Date: Sun, 21 Aug 2022 16:02:01 GMT
- Title: MockingBERT: A Method for Retroactively Adding Resilience to NLP Models
- Authors: Jan Jezabek and Akash Singh
- Abstract summary: We propose a novel method for retroactively adding misspelling resilience to transformer-based NLP models.
This can be achieved without the need for re-training of the original NLP model.
We also propose a new efficient approximate method of generating adversarial misspellings.
- Score: 4.584774276587428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protecting NLP models against misspellings, whether accidental or adversarial, has been an object of research interest for the past few years. Existing remediations have typically either compromised accuracy or required full model re-training with each new class of attacks. We propose a novel method for retroactively adding misspelling resilience to transformer-based NLP models. This robustness can be achieved without re-training the original NLP model and with only a minimal loss of language understanding performance on inputs without misspellings. Additionally, we propose a new, efficient approximate method of generating adversarial misspellings, which significantly reduces the cost of evaluating a model's resilience to adversarial attacks.
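The abstract does not describe the approximate attack itself, so the Python sketch below is only a generic illustration of the kind of black-box, character-level misspelling attack that such a method would make cheaper to run. The names (greedy_misspelling_attack, _perturb_word), the edit operations, and the score_fn callable are assumptions for illustration, not the authors' algorithm.

```python
# Illustrative sketch only: a generic greedy character-level misspelling attack
# against a black-box classifier. This is NOT the paper's approximate method;
# it only shows the kind of query-heavy baseline such a method would speed up.
# `score_fn` is a hypothetical callable returning the model's confidence in the
# correct label for a piece of text.
import random
import string
from typing import Callable, List

def _perturb_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: swap, drop, duplicate, or replace."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "drop", "dup", "replace"])
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "dup":
        return word[:i + 1] + word[i] + word[i + 1:]
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]

def greedy_misspelling_attack(
    text: str,
    score_fn: Callable[[str], float],   # confidence in the *correct* label
    max_edits: int = 3,
    candidates_per_word: int = 10,
    seed: int = 0,
) -> str:
    """Greedily misspell the words whose removal hurts the score the most."""
    rng = random.Random(seed)
    words: List[str] = text.split()

    # Rank words by a cheap importance proxy: score drop when the word is removed.
    base = score_fn(text)
    def importance(idx: int) -> float:
        reduced = " ".join(w for j, w in enumerate(words) if j != idx)
        return base - score_fn(reduced)
    order = sorted(range(len(words)), key=importance, reverse=True)

    # Perturb the most important words, keeping whichever misspelling
    # lowers the model's confidence the most.
    for idx in order[:max_edits]:
        best_word, best_score = words[idx], score_fn(" ".join(words))
        for _ in range(candidates_per_word):
            cand = _perturb_word(words[idx], rng)
            trial = words[:idx] + [cand] + words[idx + 1:]
            s = score_fn(" ".join(trial))
            if s < best_score:
                best_word, best_score = cand, s
        words[idx] = best_word
    return " ".join(words)

# Toy usage with a stand-in scorer (a real attack would query an NLP model):
if __name__ == "__main__":
    def toy_score(t: str) -> float:
        return 1.0 if "excellent" in t else 0.2
    print(greedy_misspelling_attack("the food was excellent", toy_score))
```

The expensive part of this baseline is the repeated calls to score_fn; an approximate generation method like the one the abstract describes would presumably aim to cut down on exactly those model queries.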
Related papers
- PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning [28.845915332201592]
Pre-trained language models (PLMs) have attracted enormous attention over the past few years with their unparalleled performance.
The soaring cost of training PLMs, together with their strong generalizability, has driven the adoption of few-shot fine-tuning and prompting.
Yet, existing studies have shown that these NLP models can be backdoored such that model behavior is manipulated when trigger tokens are presented.
We propose PromptFix, a novel backdoor mitigation strategy for NLP models via adversarial prompt-tuning in few-shot settings.
arXiv Detail & Related papers (2024-06-06T20:06:42Z) - Feature Separation and Recalibration for Adversarial Robustness [18.975320671203132]
We propose a novel, easy-to-verify approach named Feature Separation and Recalibration.
It recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration.
It improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead.
arXiv Detail & Related papers (2023-03-24T07:43:57Z) - Model-tuning Via Prompts Makes NLP Models Adversarially Robust [97.02353907677703]
We show surprising gains in adversarial robustness enjoyed by Model-tuning Via Prompts (MVP).
MVP improves performance against adversarial substitutions by an average of 8% over standard methods.
We also conduct ablations to investigate the mechanism underlying these gains.
arXiv Detail & Related papers (2023-03-13T17:41:57Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (see the label-smoothing sketch after this list).
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement [18.532308729844598]
We propose a novel prompt-based adversarial attack to compromise NLP models.
We generate adversarial examples via mask-and-fill, guided by a malicious objective.
Because our training method does not actually generate adversarial samples, it can be applied to large-scale training sets efficiently.
arXiv Detail & Related papers (2022-03-21T03:21:32Z) - Measuring and Reducing Model Update Regression in Structured Prediction for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured output.
arXiv Detail & Related papers (2022-02-07T07:04:54Z) - How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z) - NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z) - Cold-start Active Learning through Self-supervised Language Modeling [15.551710499866239]
Active learning aims to reduce annotation costs by choosing the most critical examples to label.
With BERT, we develop a simple strategy based on the masked language modeling loss.
Compared to other baselines, our approach reaches higher accuracy with fewer sampling iterations and in less time.
arXiv Detail & Related papers (2020-10-19T14:09:17Z) - Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured outputs of such models are sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z) - Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
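As a side note on the label-smoothing paper listed above, the following minimal Python sketch shows what label smoothing changes in a classifier's loss. The use of PyTorch and the specific smoothing value are assumptions; this is a generic recipe, not that paper's exact training setup.

```python
# Minimal sketch of label smoothing for a classifier loss, illustrating the
# technique studied in the label-smoothing paper above (generic PyTorch recipe,
# not that paper's exact setup).
import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # a batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])  # gold labels

# With label_smoothing=0.1 the one-hot targets are mixed with a uniform
# distribution over classes, which discourages over-confident predictions.
hard_loss = nn.CrossEntropyLoss()(logits, targets)
smooth_loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, targets)
print(float(hard_loss), float(smooth_loss))
```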