Context-aware Adversarial Attack on Named Entity Recognition
- URL: http://arxiv.org/abs/2309.08999v2
- Date: Sat, 3 Feb 2024 00:11:11 GMT
- Title: Context-aware Adversarial Attack on Named Entity Recognition
- Authors: Shuguang Chen, Leonardo Neves, and Thamar Solorio
- Abstract summary: We study context-aware adversarial attack methods to examine the model's robustness.
Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples.
Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.
- Score: 15.049160192547909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, large pre-trained language models (PLMs) have achieved
remarkable performance on many natural language processing benchmarks. Despite
their success, prior studies have shown that PLMs are vulnerable to attacks
from adversarial examples. In this work, we focus on the named entity
recognition task and study context-aware adversarial attack methods to examine
the model's robustness. Specifically, we propose perturbing the most
informative words for recognizing entities to create adversarial examples and
investigate different candidate replacement methods to generate natural and
plausible adversarial examples. Experiments and analyses show that our methods
are more effective in deceiving the model into making wrong predictions than
strong baselines.
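To make the attack idea concrete, below is a minimal Python sketch of a context-aware word-substitution attack on an NER model: it ranks context words by how much removing them lowers the model's entity confidence, then substitutes natural candidates proposed by a masked language model. The Hugging Face checkpoints and the leave-one-out importance heuristic are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only: rank context words by the drop in NER confidence when
# they are removed, then replace the most informative ones with masked-LM candidates.
# Model names and heuristics are assumptions, not the paper's implementation.
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
mlm = pipeline("fill-mask", model="bert-base-uncased", top_k=10)


def entity_confidence(sentence: str) -> float:
    """Total confidence the NER model assigns to its predicted entities."""
    return sum(ent["score"] for ent in ner(sentence))


def context_aware_attack(sentence: str) -> str:
    words = sentence.split()
    base = entity_confidence(sentence)
    entity_tokens = {w.lower() for ent in ner(sentence) for w in ent["word"].split()}

    # 1) Score each context word by the confidence drop when it is removed
    #    (a simple leave-one-out proxy for the "most informative" words).
    ranked = []
    for i, w in enumerate(words):
        if w.lower().strip(".,") in entity_tokens:
            continue  # perturb the context, not the entity mentions themselves
        reduced = " ".join(words[:i] + words[i + 1:])
        ranked.append((base - entity_confidence(reduced), i))
    ranked.sort(reverse=True)

    # 2) Try masked-LM replacements for the most informative words first and keep
    #    the first substitution that degrades the model's entity predictions.
    for _, i in ranked:
        masked = " ".join(words[:i] + [mlm.tokenizer.mask_token] + words[i + 1:])
        for cand in mlm(masked):
            token = cand["token_str"].strip()
            if not token.isalpha() or token.lower() == words[i].lower():
                continue
            perturbed = " ".join(words[:i] + [token] + words[i + 1:])
            if entity_confidence(perturbed) < base:
                return perturbed
    return sentence  # no effective perturbation found


print(context_aware_attack("John Smith lives in Boston and works for Google."))
```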
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
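A hedged sketch of this detection idea follows, assuming (as one plausible realization) that the input image and the image regenerated from the VLM's caption are compared via CLIP image-embedding similarity with a fixed threshold; the specific checkpoints, the comparison step, and the threshold are assumptions rather than the paper's exact pipeline.

```python
# Hedged sketch: caption the input with a VLM, regenerate an image from the caption
# with a T2I model, and flag the input if the two images are dissimilar.
# Models and the similarity threshold are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
cap_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
t2i = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def image_embedding(img: Image.Image) -> torch.Tensor:
    inputs = clip_proc(images=img, return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)


def looks_adversarial(img: Image.Image, threshold: float = 0.7) -> bool:
    # 1) Caption the (possibly adversarial) input with the target VLM.
    inputs = cap_proc(images=img, return_tensors="pt")
    caption = cap_proc.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)
    # 2) Regenerate an image from that caption with a T2I model.
    regenerated = t2i(caption).images[0]
    # 3) Low input/regenerated similarity suggests the caption was manipulated.
    similarity = (image_embedding(img) @ image_embedding(regenerated).T).item()
    return similarity < threshold
```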
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model.
The BERT-on-BERT attack, the PWWS attack, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z)
- SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks [29.942001958562567]
We propose a novel approach called Semantic Robust Defence (SemRoDe) to enhance the robustness of language models.
Our method learns a robust representation that bridges these two domains.
The results demonstrate promising state-of-the-art robustness.
arXiv Detail & Related papers (2024-03-27T10:24:25Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Generating Valid and Natural Adversarial Examples with Large Language Models [18.944937459278197]
Adversarial examples generated by existing attack models are often neither valid nor natural, sacrificing semantic preservation, grammaticality, and human imperceptibility.
We propose LLM-Attack, which aims at generating both valid and natural adversarial examples with large language models.
Experimental results on the Movie Review (MR), IMDB, and Review Polarity datasets against the baseline adversarial attack models illustrate the effectiveness of LLM-Attack.
arXiv Detail & Related papers (2023-11-20T15:57:04Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
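For reference, label smoothing mixes the one-hot target with a uniform distribution over the classes, which penalizes over-confident predictions; a minimal PyTorch sketch follows (the smoothing value 0.1 is illustrative, not the paper's setting).

```python
# Minimal label-smoothing reference; epsilon is illustrative, not the paper's setup.
import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])  # gold class indices

# Built-in: cross-entropy where targets become a mixture of the one-hot label
# and a uniform distribution over all classes.
loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, targets)

# Equivalent by hand: y_smooth = (1 - eps) * one_hot + eps / num_classes
eps, num_classes = 0.1, logits.size(-1)
one_hot = torch.nn.functional.one_hot(targets, num_classes).float()
y_smooth = (1 - eps) * one_hot + eps / num_classes
manual = -(y_smooth * torch.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```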
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- A Context Aware Approach for Generating Natural Language Attacks [3.52359746858894]
We propose an attack strategy that crafts semantically similar adversarial examples on text classification and entailment tasks.
Our proposed attack finds candidate words by considering the information of both the original word and its surrounding context.
arXiv Detail & Related papers (2020-12-24T17:24:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.