Related papers: Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

URL: http://arxiv.org/abs/2003.04985v1
Date: Thu, 27 Feb 2020 22:07:11 GMT
Title: Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
Authors: Lichao Sun, Kazuma Hashimoto, Wenpeng Yin, Akari Asai, Jia Li, Philip Yu, Caiming Xiong
Abstract summary: It is unclear, however, how the models will perform in realistic scenarios where textitnatural rather than malicious adversarial instances often exist. This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data.
Score: 95.88293021131035
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There is an increasing amount of literature that claims the brittleness of deep neural networks in dealing with adversarial examples that are created maliciously. It is unclear, however, how the models will perform in realistic scenarios where \textit{natural rather than malicious} adversarial instances often exist. This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data, particularly mistakes in typing the keyboard, that occur inadvertently. Intensive experiments on sentiment analysis and question answering benchmarks indicate that: (i) Typos in various words of a sentence do not influence equally. The typos in informative words make severer damages; (ii) Mistype is the most damaging factor, compared with inserting, deleting, etc.; (iii) Humans and machines have different focuses on recognizing adversarial attacks.

Related papers

Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model. BERT, BERT-on-BERT attack, and Fraud Bargain's Attack (FBA) are explored in this paper. PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z)
Adversarial Attacks and Dimensionality in Text Classifiers [3.4179091429029382]
Adversarial attacks on machine learning algorithms have been a key deterrent to the adoption of AI in many real-world use cases. We study adversarial examples in the field of natural language processing, specifically text classification tasks.
arXiv Detail & Related papers (2024-04-03T11:49:43Z)
SenTest: Evaluating Robustness of Sentence Encoders [0.4194295877935868]
This work focuses on evaluating the robustness of the sentence encoders. We employ several adversarial attacks to evaluate its robustness. The results of the experiments strongly undermine the robustness of sentence encoders.
arXiv Detail & Related papers (2023-11-29T15:21:35Z)
In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label. Based on this observation, we propose a novel hard-label attack, called Learning-based Hybrid Local Search (LHLS) algorithm. Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
A Differentiable Language Model Adversarial Attack on Text Classifiers [10.658675415759697]
We propose a new black-box sentence-level attack for natural language processing. Our method fine-tunes a pre-trained language model to generate adversarial examples. We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation.
arXiv Detail & Related papers (2021-07-23T14:43:13Z)
Experiments with adversarial attacks on text genres [0.0]
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks. We show that embedding-based algorithms which can replace some of the most significant'' words with words similar to them, have the ability to influence model predictions in a significant proportion of cases.
arXiv Detail & Related papers (2021-07-05T19:37:59Z)
On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adrial attacks for discrete data (such as texts) are more challenging than continuous data (such as images) We propose textbfBERT-Attack, a high-quality and effective method to generate adversarial samples. Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification. This paper studies a complementary failure mode, invariance-based adversarial examples. We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.