Towards Evaluating the Robustness of Chinese BERT Classifiers
- URL: http://arxiv.org/abs/2004.03742v1
- Date: Tue, 7 Apr 2020 23:02:37 GMT
- Title: Towards Evaluating the Robustness of Chinese BERT Classifiers
- Authors: Boxin Wang, Boyuan Pan, Xin Li, Bo Li
- Abstract summary: We propose a novel Chinese char-level attack method against BERT-based classifiers.
Experiments show that the classification accuracy on a Chinese news dataset drops from 91.8% to 0%.
- Score: 19.06256105080416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large-scale language representation models such as BERT
have improved the state-of-the-art performances in many NLP tasks. Meanwhile,
character-level Chinese NLP models, including BERT for Chinese, have also
demonstrated that they can outperform existing models. In this paper, however,
we show that such BERT-based models are vulnerable to character-level
adversarial attacks. We propose a novel Chinese character-level attack method
against BERT-based classifiers. Essentially, we generate a "small" perturbation
at the character level in the embedding space and use it to guide the character
substitution procedure. Extensive experiments show that the proposed attack
drops the classification accuracy on a Chinese news dataset from 91.8% to 0%
while manipulating fewer than 2 characters on average. Human evaluations
also confirm that our generated Chinese adversarial examples barely affect
human performance on these NLP tasks.
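To make the attack idea in the abstract concrete, below is a minimal, hypothetical sketch of gradient-guided character substitution: compute the loss gradient with respect to the character embeddings, perturb the most influential positions against the model, and snap each perturbed embedding to the nearest real character. The toy classifier, the names (ToyCharClassifier, forward_from_embeddings, char_attack), and the sizes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, CLASSES = 5000, 64, 4      # hypothetical sizes

class ToyCharClassifier(nn.Module):
    """Stand-in for a character-level Chinese BERT classifier."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, CLASSES)

    def forward_from_embeddings(self, embeds):   # embeds: (batch, seq, dim)
        return self.head(embeds.mean(dim=1))     # mean-pool, then classify

def char_attack(model, input_ids, label, max_subs=2):
    emb_matrix = model.emb.weight                # (VOCAB, DIM)
    embeds = emb_matrix[input_ids].detach().clone().requires_grad_(True)
    logits = model.forward_from_embeddings(embeds.unsqueeze(0))
    F.cross_entropy(logits, torch.tensor([label])).backward()
    grad = embeds.grad                           # (seq, DIM)
    adv_ids = input_ids.clone()
    with torch.no_grad():
        # Substitute the characters whose embeddings most influence the loss.
        for pos in grad.norm(dim=-1).argsort(descending=True)[:max_subs]:
            perturbed = embeds[pos] + grad[pos]  # "small" step that raises the loss
            dists = (emb_matrix - perturbed).norm(dim=-1)
            dists[input_ids[pos]] = float("inf") # forbid keeping the original char
            adv_ids[pos] = dists.argmin()        # snap to the nearest real character
    return adv_ids

model = ToyCharClassifier()
ids = torch.randint(0, VOCAB, (12,))             # a 12-character input
print(char_attack(model, ids, label=0))
```

A real attack would also constrain substitutions to remain natural to human readers, as the human evaluation in the abstract suggests; the nearest-embedding snap above is only the simplest stand-in for that constraint.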
Related papers
- Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model.
The PWWS attack, BERT-on-BERT attack, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z) - Arabic Synonym BERT-based Adversarial Examples for Text Classification [0.0]
This paper introduces the first word-level study of adversarial attacks in Arabic.
We assess the robustness of the state-of-the-art text classification models to adversarial attacks in Arabic.
We study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms.
arXiv Detail & Related papers (2024-02-05T19:39:07Z) - Expanding Scope: Adapting English Adversarial Attacks to Chinese [11.032727439758661]
This paper investigates how to adapt SOTA adversarial attack algorithms in English to the Chinese language.
Our experiments show that attack methods previously applied to English NLP can generate high-quality adversarial examples in Chinese.
In addition, we demonstrate that the generated adversarial examples can achieve high fluency and semantic consistency.
arXiv Detail & Related papers (2023-06-08T02:07:49Z) - Model-tuning Via Prompts Makes NLP Models Adversarially Robust [97.02353907677703]
We show surprising gains in adversarial robustness enjoyed by Model-tuning Via Prompts (MVP).
MVP improves performance against adversarial substitutions by an average of 8% over standard methods.
We also conduct ablations to investigate the mechanism underlying these gains.
arXiv Detail & Related papers (2023-03-13T17:41:57Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks (a minimal label-smoothing sketch appears after this list).
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers [62.61866477815883]
We present CSCD-NS, the first Chinese spelling check dataset designed for native speakers.
CSCD-NS is ten times larger in scale than existing Chinese spelling check datasets and exhibits a distinct error distribution.
We propose a novel method that simulates the input process through an input method.
arXiv Detail & Related papers (2022-11-16T09:25:42Z) - Phrase-level Adversarial Example Generation for Neural Machine Translation [75.01476479100569]
We propose a phrase-level adversarial example generation (PAEG) method to enhance the robustness of the model.
We verify our method on three benchmarks, including LDC Chinese-English, IWSLT14 German-English, and WMT14 English-German tasks.
arXiv Detail & Related papers (2022-01-06T11:00:49Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Enhancing Model Robustness By Incorporating Adversarial Knowledge Into Semantic Representation [42.23608639683468]
AdvGraph is a novel defense which enhances the robustness of Chinese-based NLP models.
It incorporates adversarial knowledge into the semantic representation of the input.
arXiv Detail & Related papers (2021-02-23T09:47:45Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than attacks on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturbation percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
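For readers unfamiliar with the label smoothing defense studied in the item above, here is a minimal sketch using PyTorch's built-in option; the smoothing value 0.1 is an arbitrary illustration, not a setting reported by that paper.

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 3)                     # batch of 8 examples, 3 classes
labels = torch.randint(0, 3, (8,))

hard_loss = nn.CrossEntropyLoss()(logits, labels)
# With label_smoothing=0.1, the hard one-hot target is replaced by a mixture
# of the one-hot vector (weight 0.9) and the uniform distribution (weight 0.1),
# which penalizes over-confident predictions.
smooth_loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, labels)
print(hard_loss.item(), smooth_loss.item())
```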