Expanding Scope: Adapting English Adversarial Attacks to Chinese
- URL: http://arxiv.org/abs/2306.04874v1
- Date: Thu, 8 Jun 2023 02:07:49 GMT
- Title: Expanding Scope: Adapting English Adversarial Attacks to Chinese
- Authors: Hanyu Liu, Chengyuan Cai, Yanjun Qi
- Abstract summary: This paper investigates how to adapt SOTA adversarial attack algorithms in English to the Chinese language.
Our experiments show that attack methods previously applied to English NLP can generate high-quality adversarial examples in Chinese.
In addition, we demonstrate that the generated adversarial examples can achieve high fluency and semantic consistency.
- Score: 11.032727439758661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have revealed that NLP predictive models are vulnerable to
adversarial attacks. Most existing studies focused on designing attacks to
evaluate the robustness of NLP models in the English language alone. Literature
has seen an increasing need for NLP solutions for other languages. We,
therefore, ask one natural question: whether state-of-the-art (SOTA) attack
methods generalize to other languages. This paper investigates how to adapt
SOTA adversarial attack algorithms in English to the Chinese language. Our
experiments show that attack methods previously applied to English NLP can
generate high-quality adversarial examples in Chinese when combined with proper
text segmentation and linguistic constraints. In addition, we demonstrate that
the generated adversarial examples can achieve high fluency and semantic
consistency by focusing on the Chinese language's morphology and phonology,
which in turn can be used to improve the adversarial robustness of Chinese NLP
models.
Related papers
- Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance [6.907734681124986]
This paper strategically identifies the need for linguistic equity by examining several knowledge editing techniques in multilingual contexts.
We evaluate the performance of models such as Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama across languages including English, German, French, Italian, Spanish, Hindi, Tamil, and Kannada.
arXiv Detail & Related papers (2024-06-17T01:54:27Z) - Enhance Robustness of Language Models Against Variation Attack through Graph Integration [28.291064456532517]
We propose a novel method, CHinese vAriatioN Graph Enhancement, to increase the robustness of language models against character variation attacks.
CHANGE essentially enhances PLMs' interpretation of adversarially manipulated text.
Experiments conducted in a multitude of NLP tasks show that CHANGE outperforms current language models in combating against adversarial attacks.
arXiv Detail & Related papers (2024-04-18T09:04:39Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - A Classification-Guided Approach for Adversarial Attacks against Neural
Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z) - Lost In Translation: Generating Adversarial Examples Robust to
Round-Trip Translation [66.33340583035374]
We present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation.
We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation.
We introduce an intervention-based solution to this problem, by integrating Machine Translation into the process of adversarial example generation.
arXiv Detail & Related papers (2023-07-24T04:29:43Z) - COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose textscCOLDetector to study output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z) - Towards Evaluating the Robustness of Chinese BERT Classifiers [19.06256105080416]
We propose a novel Chinese char-level attack method against BERT-based classifiers.
Experiments show that the classification accuracy on a Chinese news dataset drops from 91.8% to 0%.
arXiv Detail & Related papers (2020-04-07T23:02:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.