Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script
- URL: http://arxiv.org/abs/2412.02323v2
- Date: Wed, 04 Dec 2024 09:08:45 GMT
- Title: Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script
- Authors: Xi Cao, Dolma Dawa, Nuo Qun, Trashi Nyima
- Abstract summary: Textual adversarial attacks are a new challenge for the information processing of Chinese minority languages.
We propose a Tibetan syllable-level black-box textual adversarial attack called TSAttacker.
Experimental results show that TSAttacker is effective and generates high-quality adversarial samples.
- Score: 0.0
- Abstract: A textual adversarial attack is an attack in which the attacker adds carefully designed, imperceptible perturbations to the original text so that an NLP (natural language processing) model produces a false judgment. Such attacks are also used to evaluate the robustness of NLP models. Most research in this field focuses on English, with a certain amount of work on Chinese; to the best of our knowledge, however, there is little research targeting Chinese minority languages, for which textual adversarial attacks pose a new challenge. In response, we propose TSAttacker, a Tibetan syllable-level black-box textual adversarial attack based on syllable cosine distance and a scoring mechanism. We then apply TSAttacker to six models obtained by fine-tuning two PLMs (pre-trained language models) on three downstream tasks. The experimental results show that TSAttacker is effective and generates high-quality adversarial samples. In addition, the robustness of the involved models still has much room for improvement.
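The abstract's description maps naturally onto a greedy black-box search. The following is a minimal, hypothetical Python sketch of a syllable-level attack in that spirit: the `victim_predict` query function, the syllable-embedding dict `emb`, and the deletion-probe importance score are illustrative placeholders standing in for the paper's syllable cosine distance table and scoring mechanism, not the authors' released code.

```python
import numpy as np

def cosine_distance(u, v):
    # 1 - cosine similarity between two embedding vectors.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def nearest_syllables(syllable, emb, k=5):
    # Candidate substitutes: the k syllables closest to `syllable` in embedding space.
    if syllable not in emb:
        return []
    dists = [(s, cosine_distance(emb[syllable], v)) for s, v in emb.items() if s != syllable]
    dists.sort(key=lambda pair: pair[1])
    return [s for s, _ in dists[:k]]

def attack(syllables, victim_predict, emb, k=5):
    """Greedy syllable-level black-box attack (illustrative sketch).

    syllables      -- input text as a list of Tibetan syllables
    victim_predict -- black-box query: list of syllables -> (predicted label, model confidence)
    emb            -- dict mapping each syllable to its embedding vector
    """
    orig_label, orig_conf = victim_predict(syllables)
    adv = list(syllables)

    # Scoring mechanism (simplified): rank positions by how much deleting the
    # syllable lowers the victim model's confidence.
    importance = []
    for i in range(len(adv)):
        _, conf = victim_predict(adv[:i] + adv[i + 1:])
        importance.append((orig_conf - conf, i))

    # Perturb the most important positions first, keeping the substitution
    # (chosen by syllable cosine distance) that hurts the victim the most.
    for _, i in sorted(importance, reverse=True):
        best_conf = victim_predict(adv)[1]
        for cand in nearest_syllables(adv[i], emb, k):
            trial = adv[:i] + [cand] + adv[i + 1:]
            label, conf = victim_predict(trial)
            if label != orig_label:
                return trial          # successful adversarial sample
            if conf < best_conf:
                best_conf, adv = conf, trial
    return None                       # attack failed within this budget
```

In practice, the embeddings would come from a Tibetan syllable representation model, syllables would be obtained by splitting on the tsheg mark (U+0F0B), and the number of queries to the victim model would be tracked as an attack budget.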
Related papers
- Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script [7.5950217558686255]
Adversarial texts play crucial roles in multiple subfields of NLP.
We introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts.
arXiv Detail & Related papers (2024-12-17T02:29:54Z)
- Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages [47.45957604683302]
We study whether pre-trained language models are agnostic to linguistically grounded attacks.
Our findings reveal that although PLMs are susceptible to linguistic perturbations, they exhibit slightly lower susceptibility to linguistic attacks than to non-linguistic ones.
arXiv Detail & Related papers (2024-12-14T12:10:38Z) - TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity [3.1854179230109363]
We propose a novel Tibetan adversarial text generation method called TSCheater.
It exploits the characteristics of Tibetan encoding and the observation that visually similar syllables tend to have similar semantics.
Experimentally, TSCheater outperforms existing methods in attack effectiveness, perturbation, semantic similarity, visual similarity, and human acceptance.
arXiv Detail & Related papers (2024-12-03T10:57:19Z) - Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model [0.0]
We propose a multi-granularity Tibetan textual adversarial attack method based on masked language models called TSTricker.
Results show that TSTricker reduces the accuracy of the classification models by more than 28.70% and changes their predictions on more than 90.60% of the samples. A minimal masked-LM candidate-generation sketch appears after this list.
arXiv Detail & Related papers (2024-12-03T10:03:52Z) - Expanding Scope: Adapting English Adversarial Attacks to Chinese [11.032727439758661]
This paper investigates how to adapt SOTA adversarial attack algorithms in English to the Chinese language.
Our experiments show that attack methods previously applied to English NLP can generate high-quality adversarial examples in Chinese.
In addition, we demonstrate that the generated adversarial examples can achieve high fluency and semantic consistency.
arXiv Detail & Related papers (2023-06-08T02:07:49Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a 33.18 BLEU score on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Towards Evaluating the Robustness of Chinese BERT Classifiers [19.06256105080416]
We propose a novel Chinese char-level attack method against BERT-based classifiers.
Experiments show that the classification accuracy on a Chinese news dataset drops from 91.8% to 0%.
arXiv Detail & Related papers (2020-04-07T23:02:37Z)
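As a companion to the masked-language-model-based attack summarized above (TSTricker), here is a minimal, hypothetical sketch of drawing substitution candidates from a masked LM with the Hugging Face fill-mask pipeline. The model name is a placeholder (a masked LM that actually covers Tibetan should be substituted), and the `mlm_candidates` helper and its filtering are invented for illustration rather than taken from the paper.

```python
from transformers import pipeline

# Placeholder model name: swap in a masked LM that actually covers Tibetan.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

def mlm_candidates(syllables, position, top_k=10):
    """Propose substitutes for the syllable at `position` by masking it and
    letting the masked LM fill in the blank (illustrative sketch only)."""
    masked = list(syllables)
    masked[position] = fill_mask.tokenizer.mask_token
    # Tibetan syllables are separated by the tsheg mark (U+0F0B).
    text = "\u0f0b".join(masked)
    predictions = fill_mask(text, top_k=top_k)
    # Keep the predicted tokens, dropping empties and the original syllable.
    candidates = []
    for p in predictions:
        token = p["token_str"].strip()
        if token and token != syllables[position]:
            candidates.append(token)
    return candidates
```

A syllable-level attack would then score these candidates against the victim model, for example with a confidence-drop criterion like the one sketched under the abstract above.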
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.