Experiments with adversarial attacks on text genres
- URL: http://arxiv.org/abs/2107.02246v1
- Date: Mon, 5 Jul 2021 19:37:59 GMT
- Title: Experiments with adversarial attacks on text genres
- Authors: Mikhail Lepekhin, Serge Sharoff
- Abstract summary: Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
We show that embedding-based algorithms, which replace some of the most ``significant'' words with similar words, can influence model predictions in a significant proportion of cases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa,
demonstrate SOTA results in many NLP tasks, including non-topical
classification, such as genre identification. However, these approaches often
exhibit low reliability under minor alterations of the test texts. A related
problem concerns topical biases in the training corpus: for example, the
prevalence of words on a specific topic in a specific genre can trick the genre
classifier into recognising any text on this topic as belonging to this genre. In order to
mitigate the reliability problem, this paper investigates techniques for
attacking genre classifiers to understand the limitations of the transformer
models and to improve their performance. While simple text attacks, such as
those based on word replacement using keywords extracted by tf-idf, are not
capable of deceiving powerful models like XLM-RoBERTa, we show that
embedding-based algorithms which can replace some of the most ``significant''
words with words similar to them, for example, TextFooler, have the ability to
influence model predictions in a significant proportion of cases.
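For readers who want to reproduce this kind of attack, the sketch below (a minimal illustration, not the authors' exact setup) runs the TextFooler recipe from the open-source TextAttack library against a transformer classifier; the `xlm-roberta-base` checkpoint and the toy (text, label) pairs are placeholders for a fine-tuned genre classifier and genre-annotated documents.

```python
# Minimal TextFooler-style attack via TextAttack (illustrative sketch only).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Placeholder checkpoint; substitute a genre classifier fine-tuned on your corpus.
model_name = "xlm-roberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# TextFooler replaces the words most important to the prediction with nearest
# neighbours in a counter-fitted embedding space, under similarity constraints.
# Note: the default recipe assumes English; non-English corpora need adapted
# embeddings and constraints.
attack = TextFoolerJin2019.build(wrapper)

# Toy (text, label) pairs standing in for genre-annotated test documents.
dataset = Dataset([
    ("The company reported record quarterly earnings on Tuesday.", 0),
    ("Once upon a time there lived a lonely dragon in the hills.", 1),
])

attacker = Attacker(attack, dataset, AttackArgs(num_examples=2))
attacker.attack_dataset()
```

Such word-level substitutions are exactly the kind of minor alteration the abstract describes: the tf-idf keyword baseline only changes surface vocabulary, while TextFooler additionally targets the words the model itself relies on.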
Related papers
- Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models [0.0]
We investigate the challenge of generating adversarial examples to test the robustness of text classification algorithms.
We focus on simulation of content moderation by setting realistic limits on the number of queries an attacker is allowed to attempt.
arXiv Detail & Related papers (2024-10-28T11:46:30Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
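As a concrete illustration of the label-smoothing defence summarised above, here is a minimal PyTorch sketch (my own, not code from the paper); the checkpoint name, batch, and hyperparameters are placeholders.

```python
# Fine-tuning step with label-smoothed cross-entropy (illustrative sketch).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# label_smoothing=0.1 mixes the one-hot target with a uniform distribution
# (0.9 * one-hot + 0.1 * uniform), discouraging over-confident predictions.
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["a slightly perturbed movie review", "an ordinary movie review"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

logits = model(**batch).logits
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

The softened targets penalise extreme logit gaps, which is what the paper connects to fewer over-confident errors on adversarial examples.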
- Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task [0.0]
Genre identification is a subclass of non-topical text classification.
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
arXiv Detail & Related papers (2022-06-15T09:59:05Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
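To make the hard-label setting above concrete, the following sketch (illustrative only, not the proposed LHLS algorithm) wraps a classifier so that the attacker observes nothing but the predicted class, and runs a naive substitution search driven solely by label flips; `predict_proba` and `candidates` are hypothetical inputs supplied by the user.

```python
# Hard-label attack setting: only the argmax label is visible to the attacker.
import random

class HardLabelOracle:
    """Exposes only the predicted class of an underlying classifier."""
    def __init__(self, predict_proba):
        self.predict_proba = predict_proba  # callable: text -> list of class scores
        self.num_queries = 0

    def label(self, text):
        self.num_queries += 1
        scores = self.predict_proba(text)
        return max(range(len(scores)), key=scores.__getitem__)

def random_substitution_attack(oracle, tokens, candidates, max_queries=100):
    """Accept a random word swap only if it flips the hard label."""
    original = oracle.label(" ".join(tokens))
    while oracle.num_queries < max_queries:
        i = random.randrange(len(tokens))
        trial = list(tokens)
        trial[i] = random.choice(candidates.get(tokens[i], [tokens[i]]))
        if oracle.label(" ".join(trial)) != original:
            return trial  # adversarial example found within the query budget
    return None
```

The point of the setting is visible in the code: with no scores or gradients, the attacker can only learn from label flips, which is why purpose-built search strategies such as LHLS matter.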
- ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models [0.0]
Deep learning approaches based on CNN, LSTM, and Transformers have been the de facto approach for text classification.
We show that these systems are over-reliant on the important words present in the text that are useful for classification.
arXiv Detail & Related papers (2021-01-30T15:18:35Z)
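A ShufText-style probe can be sketched in a few lines (my reconstruction of the general idea, not the paper's code): shuffle the word order of each test sentence and measure how often the classifier's prediction survives, since a high agreement rate indicates the model is keying on salient words rather than sentence structure. `predict_label` is a hypothetical callable mapping text to a class id.

```python
# Word-order shuffling probe for classifier fragility (illustrative sketch).
import random

def shuffle_words(text, seed=0):
    """Return the same words in a random order."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

def shuffle_agreement(predict_label, texts):
    """Fraction of inputs whose prediction is unchanged after shuffling."""
    agree = sum(predict_label(t) == predict_label(shuffle_words(t)) for t in texts)
    return agree / len(texts)

# Usage: print(shuffle_agreement(predict_label, test_texts))
```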
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z)
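The MASKER objective described above can be illustrated in simplified form (under my assumptions, not the official implementation, and omitting the keyword-reconstruction term) as a standard classification loss plus a regulariser that pushes the model towards near-uniform predictions when it sees only the extracted keywords.

```python
# Simplified MASKER-style loss: full-text cross-entropy plus a low-confidence
# penalty on keyword-only input (illustrative sketch).
import torch
import torch.nn.functional as F

def masker_style_loss(logits_full, logits_keywords_only, labels, lam=0.001):
    # Standard cross-entropy on the full input.
    ce = F.cross_entropy(logits_full, labels)
    # KL(uniform || p) is zero when the keyword-only prediction is uniform,
    # i.e. the model refuses to be confident without surrounding context.
    log_p = F.log_softmax(logits_keywords_only, dim=-1)
    uniform = torch.full_like(log_p, 1.0 / log_p.size(-1))
    reg = F.kl_div(log_p, uniform, reduction="batchmean")
    return ce + lam * reg
```

The weight `lam` is a placeholder hyperparameter; the two logits tensors come from running the same encoder on the full text and on the keyword-only version of it.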
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [20.27052525082402]
We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
arXiv Detail & Related papers (2020-10-05T21:07:45Z)
- Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples [16.460051008283887]
We show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between the replaced words and their substitutions.
We propose frequency-guided word substitutions (FGWS) for the detection of adversarial examples.
FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets.
arXiv Detail & Related papers (2020-04-13T12:11:36Z)
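A rough sketch of the frequency-guided substitution step (my own simplification, not the authors' implementation): words rarer than a frequency threshold are swapped for their most frequent WordNet synonym, and a large drop in the confidence of the originally predicted class flags the input as likely adversarial. The corpus frequency counts, the thresholds `delta` and `gamma`, and the `predict_probs` callable are assumptions of this example.

```python
# Frequency-guided word substitution detector (illustrative sketch).
from collections import Counter
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def fgws_transform(tokens, freq: Counter, delta: int):
    """Replace words with corpus frequency below delta by their most frequent synonym."""
    out = []
    for tok in tokens:
        if freq[tok.lower()] < delta:
            synonyms = {lemma.name().replace("_", " ")
                        for syn in wordnet.synsets(tok)
                        for lemma in syn.lemmas()}
            if synonyms:
                tok = max(synonyms, key=lambda w: freq[w.lower()])
        out.append(tok)
    return out

def looks_adversarial(predict_probs, tokens, freq, delta, gamma):
    """Flag the input if frequency-guided substitutions reduce the confidence
    of the originally predicted class by more than gamma."""
    probs = predict_probs(" ".join(tokens))
    pred = max(range(len(probs)), key=probs.__getitem__)
    probs_t = predict_probs(" ".join(fgws_transform(tokens, freq, delta)))
    return probs[pred] - probs_t[pred] > gamma
```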
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT [95.88293021131035]
It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist.
This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data.
arXiv Detail & Related papers (2020-02-27T22:07:11Z)
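To give a flavour of the natural noise probed above, the snippet below (a hedged illustration, not the Adv-BERT procedure) injects simple typos by swapping adjacent characters in a fraction of words, so one can compare a model's predictions on clean and misspelled versions of the same text.

```python
# Character-swap typo generator for robustness checks (illustrative sketch).
import random

def add_typos(text, rate=0.1, seed=0):
    """Swap two adjacent characters in roughly `rate` of the longer words."""
    rng = random.Random(seed)
    words = text.split()
    for i, word in enumerate(words):
        if len(word) > 3 and rng.random() < rate:
            j = rng.randrange(len(word) - 1)
            chars = list(word)
            chars[j], chars[j + 1] = chars[j + 1], chars[j]
            words[i] = "".join(chars)
    return " ".join(words)

# Usage: compare predictions on text vs. add_typos(text).
```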
This list is automatically generated from the titles and abstracts of the papers in this site.