Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models
- URL: http://arxiv.org/abs/2111.02840v1
- Date: Thu, 4 Nov 2021 12:59:55 GMT
- Title: Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models
- Authors: Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng
Gao, Ahmed Hassan Awadallah, Bo Li
- Abstract summary: Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
- Score: 86.02610674750345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale pre-trained language models have achieved tremendous success
across a wide range of natural language understanding (NLU) tasks, even
surpassing human performance. However, recent studies reveal that the
robustness of these models can be challenged by carefully crafted textual
adversarial examples. While several individual datasets have been proposed to
evaluate model robustness, a principled and comprehensive benchmark is still
missing. In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task
benchmark to quantitatively and thoroughly explore and evaluate the
vulnerabilities of modern large-scale language models under various types of
adversarial attacks. In particular, we systematically apply 14 textual
adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further
validated by humans for reliable annotations. Our findings are summarized as
follows. (i) Most existing adversarial attack algorithms are prone to
generating invalid or ambiguous adversarial examples: around 90% of them
either change the original semantic meaning or mislead human annotators.
We therefore perform a careful filtering process to curate a
high-quality benchmark. (ii) All the language models and robust training
methods we tested perform poorly on AdvGLUE, with scores lagging far behind the
benign accuracy. We hope our work will motivate the development of new
adversarial attacks that are more stealthy and semantic-preserving, as well as
new robust language models against sophisticated adversarial attacks. AdvGLUE
is available at https://adversarialglue.github.io.
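As a concrete illustration of the benchmark's headline measurement, the toy sketch below contrasts a classifier's accuracy on benign inputs with its accuracy on perturbed counterparts. The stand-in classifier and example pairs are illustrative only, not drawn from AdvGLUE.

```python
# Minimal sketch of an AdvGLUE-style robustness gap measurement:
# evaluate the same classifier on benign examples and on their
# adversarial counterparts, then report the accuracy drop.
# `classify` is a toy stand-in for any NLU model.

def classify(sentence: str) -> int:
    """Toy sentiment classifier: 1 (positive) if it sees a positive cue."""
    positive_cues = {"great", "wonderful", "excellent"}
    return int(any(w.strip(".,") in positive_cues
                   for w in sentence.lower().split()))

# (benign_text, adversarial_text, gold_label) triples; the adversarial
# text preserves the meaning for humans but perturbs surface cues.
pairs = [
    ("The movie was great.", "The movie was gr eat.", 1),
    ("A wonderful, moving film.", "A wonderfull, moving film.", 1),
    ("An excellent performance.", "An excelent performance.", 1),
]

benign_acc = sum(classify(b) == y for b, _, y in pairs) / len(pairs)
adv_acc = sum(classify(a) == y for _, a, y in pairs) / len(pairs)
print(f"benign accuracy:      {benign_acc:.2f}")
print(f"adversarial accuracy: {adv_acc:.2f}")
print(f"robustness gap:       {benign_acc - adv_acc:.2f}")
```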
Related papers
- A Generative Adversarial Attack for Multilingual Text Classifiers [10.993289209465129]
We propose an approach to fine-tune a multilingual paraphrase model with an adversarial objective.
The training objective incorporates a set of pre-trained models to ensure text quality and language consistency.
Experimental validation on two multilingual datasets and five languages shows the effectiveness of the proposed approach.
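The blurb above implies a training objective that balances paraphrase quality against fooling a victim classifier. A minimal sketch of such a combined loss follows, with toy stand-in modules rather than the paper's actual models:

```python
# Hedged sketch of an adversarial paraphrasing objective: the total loss
# trades off text quality (keep the paraphrase faithful) against an
# adversarial term (push a frozen victim classifier off the gold label).
# All modules and the 0.5 weight are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, num_classes = 16, 2

paraphraser = torch.nn.Linear(dim, dim)      # stand-in paraphrase model
victim = torch.nn.Linear(dim, num_classes)   # stand-in victim classifier
victim.requires_grad_(False)                 # victim stays frozen

source = torch.randn(4, dim)                 # embedded source sentences
gold = torch.tensor([0, 1, 0, 1])            # victim's correct labels

paraphrase = paraphraser(source)

# Quality term: stay close to the source representation (a proxy for the
# pre-trained fluency / language-consistency scorers the paper mentions).
quality_loss = F.mse_loss(paraphrase, source)

# Adversarial term: *maximize* the victim's loss on the gold label,
# i.e. minimize its negation.
adv_loss = -F.cross_entropy(victim(paraphrase), gold)

total = quality_loss + 0.5 * adv_loss
total.backward()
print(f"quality={quality_loss.item():.3f}  adversarial={adv_loss.item():.3f}")
```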
arXiv Detail & Related papers (2024-01-16T10:14:27Z)
- Robustifying Language Models with Test-Time Adaptation [17.96043752001886]
Large-scale language models achieve state-of-the-art performance on a number of language tasks.
However, they fail on adversarial examples: sentences optimized to fool the models while preserving their meaning for humans.
We show that we can reverse many language adversarial attacks by adapting the input sentence with predictions from masked words.
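A minimal sketch of this masked-word adaptation idea, assuming the Hugging Face transformers fill-mask pipeline; the model choice and the rule of always keeping the top prediction are simplifying assumptions, not the paper's exact procedure:

```python
# Hedged sketch of masked-word test-time adaptation: re-predict each word
# with a masked language model and keep the MLM's top choice, which can
# undo character- and word-level adversarial perturbations.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def adapt(sentence: str) -> str:
    words = sentence.split()
    repaired = list(words)
    for i in range(len(words)):
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        top = unmasker(" ".join(masked))[0]   # highest-scoring fill
        repaired[i] = top["token_str"]
    return " ".join(repaired)

print(adapt("the film was gr eat and moving"))
```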
arXiv Detail & Related papers (2023-10-29T22:37:54Z)
- Context-aware Adversarial Attack on Named Entity Recognition [15.049160192547909]
We study context-aware adversarial attack methods to examine the robustness of named entity recognition models.
Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples.
Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.
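One common way to find the "most informative" words is leave-one-out scoring: remove each word and measure how much the model's confidence drops, then perturb the highest-impact words first. A hedged sketch, with a toy scorer standing in for a real NER model:

```python
# Minimal sketch of leave-one-out word-importance ranking for an
# entity prediction. `entity_confidence` is an illustrative stand-in.

def entity_confidence(words: list[str]) -> float:
    """Toy NER scorer: confidence that 'Paris' is a location,
    boosted by contextual cue words."""
    cues = {"visited", "in", "city"}
    base = 0.5 if "Paris" in words else 0.0
    return base + 0.1 * sum(w in cues for w in words)

sentence = "we visited the city of Paris in spring".split()
baseline = entity_confidence(sentence)

importance = []
for i, word in enumerate(sentence):
    reduced = sentence[:i] + sentence[i + 1:]
    importance.append((baseline - entity_confidence(reduced), word))

# The words whose removal hurts the prediction most are the
# prime targets for perturbation.
for drop, word in sorted(importance, reverse=True)[:3]:
    print(f"{word!r}: confidence drop {drop:.2f}")
```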
arXiv Detail & Related papers (2023-09-16T14:04:23Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative objective achieves substantial performance improvements and outperforms current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
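The reformulation amounts to serializing structured ABSA annotations into a flat target string for a generative model to emit. A minimal sketch, where the template format is an assumption rather than the paper's exact one:

```python
# Hedged sketch of recasting aspect-based sentiment analysis as sequence
# generation: instead of separate extraction/classification heads, the
# model is trained to emit one flat target string per sentence.

def to_generation_pair(sentence, annotations):
    """Build an (input, target) text pair for a generative LM."""
    target = " ; ".join(
        f"aspect: {a} | category: {c} | polarity: {p}"
        for a, c, p in annotations
    )
    return sentence, target

src, tgt = to_generation_pair(
    "The pasta was delicious but the service was slow.",
    [("pasta", "food", "positive"), ("service", "service", "negative")],
)
print(src)
print(tgt)
```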
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of this paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
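As a rough illustration of how such waveform attacks are typically mounted, the sketch below applies an FGSM-style perturbation through a toy surrogate model; the surrogate, step size, and transfer setting are all assumptions, not the paper's setup:

```python
# Hedged sketch of a transfer-style waveform perturbation: take one
# gradient-sign step (FGSM) against a surrogate, since a zero-knowledge
# adversary cannot query the target model directly.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
surrogate = torch.nn.Linear(16000, 2)   # toy 1-second "speech" classifier

waveform = torch.randn(1, 16000, requires_grad=True)
label = torch.tensor([1])

loss = F.cross_entropy(surrogate(waveform), label)
loss.backward()

epsilon = 0.002                          # assumed imperceptibly small step
adversarial = (waveform + epsilon * waveform.grad.sign()).detach()
print(f"max perturbation: "
      f"{(adversarial - waveform.detach()).abs().max().item():.4f}")
```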
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- A Differentiable Language Model Adversarial Attack on Text Classifiers [10.658675415759697]
We propose a new black-box sentence-level attack for natural language processing.
Our method fine-tunes a pre-trained language model to generate adversarial examples.
We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation.
arXiv Detail & Related papers (2021-07-23T14:43:13Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)