From Adversarial Arms Race to Model-centric Evaluation: Motivating a
Unified Automatic Robustness Evaluation Framework
- URL: http://arxiv.org/abs/2305.18503v1
- Date: Mon, 29 May 2023 14:55:20 GMT
- Title: From Adversarial Arms Race to Model-centric Evaluation: Motivating a
Unified Automatic Robustness Evaluation Framework
- Authors: Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu
Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng
Ji
- Abstract summary: Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs.
The existing practice of robustness evaluation may suffer from non-comprehensive evaluation, impractical evaluation protocols, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
- Score: 91.94389491920309
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Textual adversarial attacks can discover models' weaknesses by adding
semantics-preserving but misleading perturbations to the inputs. The long-lasting
adversarial attack-and-defense arms race in Natural Language Processing (NLP)
is algorithm-centric, providing valuable techniques for automatic robustness
evaluation. However, the existing practice of robustness evaluation may suffer
from non-comprehensive evaluation, impractical evaluation protocols, and
invalid adversarial samples. In this paper, we aim to set up a unified
automatic robustness evaluation framework, shifting towards model-centric
evaluation to further exploit the advantages of adversarial attacks. To address
the above challenges, we first determine robustness evaluation dimensions based
on model capabilities and specify a reasonable algorithm to generate
adversarial samples for each dimension. Then we establish the evaluation
protocol, including evaluation settings and metrics, under realistic demands.
Finally, we use the perturbation degree of adversarial samples to control the
sample validity. We implement a toolkit RobTest that realizes our automatic
robustness evaluation framework. In our experiments, we conduct a robustness
evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation
framework, and further show the rationality of each component in the framework.
The code will be made public at https://github.com/thunlp/RobTest.
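As a minimal illustration of two of the ideas above, per-dimension attack algorithms and a perturbation-degree cap on sample validity, the following sketch evaluates accuracy under attack at several perturbation degrees. All names are hypothetical and do not come from RobTest; consult the repository for the actual dimensions, protocols, and metrics.

```python
# Minimal sketch of two ideas from the abstract: per-dimension attack
# algorithms and a perturbation-degree cap that controls sample validity.
# Every name here (char_swap_attack, accuracy_under_attack, toy_model) is
# hypothetical and does NOT reflect RobTest's actual API.
import random


def char_swap_attack(text: str, degree: float) -> str:
    """Typo-style perturbation for a character-level evaluation dimension.

    `degree` caps the fraction of words that may be perturbed, so smaller
    values keep the adversarial sample closer to (and more valid than) the
    original input."""
    words = text.split()
    budget = max(1, int(len(words) * degree))
    for i in random.sample(range(len(words)), min(budget, len(words))):
        w = words[i]
        if len(w) > 3:
            j = random.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]  # swap adjacent chars
    return " ".join(words)


def accuracy_under_attack(model, dataset, attack, degree: float) -> float:
    """Accuracy on adversarial samples whose perturbation degree is capped."""
    correct = sum(int(model(attack(text, degree)) == label) for text, label in dataset)
    return correct / len(dataset)


if __name__ == "__main__":
    def toy_model(text: str) -> int:   # stand-in sentiment classifier
        return int("good" in text)

    toy_data = [("a good movie overall", 1), ("a dull and tedious movie", 0)]
    for degree in (0.1, 0.3, 0.5):
        acc = accuracy_under_attack(toy_model, toy_data, char_swap_attack, degree)
        print(f"degree={degree}: accuracy under attack = {acc:.2f}")
```

A smaller degree trades attack strength for sample validity, which is why the sketch sweeps several values.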
Related papers
- Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks [26.422616504640786]
We propose a novel individual attack method, Probability Margin Attack (PMA), which defines the adversarial margin in the probability space rather than the logits space (see the sketch after this entry).
We create a million-scale dataset, CC1M, and use it to conduct the first million-scale adversarial robustness evaluation of adversarially-trained ImageNet models.
arXiv Detail & Related papers (2024-11-20T10:41:23Z)
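The probability-space margin named in the PMA entry above can be read as the gap between the true-class probability and the best competing class; an attack then pushes this margin below zero. A minimal PyTorch sketch of that quantity (illustrative only, not necessarily the authors' exact objective):

```python
# Illustrative probability-space margin: the gap between the true-class
# probability and the strongest competing class, computed after softmax
# rather than on raw logits. A sketch of the idea only, not necessarily
# the exact PMA objective.
import torch
import torch.nn.functional as F


def probability_margin(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=-1)                       # move from logit to probability space
    true_prob = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    masked = probs.scatter(1, labels.unsqueeze(1), float("-inf"))  # hide the true class
    runner_up = masked.max(dim=1).values
    return true_prob - runner_up                            # < 0 once the sample is misclassified


if __name__ == "__main__":
    logits = torch.tensor([[2.0, 1.0, 0.1], [0.2, 0.1, 3.0]])
    labels = torch.tensor([0, 1])
    print(probability_margin(logits, labels))  # positive = still correctly classified
```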
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline (see the sketch after this entry).
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
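The CUBE baseline mentioned above is clustering-based, and the entry gives no further detail, so the following is only a generic sketch of that family of defenses, with illustrative names and an arbitrary KMeans choice rather than CUBE's actual pipeline:

```python
# Generic sketch of a clustering-based backdoor defense: cluster hidden
# representations of training samples and keep only the dominant cluster,
# assuming poisoned samples form a small, tight minority. Function names
# and the KMeans choice are illustrative; this is not CUBE's exact pipeline.
import numpy as np
from sklearn.cluster import KMeans


def filter_by_clustering(features: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Return indices of samples falling in the largest (presumed clean) cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    return np.where(labels == counts.argmax())[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=(95, 8))     # majority: clean representations
    poisoned = rng.normal(6.0, 0.3, size=(5, 8))   # minority: compact poisoned cluster
    kept = filter_by_clustering(np.vstack([clean, poisoned]))
    print(f"kept {len(kept)} of 100 samples")       # expect roughly the 95 clean ones
```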
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications.
However, they are vulnerable to adversarial examples, which motivates adversarial defenses.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z)
- Robustness Gym: Unifying the NLP Evaluation Landscape [91.80175115162218]
Deep neural networks are often brittle when deployed in real-world systems.
Recent research has focused on testing the robustness of such models.
We propose a solution in the form of Robustness Gym, a simple and extensible evaluation toolkit.
arXiv Detail & Related papers (2021-01-13T02:37:54Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks (see the usage sketch after this entry).
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
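For readers who want to reproduce a RobustBench-style evaluation as described above, the autoattack package exposes a small interface; the sketch below follows its documented usage, though arguments should be checked against the current README:

```python
# Rough usage sketch of AutoAttack, the white-/black-box attack ensemble used
# by RobustBench. It follows the package's documented interface, but argument
# names and defaults should be checked against the current autoattack README.
import torch
from autoattack import AutoAttack  # pip install git+https://github.com/fra31/auto-attack


def robust_accuracy_autoattack(model: torch.nn.Module,
                               x_test: torch.Tensor,
                               y_test: torch.Tensor,
                               eps: float = 8 / 255) -> torch.Tensor:
    """Run the standard AutoAttack evaluation and return the adversarial inputs."""
    model.eval()
    adversary = AutoAttack(model, norm="Linf", eps=eps, version="standard")
    # run_standard_evaluation prints per-attack robust accuracy as it goes.
    return adversary.run_standard_evaluation(x_test, y_test, bs=128)
```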
This list is automatically generated from the titles and abstracts of the papers on this site.