Why Should Adversarial Perturbations be Imperceptible? Rethink the
Research Paradigm in Adversarial NLP
- URL: http://arxiv.org/abs/2210.10683v1
- Date: Wed, 19 Oct 2022 15:53:36 GMT
- Title: Why Should Adversarial Perturbations be Imperceptible? Rethink the
Research Paradigm in Adversarial NLP
- Authors: Yangyi Chen, Hongcheng Gao, Ganqu Cui, Fanchao Qi, Longtao Huang,
Zhiyuan Liu and Maosong Sun
- Abstract summary: We rethink the research paradigm of textual adversarial samples in security scenarios.
We first collect, process, and release Advbench, a collection of security datasets.
Next, we propose a simple method based on heuristic rules that can easily fulfill actual adversarial goals, simulating real-world attack methods.
- Score: 83.66405397421907
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Textual adversarial samples play important roles in multiple subfields of NLP
research, including security, evaluation, explainability, and data
augmentation. However, most work mixes all these roles, obscuring the problem
definitions and research goals of the security role that aims to reveal the
practical concerns of NLP models. In this paper, we rethink the research
paradigm of textual adversarial samples in security scenarios. We discuss the
deficiencies in previous work and propose our suggestions that research on
Security-oriented adversarial NLP (SoadNLP) should: (1) evaluate their
methods on security tasks to demonstrate the real-world concerns; (2) consider
real-world attackers' goals, instead of developing impractical methods. To this
end, we first collect, process, and release Advbench, a collection of security
datasets. Then, we reformalize the task and adjust the emphasis on different
goals in SoadNLP. Next, we propose a simple method based on heuristic rules
that can easily fulfill the actual adversarial goals to simulate real-world
attack methods. We conduct experiments on both the attack and the defense sides
on Advbench. Experimental results show that our method has higher practical
value, indicating that the research paradigm in SoadNLP may start from our new
benchmark. All the code and data of Advbench can be obtained at
https://github.com/thunlp/Advbench.
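As intuition for what a heuristic, rule-based attack of the kind described above might look like, the sketch below rewrites only the keywords a naive blocklist filter would flag, using simple character-level rules (look-alike substitutions and separator insertion) so the message stays readable to a human. The blocklist, the rule table, and the `evade_filter` helper are illustrative assumptions; they are not the rules released with Advbench.

```python
import random

# Toy blocklist standing in for a keyword-based content filter (illustrative only).
BLOCKLIST = {"attack", "malware", "phishing"}

# Simple character-level rewriting rules; real rule sets would be richer.
CHAR_SUBS = {"a": "@", "i": "1", "o": "0", "e": "3"}


def perturb_word(word: str) -> str:
    """Apply one of two heuristic rules to a flagged word."""
    if random.random() < 0.5:
        # Rule 1: substitute visually similar characters.
        return "".join(CHAR_SUBS.get(c, c) for c in word.lower())
    # Rule 2: insert a separator so exact keyword matching fails.
    mid = len(word) // 2
    return word[:mid] + "-" + word[mid:]


def evade_filter(text: str) -> str:
    """Rewrite only the tokens a blocklist filter would flag, leaving the rest intact."""
    rewritten = []
    for token in text.split():
        clean = token.strip(".,!?").lower()
        rewritten.append(perturb_word(token) if clean in BLOCKLIST else token)
    return " ".join(rewritten)


if __name__ == "__main__":
    print(evade_filter("This phishing email delivers malware."))
    # Possible output: "This ph1sh1ng email delivers m@lw@r3."
```

Rule-based rewrites like this target the attacker's actual goal (getting the payload past a filter while preserving its meaning for a human reader) rather than imperceptibility to a victim model.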
Related papers
- The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense [56.32083100401117]
We investigate why Vision Large Language Models (VLLMs) are prone to jailbreak attacks.
We then make a key observation: existing defense mechanisms suffer from an over-prudence problem.
We find that the two representative evaluation methods for jailbreak often exhibit chance agreement.
arXiv Detail & Related papers (2024-11-13T07:57:19Z)
- Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey [7.945893812374361]
This paper aims to clear up some common concerns about the attack setting and to formally establish the research problem.
Specifically, we first present the threat model of the problem, and introduce the harmful fine-tuning attack and its variants.
Finally, we outline future research directions that might contribute to the development of the field.
arXiv Detail & Related papers (2024-09-26T17:55:22Z)
- A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication [15.879482578829489]
Deep generative models have demonstrated impressive performance in various computer vision applications.
These models may be used for malicious purposes, such as misinformation, deception, and copyright violation.
This paper provides a systematic and timely review of research efforts on defenses against AI-generated visual media.
arXiv Detail & Related papers (2024-07-15T09:46:02Z)
- ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
- SoK: Privacy-Preserving Data Synthesis [72.92263073534899]
This paper focuses on privacy-preserving data synthesis (PPDS) by providing a comprehensive overview, analysis, and discussion of the field.
We put forth a master recipe that unifies two prominent strands of research in PPDS: statistical methods and deep learning (DL)-based methods.
arXiv Detail & Related papers (2023-07-05T08:29:31Z)
- Scenario-Agnostic Zero-Trust Defense with Explainable Threshold Policy: A Meta-Learning Approach [20.11993437283895]
We propose a scenario-agnostic zero-trust defense based on Partially Observable Markov Decision Processes (POMDP) and first-order Meta-Learning.
We use case studies and real-world attacks to corroborate the results.
arXiv Detail & Related papers (2023-03-06T18:35:34Z)
- Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition [56.69875958980474]
This work considers approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations.
We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training.
We present a comparison of equality between two rejection-based defenses, randomized smoothing and neural rejection, finding randomized smoothing more equitable for minority groups due to its sampling mechanism (a generic sketch of smoothing-based prediction follows this entry).
arXiv Detail & Related papers (2023-02-17T16:19:26Z)
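For intuition about that sampling mechanism, the sketch below shows generic majority-vote prediction with abstention over noisy copies of an input, in the style of randomized smoothing; the base classifier, noise level, and vote threshold are placeholders, not the configuration studied in that paper.

```python
from collections import Counter
from typing import Callable, Sequence
import random


def smoothed_predict(
    f: Callable[[Sequence[float]], int],  # placeholder base classifier: features -> class id
    x: Sequence[float],
    sigma: float = 0.25,         # assumed Gaussian noise level
    n_samples: int = 100,        # number of noisy copies to vote over
    min_agreement: float = 0.7,  # abstain below this vote share
) -> int:
    """Majority vote over noisy copies of x; returns -1 (abstain) if the vote is not decisive."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + random.gauss(0.0, sigma) for xi in x]
        votes[f(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label if count / n_samples >= min_agreement else -1


if __name__ == "__main__":
    # Trivial stand-in classifier: threshold on the first feature.
    toy_clf = lambda feats: int(feats[0] > 0.5)
    print(smoothed_predict(toy_clf, [0.6, 0.1]))
```

The vote-share threshold trades abstention rate against confidence: the stricter the required agreement, the more inputs the smoothed classifier rejects.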
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey [1.8782750537161614]
This paper studies strategies for implementing adversarially robust training toward guaranteeing safety in machine learning algorithms.
We provide a taxonomy to classify adversarial attacks and defenses, formulate the Robust Optimization problem in a min-max setting (see the formulation sketched below), and divide it into three subcategories: Adversarial (re)Training, Regularization Approaches, and Certified Defenses.
arXiv Detail & Related papers (2020-07-01T21:00:32Z)
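For reference, the min-max Robust Optimization objective that the survey above formulates is conventionally written as follows (standard adversarial-training notation, not copied from the survey itself):

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \left[ \max_{\|\delta\|_{p} \le \epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \right]
```

The inner maximization finds the worst-case perturbation \delta within an \ell_p ball of radius \epsilon, and the outer minimization fits the parameters \theta against it; roughly, adversarial (re)training approximates the inner problem empirically, while regularization and certified defenses bound it analytically.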