SemAttack: Natural Textual Attacks via Different Semantic Spaces
- URL: http://arxiv.org/abs/2205.01287v1
- Date: Tue, 3 May 2022 03:44:03 GMT
- Title: SemAttack: Natural Textual Attacks via Different Semantic Spaces
- Authors: Boxin Wang, Chejian Xu, Xiangyu Liu, Yu Cheng, Bo Li
- Abstract summary: We propose an efficient framework to generate natural adversarial text by constructing different semantic perturbation functions.
We show that SemAttack is able to generate adversarial texts for different languages with high attack success rates.
Our generated adversarial texts are natural and barely affect human performance.
- Score: 26.97034787803082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies show that pre-trained language models (LMs) are vulnerable to
textual adversarial attacks. However, existing attack methods either suffer
from low attack success rates or fail to search efficiently in the
exponentially large perturbation space. We propose an efficient and effective
framework SemAttack to generate natural adversarial text by constructing
different semantic perturbation functions. In particular, SemAttack optimizes
the generated perturbations constrained on generic semantic spaces, including
typo space, knowledge space (e.g., WordNet), contextualized semantic space
(e.g., the embedding space of BERT clusterings), or the combination of these
spaces. Thus, the generated adversarial texts are semantically closer to
the original inputs. Extensive experiments reveal that state-of-the-art (SOTA)
large-scale LMs (e.g., DeBERTa-v2) and defense strategies (e.g., FreeLB) are
still vulnerable to SemAttack. We further demonstrate that SemAttack is general
and able to generate natural adversarial texts for different languages (e.g.,
English and Chinese) with high attack success rates. Human evaluations also
confirm that our generated adversarial texts are natural and barely affect
human performance. Our code is publicly available at
https://github.com/AI-secure/SemAttack.
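The abstract describes perturbations constrained to different semantic spaces (typo space, knowledge space, contextualized semantic space). The sketch below is only an illustration of that idea, not the authors' implementation: it builds candidate substitutions from a typo space and a WordNet knowledge space and runs a simple greedy word-level search; the victim-model interface `predict_proba` is a hypothetical assumption introduced here for the example.
```python
# Illustrative sketch (not the released SemAttack code): greedy word
# substitution constrained to the union of a typo space and a WordNet
# knowledge space. `predict_proba(text) -> list of class probabilities`
# is a hypothetical victim-model interface.

from nltk.corpus import wordnet as wn  # requires: pip install nltk; nltk.download("wordnet")


def typo_space(word):
    """Typo space: adjacent-character swaps and single-character deletions."""
    cands = {word[:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(len(word) - 1)}
    cands |= {word[:i] + word[i + 1:] for i in range(len(word))}
    cands.discard(word)
    cands.discard("")
    return cands


def knowledge_space(word):
    """Knowledge space: WordNet synonyms of the word."""
    cands = {lemma.name().replace("_", " ")
             for syn in wn.synsets(word) for lemma in syn.lemmas()}
    cands.discard(word)
    return cands


def greedy_attack(tokens, true_label, predict_proba, max_edits=3):
    """Greedily substitute tokens, staying inside the combined semantic space,
    until the victim model's prediction flips or the edit budget runs out."""
    tokens = list(tokens)
    for _ in range(max_edits):
        best = None  # (confidence in true label, position, candidate)
        for i, tok in enumerate(tokens):
            for cand in typo_space(tok) | knowledge_space(tok):
                trial = " ".join(tokens[:i] + [cand] + tokens[i + 1:])
                conf = predict_proba(trial)[true_label]
                if best is None or conf < best[0]:
                    best = (conf, i, cand)
        if best is None:
            break
        _, i, cand = best
        tokens[i] = cand
        probs = predict_proba(" ".join(tokens))
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            break  # prediction flipped: attack succeeded
    return " ".join(tokens)
```
Here `predict_proba` would wrap the victim LM (e.g., a fine-tuned BERT classifier). The actual SemAttack method optimizes the perturbation in the model's contextualized embedding space rather than enumerating discrete candidates as above; see the paper and the repository linked above for the real procedure.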
Related papers
- Unveiling Vulnerability of Self-Attention [61.85150061213987]
Pre-trained language models (PLMs) are shown to be vulnerable to minor word changes.
This paper studies the basic structure of transformer-based PLMs, the self-attention (SA) mechanism.
We introduce S-Attend, a novel smoothing technique that effectively makes SA robust via structural perturbations.
arXiv Detail & Related papers (2024-02-26T10:31:45Z) - Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt (see the sketch after this list).
arXiv Detail & Related papers (2024-02-25T20:36:03Z) - Textual Manifold-based Defense Against Natural Language Adversarial Examples [10.140147080535222]
We find that adversarial texts tend to have their embeddings diverge from the manifold of natural ones.
We propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold.
Our method consistently and significantly outperforms previous defenses without trading off clean accuracy.
arXiv Detail & Related papers (2022-11-05T11:19:47Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [17.17479625646699]
We propose a unified framework to craft textual adversarial samples.
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
arXiv Detail & Related papers (2021-10-28T17:31:51Z) - Towards Robustness Against Natural Language Word Substitutions [87.56898475512703]
Robustness against word substitutions has a well-defined and widely acceptable form, using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space by using either an $l_2$-ball or a hyper-rectangle.
arXiv Detail & Related papers (2021-07-28T17:55:08Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z) - Intriguing Properties of Adversarial ML Attacks in the Problem Space [Extended Version] [18.3238686304247]
We propose a general formalization for adversarial ML evasion attacks in the problem-space.
We propose a novel problem-space attack on Android malware that overcomes past limitations in terms of semantics and artifacts.
Our results demonstrate that "adversarial-malware as a service" is a realistic threat.
arXiv Detail & Related papers (2019-11-05T23:39:55Z)
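Several of the entries above describe smoothing-style defenses (S-Attend, SEMANTICSMOOTH), which aggregate predictions over perturbed or semantically transformed copies of the input. A minimal sketch of that aggregation idea follows; the `classify(text) -> label` interface is a hypothetical assumption, and random word dropout is used only as a stand-in for the semantic transformations (e.g., paraphrasing or summarization) those papers actually employ.
```python
# Illustrative sketch of prediction smoothing over transformed input copies.
# Not the SEMANTICSMOOTH or S-Attend implementation: the transform here is a
# placeholder and `classify` is a hypothetical model interface.

import random
from collections import Counter


def random_word_dropout(text, drop_prob=0.1, rng=random):
    """Placeholder transform: drop a small fraction of words at random."""
    words = text.split()
    kept = [w for w in words if rng.random() > drop_prob]
    return " ".join(kept) if kept else text


def smoothed_predict(text, classify, num_copies=10, transform=random_word_dropout):
    """Classify several transformed copies of the input and majority-vote."""
    votes = Counter(classify(transform(text)) for _ in range(num_copies))
    return votes.most_common(1)[0][0]
```
The intuition is that an adversarial perturbation crafted for one exact input string is unlikely to survive many independent semantic-preserving transformations, so the vote tends to recover the clean prediction.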