Model Robustness with Text Classification: Semantic-preserving
adversarial attacks
- URL: http://arxiv.org/abs/2008.05536v2
- Date: Fri, 14 Aug 2020 01:05:09 GMT
- Title: Model Robustness with Text Classification: Semantic-preserving
adversarial attacks
- Authors: Rahul Singh, Tarun Joshi, Vijayan N. Nair, and Agus Sudjianto
- Abstract summary: We propose algorithms to create adversarial attacks to assess model robustness in text classification problems.
The attacks cause a significant number of label flips in the white-box setting, and the same rule-based attacks can be used in the black-box setting.
- Score: 12.31604391452686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose algorithms to create adversarial attacks to assess model
robustness in text classification problems. They can be used to create white-box
and black-box attacks while preserving the semantics and syntax of the original
text. The attacks cause a significant number of label flips in the white-box
setting, and the same rule-based attacks can be reused in the black-box setting,
where they are able to reverse the decisions of transformer-based architectures.
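The abstract describes rule-based, semantics-preserving attacks without implementation details, so the following is only a minimal sketch of that idea: swap words for human-equivalent synonyms and keep the first swap that flips the classifier. The synonym table and toy classifier are hypothetical stand-ins, not the authors' algorithm.

```python
# Minimal, hypothetical sketch of a rule-based substitution attack. The
# synonym table and toy classifier are illustrative stand-ins only.

SYNONYMS = {
    "good": ["decent", "fine"],
    "bad": ["lousy", "subpar"],
}

def classify(text):
    """Toy sentiment model that only knows two cue words."""
    words = text.lower().split()
    return int(words.count("good") > words.count("bad"))

def rule_based_attack(text):
    """Try one synonym swap at a time; return the first semantics-preserving
    variant that changes the predicted label, or None if nothing flips it."""
    original = classify(text)
    tokens = text.split()
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok.lower(), []):
            candidate = " ".join(tokens[:i] + [syn] + tokens[i + 1:])
            if classify(candidate) != original:
                return candidate
    return None

print(rule_based_attack("a good movie overall"))  # -> "a decent movie overall"
```

The flip happens because the synonym is meaning-preserving for a human reader but falls outside the model's learned vocabulary, which is exactly the brittleness such attacks probe.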
Related papers
- Ask, Attend, Attack: An Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models [29.1607388062023]
This paper focuses on a challenging scenario: decision-based black-box targeted attacks, in which the attacker only has access to the model's final output text.
A three-stage process, Ask, Attend, Attack (AAA), is proposed to coordinate with the solver.
Experimental results on transformer-based and CNN+RNN-based image-to-text models confirm the effectiveness of the proposed AAA.
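The summary names the three stages but not their internals, so the sketch below is only a generic decision-based targeted-attack loop under the stated threat model: nothing but the output text is observable, and random perturbations are kept when they move the caption toward a target string. `caption_model` is a hypothetical callable, and this is not the AAA procedure itself.

```python
import numpy as np

def token_overlap(text, target):
    """Fraction of target tokens that appear in the model's output text."""
    out, tgt = set(text.split()), set(target.split())
    return len(out & tgt) / max(len(tgt), 1)

def decision_based_targeted_attack(image, caption_model, target,
                                   steps=500, eps=2.0, rng=None):
    """Keep only random perturbations that move the caption toward `target`."""
    rng = rng or np.random.default_rng(0)
    adv = image.astype(float)
    best = token_overlap(caption_model(adv), target)
    for _ in range(steps):
        candidate = np.clip(adv + rng.normal(0.0, eps, size=adv.shape), 0, 255)
        score = token_overlap(caption_model(candidate), target)
        if score > best:          # decision-based: only the output text is used
            adv, best = candidate, score
        if best == 1.0:
            break
    return adv, best
```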
arXiv Detail & Related papers (2024-08-16T19:35:06Z)
- A Random Ensemble of Encrypted Vision Transformers for Adversarially Robust Defense [6.476298483207895]
Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs).
We propose a novel method using the vision transformer (ViT) that is a random ensemble of encrypted models for enhancing robustness against both white-box and black-box attacks.
In experiments, the method was demonstrated to be robust against not only white-box attacks but also black-box ones in an image classification task.
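A hedged sketch of the random-ensemble idea: each model is paired with a secret key, and a model/key pair is drawn at random for every query. The block-permutation transform used as the "encryption" step here is an assumption for illustration, not the paper's code.

```python
import numpy as np

def shuffle_blocks(img, key, block=16):
    """Permute non-overlapping blocks of an HxWxC image with a secret key
    (a permutation of block indices); assumes H and W divide by `block`."""
    h, w, _ = img.shape
    cols = w // block
    blocks = [img[i:i + block, j:j + block]
              for i in range(0, h, block) for j in range(0, w, block)]
    shuffled = [blocks[k] for k in key]
    rows = [np.concatenate(shuffled[r * cols:(r + 1) * cols], axis=1)
            for r in range(h // block)]
    return np.concatenate(rows, axis=0)

def random_ensemble_predict(img, models, keys, rng=None):
    """Classify with one randomly chosen key-protected model per query."""
    rng = rng or np.random.default_rng()
    i = int(rng.integers(len(models)))
    return models[i](shuffle_blocks(img, keys[i]))
```

The random draw is what frustrates an attacker: successive queries are answered by differently keyed models, so gradients and transfer priors estimated against one member do not line up with the next.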
arXiv Detail & Related papers (2024-02-11T12:35:28Z)
- Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence [34.35162562625252]
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models.
We study a new paradigm of black-box attacks with provable guarantees.
This new black-box attack unveils significant vulnerabilities of machine learning models.
arXiv Detail & Related papers (2023-04-10T01:12:09Z)
- Adversarial Text Normalization [2.9434930072968584]
Adversarial Text Normalizer restores baseline performance on attacked content with low computational overhead.
We find that text normalization provides a task-agnostic defense against character-level attacks.
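A minimal sketch of such a character-level normalizer, assuming the usual ingredients (Unicode folding, zero-width-character stripping, homoglyph mapping); the mapping table is illustrative, not the paper's.

```python
import unicodedata

# Illustrative homoglyph map and invisible-character set; a real normalizer
# would use a much larger table.
HOMOGLYPHS = {"0": "o", "1": "l", "@": "a", "$": "s", "3": "e"}
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def normalize(text):
    """Undo common character-level perturbations before classification."""
    # Fold Unicode lookalikes (full-width, accented) toward plain ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Strip invisible characters and map common leetspeak homoglyphs.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text if ch not in ZERO_WIDTH)

print(normalize("gr\u200beat m0vie"))  # -> "great movie"
```

Because the normalizer acts on characters before any model sees the text, it is task-agnostic, which matches the claim above.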
arXiv Detail & Related papers (2022-06-08T19:44:03Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
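The summary specifies the defense rather than the attack internals, so here is only a minimal sketch of a co-occurrence-based context-consistency check, the test a context-consistent attack has to pass; the label-list scene format is an assumption.

```python
import numpy as np

def build_cooccurrence(scenes, num_classes):
    """scenes: iterable of per-image label lists from clean training data."""
    co = np.zeros((num_classes, num_classes))
    for labels in scenes:
        for a in labels:
            for b in labels:
                co[a, b] += 1
    return co

def is_context_consistent(labels, co, threshold=1.0):
    """Reject a detection set if any label pair is (near-)unseen in training."""
    return all(co[a, b] >= threshold for a in labels for b in labels if a != b)

co = build_cooccurrence([[0, 1], [0, 1, 2], [1, 2]], num_classes=4)
print(is_context_consistent([0, 1], co))  # True: classes 0 and 1 co-occur
print(is_context_consistent([0, 3], co))  # False: never seen together
```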
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Parallel Rectangle Flip Attack: A Query-based Black-box Attack against Object Detection [89.08832589750003]
We propose a Parallel Rectangle Flip Attack (PRFA) via random search to avoid sub-optimal detection near the attacked region.
Our method can effectively and efficiently attack various popular object detectors, including anchor-based and anchor-free ones, and generate transferable adversarial examples.
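A hedged sketch of rectangle-based random search against a detector; `detector_score` is a hypothetical callable (image to maximum detection confidence), and the paper's parallel scheme and flip update rule are not reproduced.

```python
import numpy as np

def rectangle_attack(image, detector_score, steps=300, max_side=40,
                     eps=16.0, rng=None):
    """Random search over rectangles; keep edits that suppress detections."""
    rng = rng or np.random.default_rng(0)
    adv, best = image.astype(float), detector_score(image)
    h, w = image.shape[:2]
    for _ in range(steps):
        rh, rw = rng.integers(1, max_side, size=2)
        y, x = rng.integers(0, h - rh), rng.integers(0, w - rw)
        candidate = adv.copy()
        # Shift one random rectangle by +/- eps, staying in valid pixel range.
        candidate[y:y + rh, x:x + rw] = np.clip(
            candidate[y:y + rh, x:x + rw] + eps * rng.choice([-1.0, 1.0]), 0, 255)
        score = detector_score(candidate)
        if score < best:          # keep changes that lower detector confidence
            adv, best = candidate, score
    return adv
```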
arXiv Detail & Related papers (2022-01-22T06:00:17Z)
- Meta Gradient Adversarial Attack [64.5070788261061]
This paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method.
Specifically, we randomly sample multiple models from a model zoo to compose different tasks and iteratively simulate a white-box attack and a black-box attack in each task.
By narrowing the gap between the gradient directions in white-box and black-box attacks, the transferability of adversarial examples on the black-box setting can be improved.
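One illustrative reading of that meta-iteration, with each zoo entry assumed to expose a (loss, gradient) pair and at least `task_size + 1` models available; this is a sketch of the sampling scheme, not the authors' MGAA code.

```python
import numpy as np

def meta_gradient_attack(x, zoo, steps=10, task_size=3,
                         alpha=0.5, beta=0.5, rng=None):
    """zoo: list of (loss_fn, grad_fn) pairs; x: flattened input array."""
    rng = rng or np.random.default_rng(0)
    adv = x.astype(float)
    for _ in range(steps):
        idx = rng.permutation(len(zoo))
        support, query = idx[:task_size], idx[task_size]
        # White-box simulation: averaged gradient over the sampled task models.
        g_white = np.mean([zoo[i][1](adv) for i in support], axis=0)
        temp = adv + alpha * np.sign(g_white)
        # Black-box simulation: gradient of a held-out model at the temp point;
        # stepping along it pulls the two attack directions together.
        g_black = zoo[query][1](temp)
        adv = adv + beta * np.sign(g_black)
    return adv
```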
arXiv Detail & Related papers (2021-08-09T17:44:19Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a 33.18 BLEU score on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
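A minimal sketch of what "variable-length" means operationally: candidate edits include insertions and deletions, not just substitutions, so the adversarial text can differ in length from the input. The greedy search and the `score` callable (victim-model loss on a token list) are assumptions, not the authors' VL-Attack.

```python
def variable_length_candidates(tokens, vocab):
    """Yield one-edit variants: deletions, substitutions, and insertions."""
    for i in range(len(tokens)):
        yield tokens[:i] + tokens[i + 1:]            # delete token i
        for w in vocab:
            yield tokens[:i] + [w] + tokens[i + 1:]  # substitute at i
            yield tokens[:i] + [w] + tokens[i:]      # insert before i

def greedy_attack(tokens, vocab, score, rounds=3):
    """Apply the single length-changing edit that most increases the loss."""
    for _ in range(rounds):
        best = max(variable_length_candidates(tokens, vocab), key=score)
        if score(best) <= score(tokens):
            break                                    # no edit helps any more
        tokens = best
    return tokens
```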
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Can Targeted Adversarial Examples Transfer When the Source and Target Models Have No Label Space Overlap? [36.96777303738315]
We design blackbox transfer-based targeted adversarial attacks for an environment where the attacker's source model and the target blackbox model may have disjoint label spaces and training datasets.
Our methodology begins with the construction of a class correspondence matrix between the whitebox and blackbox label sets.
We show that our transfer attacks serve as powerful adversarial priors when integrated with query-based methods.
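One plausible way to build such a correspondence matrix, assuming the attacker can probe the black-box model with held-out examples of each white-box class; `blackbox` is a hypothetical callable returning a label index.

```python
import numpy as np

def class_correspondence(examples_by_wb_class, blackbox, num_bb_classes):
    """Probe the black-box with examples of each white-box class and record
    which black-box labels come back, as empirical label distributions."""
    n_wb = len(examples_by_wb_class)
    matrix = np.zeros((n_wb, num_bb_classes))
    for wb_class, examples in enumerate(examples_by_wb_class):
        for x in examples:
            matrix[wb_class, blackbox(x)] += 1
    return matrix / np.maximum(matrix.sum(axis=1, keepdims=True), 1)

def best_blackbox_target(matrix, wb_target):
    """Pick the black-box class most often triggered by the white-box class."""
    return int(np.argmax(matrix[wb_target]))
```

A targeted attack crafted against white-box class `wb_target` is then aimed at `best_blackbox_target(matrix, wb_target)` on the disjoint black-box label set.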
arXiv Detail & Related papers (2021-03-17T21:21:44Z)
- Local Black-box Adversarial Attacks: A Query Efficient Approach [64.98246858117476]
Adversarial attacks threaten the application of deep neural networks in security-sensitive scenarios.
We propose a novel framework to perturb the discriminative areas of clean examples only within limited queries in black-box attacks.
We conduct extensive experiments showing that our framework significantly improves query efficiency during black-box perturbation while maintaining a high attack success rate.
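A hedged sketch of the restriction idea: estimate a mask of discriminative pixels (here via an assumed surrogate `saliency` callable) and spend the entire query budget inside that mask.

```python
import numpy as np

def local_blackbox_attack(image, model_confidence, saliency, budget=200,
                          keep_frac=0.1, eps=8.0, rng=None):
    """Query-limited random search restricted to the discriminative area."""
    rng = rng or np.random.default_rng(0)
    sal = saliency(image)                      # assumed HxW importance map
    cutoff = np.quantile(sal, 1.0 - keep_frac)
    ys, xs = np.nonzero(sal >= cutoff)         # top-k% most salient pixels
    adv, best = image.astype(float), model_confidence(image)
    for _ in range(budget):                    # one model query per step
        k = int(rng.integers(len(ys)))
        candidate = adv.copy()
        candidate[ys[k], xs[k]] = np.clip(
            candidate[ys[k], xs[k]] + eps * rng.choice([-1.0, 1.0]), 0, 255)
        score = model_confidence(candidate)
        if score < best:                       # keep confidence-reducing edits
            adv, best = candidate, score
    return adv
```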
arXiv Detail & Related papers (2021-01-04T15:32:16Z)
- Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data [96.92837098305898]
Black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models.
Black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space.
We propose a novel technique called the spanning attack, which constrains adversarial perturbations in a low-dimensional subspace via spanning an auxiliary unlabeled dataset.
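A minimal sketch of the subspace constraint, assuming flattened inputs: orthonormalize a small auxiliary unlabeled set and search perturbation coefficients only within its span, shrinking the effective search dimension from d to n.

```python
import numpy as np

def spanning_basis(aux_samples):
    """aux_samples: (n, d) matrix of flattened unlabeled examples."""
    q, _ = np.linalg.qr(aux_samples.T)   # columns: orthonormal basis, (d, n)
    return q

def spanning_attack(x, loss, basis, steps=100, sigma=0.05, rng=None):
    """Black-box random search over coefficients in the low-dim subspace."""
    rng = rng or np.random.default_rng(0)
    coeffs = np.zeros(basis.shape[1])
    best = loss(x)
    for _ in range(steps):
        trial = coeffs + sigma * rng.standard_normal(coeffs.shape)
        value = loss(x + basis @ trial)
        if value > best:                 # keep coefficients that raise the loss
            coeffs, best = trial, value
    return x + basis @ coeffs
```

Each query now explores an n-dimensional coefficient vector instead of the full input space, which is the source of the query savings claimed above.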
arXiv Detail & Related papers (2020-05-11T05:57:15Z)