Grey-box Adversarial Attack And Defence For Sentiment Classification
- URL: http://arxiv.org/abs/2103.11576v1
- Date: Mon, 22 Mar 2021 04:05:17 GMT
- Title: Grey-box Adversarial Attack And Defence For Sentiment Classification
- Authors: Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau
- Abstract summary: We introduce a grey-box adversarial attack and defence framework for sentiment classification.
We address the issues of differentiability, label preservation and input reconstruction for adversarial attack and defence in one unified framework.
- Score: 19.466940655682727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a grey-box adversarial attack and defence framework for
sentiment classification. We address the issues of differentiability, label
preservation and input reconstruction for adversarial attack and defence in one
unified framework. Our results show that once trained, the attacking model is
capable of generating high-quality adversarial examples substantially faster
(one order of magnitude less in time) than state-of-the-art attacking methods.
These examples also preserve the original sentiment according to human
evaluation. Additionally, our framework produces an improved classifier that is
robust in defending against multiple adversarial attacking methods. Code is
available at: https://github.com/ibm-aur-nlp/adv-def-text-dist.
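The differentiability issue the abstract mentions arises because discrete token substitution has no gradient. One common workaround for making a text attack end-to-end trainable is a Gumbel-softmax relaxation over the vocabulary; the sketch below illustrates that general idea and is not necessarily the paper's exact mechanism (the vocabulary, logits, and temperature are all hypothetical):

```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, seed=None):
    """Differentiable relaxation of sampling a one-hot token from
    `logits` over a vocabulary (the Gumbel-softmax trick)."""
    rng = np.random.default_rng(seed)
    # Gumbel(0, 1) noise makes argmax(logits + noise) a true sample
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / temperature
    y = y - y.max()                      # numerical stability
    probs = np.exp(y) / np.exp(y).sum()  # soft, near-one-hot vector
    return probs

# Toy vocabulary and attacker logits favouring a substitute word
vocab = ["good", "great", "bad", "terrible"]
logits = np.array([0.1, 0.2, 2.5, 0.3])

soft_token = gumbel_softmax(logits, temperature=0.3, seed=0)
# The relaxed sample can be fed to a downstream classifier as a
# weighted mixture of word embeddings, so gradients flow through
# the substitution step and the attack model can be trained once,
# then generate adversarial examples quickly at inference time.
print(vocab[int(soft_token.argmax())])
```

Training the attacker once and then sampling, rather than running a per-example search, is what makes generation an order of magnitude faster than search-based attacking methods.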
Related papers
- Improving Adversarial Robustness via Decoupled Visual Representation Masking [65.73203518658224]
In this paper, we highlight two novel properties of robust features from the feature distribution perspective.
We find that state-of-the-art defense methods aim to address both of these issues.
Specifically, we propose a simple but effective defense based on decoupled visual representation masking.
arXiv Detail & Related papers (2024-06-16T13:29:41Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Improving behavior based authentication against adversarial attack using XAI [3.340314613771868]
We propose an eXplainable AI (XAI) based defense strategy against adversarial attacks in such scenarios.
A feature selector, trained with our method, can be used as a filter in front of the original authenticator.
We demonstrate that our XAI based defense strategy is effective against adversarial attacks and outperforms other defense strategies.
arXiv Detail & Related papers (2024-02-26T09:29:05Z) - Robust Person Re-identification with Multi-Modal Joint Defence [1.441703014203756]
Existing work mainly relies on adversarial training for metric defense.
We propose targeted methods for metric attacks and corresponding defence methods.
In terms of metric defense, we propose a joint defense method comprising two parts: proactive defense and passive defense.
arXiv Detail & Related papers (2021-11-18T08:13:49Z) - Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z) - Internal Wasserstein Distance for Adversarial Attack and Defense [40.27647699862274]
We propose an internal Wasserstein distance (IWD) to measure image similarity between a sample and its adversarial example.
We develop a novel attack method by capturing the distribution of patches in original samples.
We also build a new defense method that seeks to learn robust models to defend against unseen adversarial examples.
arXiv Detail & Related papers (2021-03-13T02:08:02Z) - Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks [11.429509031463892]
Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification.
In this work, we study two different approaches for defending against black-box patch attacks.
We find that adversarial training has limited effectiveness against state-of-the-art location-optimized patch attacks.
arXiv Detail & Related papers (2020-12-01T15:04:23Z) - Are Adversarial Examples Created Equal? A Learnable Weighted Minimax
Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)
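Several of the detection-based defences above (e.g. the Capsule Network detectors in Deflecting Adversarial Attacks) flag inputs whose reconstruction error is abnormally high. A minimal sketch of that detection idea follows; the "reconstructor" is a hypothetical stand-in and the threshold is chosen arbitrarily (a real defence would use a trained autoencoder or capsule decoder):

```python
import numpy as np

def reconstruction_error(x, reconstruct):
    """L2 distance between an input and its reconstruction."""
    return float(np.linalg.norm(x - reconstruct(x)))

def is_adversarial(x, reconstruct, threshold):
    """Flag inputs the reconstructor cannot reproduce well.
    Clean inputs lie near the data manifold, so a model trained to
    reconstruct clean data fails more visibly on perturbed ones."""
    return reconstruction_error(x, reconstruct) > threshold

# Hypothetical stand-in reconstructor: always returns a fixed clean
# "class template" instead of a learned decoding.
template = np.full(8, 0.5)
reconstruct = lambda x: template

clean = template + 0.01 * np.ones(8)      # near the template
perturbed = template + 0.8 * np.ones(8)   # far from the template

print(is_adversarial(clean, reconstruct, threshold=0.5))      # → False
print(is_adversarial(perturbed, reconstruct, threshold=0.5))  # → True
```

The deflection idea in the last entry goes one step further: instead of only rejecting such inputs, the defence shapes the model so that any successful attack must produce an input that genuinely resembles the target class.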
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.