Grey-box Adversarial Attack And Defence For Sentiment Classification
- URL: http://arxiv.org/abs/2103.11576v1
- Date: Mon, 22 Mar 2021 04:05:17 GMT
- Title: Grey-box Adversarial Attack And Defence For Sentiment Classification
- Authors: Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau
- Abstract summary: We introduce a grey-box adversarial attack and defence framework for sentiment classification.
We address the issues of differentiability, label preservation and input reconstruction for adversarial attack and defence in one unified framework.
- Score: 19.466940655682727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a grey-box adversarial attack and defence framework for
sentiment classification. We address the issues of differentiability, label
preservation and input reconstruction for adversarial attack and defence in one
unified framework. Our results show that once trained, the attacking model is
capable of generating high-quality adversarial examples substantially faster
(one order of magnitude less in time) than state-of-the-art attacking methods.
These examples also preserve the original sentiment according to human
evaluation. Additionally, our framework produces an improved classifier that is
robust in defending against multiple adversarial attacking methods. Code is
available at: https://github.com/ibm-aur-nlp/adv-def-text-dist.
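The differentiability issue the abstract mentions arises because discrete token substitution has no gradient. One common workaround for making a text attack end-to-end trainable is a Gumbel-softmax relaxation over the vocabulary; the sketch below illustrates that general idea and is not necessarily the paper's exact mechanism (the vocabulary, logits, and temperature are all hypothetical):

```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, seed=None):
    """Differentiable relaxation of sampling a one-hot token from
    `logits` over a vocabulary (the Gumbel-softmax trick)."""
    rng = np.random.default_rng(seed)
    # Gumbel(0, 1) noise makes argmax(logits + noise) a true sample
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / temperature
    y = y - y.max()                      # numerical stability
    probs = np.exp(y) / np.exp(y).sum()  # soft, near-one-hot vector
    return probs

# Toy vocabulary and attacker logits favouring a substitute word
vocab = ["good", "great", "bad", "terrible"]
logits = np.array([0.1, 0.2, 2.5, 0.3])

soft_token = gumbel_softmax(logits, temperature=0.3, seed=0)
# The relaxed sample can be fed to a downstream classifier as a
# weighted mixture of word embeddings, so gradients flow through
# the substitution step and the attack model can be trained once,
# then generate adversarial examples quickly at inference time.
print(vocab[int(soft_token.argmax())])
```

Training the attacker once and then sampling, rather than running a per-example search, is what makes generation an order of magnitude faster than search-based attacking methods.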
Related papers
- Improving Adversarial Robustness via Decoupled Visual Representation Masking [65.73203518658224]
In this paper, we highlight two novel properties of robust features from the feature distribution perspective.
We find that state-of-the-art defense methods aim to address both of these issues.
Specifically, we propose a simple but effective defense based on decoupled visual representation masking.
arXiv Detail & Related papers (2024-06-16T13:29:41Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Improving behavior based authentication against adversarial attack using XAI [3.340314613771868]
We propose an eXplainable AI (XAI) based defense strategy against adversarial attacks in such scenarios.
A feature selector, trained with our method, can be used as a filter in front of the original authenticator.
We demonstrate that our XAI based defense strategy is effective against adversarial attacks and outperforms other defense strategies.
arXiv Detail & Related papers (2024-02-26T09:29:05Z) - Robust Person Re-identification with Multi-Modal Joint Defence [1.441703014203756]
Existing work mainly relies on adversarial training for metric defense.
We propose targeted methods for metric attacks and corresponding defence methods.
In terms of metric defense, we propose a joint defense method comprising two parts: proactive defense and passive defense.
arXiv Detail & Related papers (2021-11-18T08:13:49Z) - Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z) - Internal Wasserstein Distance for Adversarial Attack and Defense [40.27647699862274]
We propose an internal Wasserstein distance (IWD) to measure image similarity between a sample and its adversarial example.
We develop a novel attack method by capturing the distribution of patches in original samples.
We also build a new defense method that seeks to learn robust models to defend against unseen adversarial examples.
arXiv Detail & Related papers (2021-03-13T02:08:02Z) - Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks [11.429509031463892]
Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification.
In this work, we study two different approaches for defending against black-box patch attacks.
We find that adversarial training has limited effectiveness against state-of-the-art location-optimized patch attacks.
arXiv Detail & Related papers (2020-12-01T15:04:23Z) - Are Adversarial Examples Created Equal? A Learnable Weighted Minimax
Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)
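Several of the detection-based defences above (e.g. the Capsule Network detectors in Deflecting Adversarial Attacks) flag inputs whose reconstruction error is abnormally high. A minimal sketch of that detection idea follows; the "reconstructor" is a hypothetical stand-in and the threshold is chosen arbitrarily (a real defence would use a trained autoencoder or capsule decoder):

```python
import numpy as np

def reconstruction_error(x, reconstruct):
    """L2 distance between an input and its reconstruction."""
    return float(np.linalg.norm(x - reconstruct(x)))

def is_adversarial(x, reconstruct, threshold):
    """Flag inputs the reconstructor cannot reproduce well.
    Clean inputs lie near the data manifold, so a model trained to
    reconstruct clean data fails more visibly on perturbed ones."""
    return reconstruction_error(x, reconstruct) > threshold

# Hypothetical stand-in reconstructor: always returns a fixed clean
# "class template" instead of a learned decoding.
template = np.full(8, 0.5)
reconstruct = lambda x: template

clean = template + 0.01 * np.ones(8)      # near the template
perturbed = template + 0.8 * np.ones(8)   # far from the template

print(is_adversarial(clean, reconstruct, threshold=0.5))      # → False
print(is_adversarial(perturbed, reconstruct, threshold=0.5))  # → True
```

The deflection idea in the last entry goes one step further: instead of only rejecting such inputs, the defence shapes the model so that any successful attack must produce an input that genuinely resembles the target class.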
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.