A Simple Yet Efficient Method for Adversarial Word-Substitute Attack
- URL: http://arxiv.org/abs/2206.05015v1
- Date: Sat, 7 May 2022 14:20:57 GMT
- Title: A Simple Yet Efficient Method for Adversarial Word-Substitute Attack
- Authors: Tianle Li, Yi Yang
- Abstract summary: We propose a simple yet efficient method that can reduce the average number of adversarial queries by 3-30 times.
This research highlights that an adversary can fool a deep NLP model at much lower cost.
- Score: 30.445201832698192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: NLP researchers propose different word-substitute black-box attacks that can fool text classification models. In such an attack, an adversary keeps sending crafted adversarial queries to the target model until it successfully achieves the intended outcome. State-of-the-art attack methods usually require hundreds or thousands of queries to find one adversarial example. In this paper, we study whether a sophisticated adversary can attack the system with far fewer queries. We propose a simple yet efficient method that can reduce the average number of adversarial queries by 3-30 times while maintaining attack effectiveness. This research highlights that an adversary can fool a deep NLP model at much lower cost.
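The attack setting described above, in which an adversary keeps querying a black-box text classifier with word-substituted variants until the prediction flips, can be illustrated with a minimal sketch. The helper names below (victim_predict, get_substitutes) are hypothetical placeholders, and the loop is a generic greedy word-substitute attack under a query budget, not the paper's specific query-reduction method.

```python
# Minimal sketch (hypothetical helpers) of a black-box word-substitute attack:
# the adversary repeatedly queries the victim model with perturbed inputs until
# the predicted label flips; the attack cost is the number of queries spent.

def word_substitute_attack(text, victim_predict, get_substitutes, query_budget=1000):
    """Greedy word-substitute attack under a query budget.

    victim_predict(text) -> predicted label (black-box access only)
    get_substitutes(word) -> candidate replacement words (e.g. synonyms)
    """
    words = text.split()
    original_label = victim_predict(text)  # one query to record the clean prediction
    queries = 1

    for i, word in enumerate(words):
        for candidate in get_substitutes(word):
            if queries >= query_budget:
                return None, queries  # budget exhausted, attack failed
            perturbed = words[:i] + [candidate] + words[i + 1:]
            queries += 1
            if victim_predict(" ".join(perturbed)) != original_label:
                return " ".join(perturbed), queries  # adversarial example found

    return None, queries  # no label flip within the budget
```

State-of-the-art attacks of this kind typically spend hundreds or thousands of such queries per adversarial example; the paper's contribution is reducing that average query count by 3-30 times while keeping the attack effective.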
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a detection rate above 99% within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack [22.408968332454062]
We study the unique, less well-understood problem of generating sparse adversarial samples simply by observing the score-based replies to model queries.
We develop BruSLeAttack, a new, faster (more query-efficient) algorithm for the problem.
Our work facilitates faster evaluation of model vulnerabilities and raises vigilance about the safety, security and reliability of deployed systems.
arXiv Detail & Related papers (2024-04-08T08:59:26Z) - ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text.
Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate than existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z) - Among Us: Adversarially Robust Collaborative Perception by Consensus [50.73128191202585]
Multiple robots can collaboratively perceive a scene (e.g., detect objects) better than individual robots can.
We propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers.
We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
arXiv Detail & Related papers (2023-03-16T17:15:25Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g., self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack [96.50202709922698]
A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable.
We propose a parameter-free Adaptive Auto Attack (A$^3$) evaluation method which addresses the efficiency and reliability in a test-time-training fashion.
arXiv Detail & Related papers (2022-03-10T04:53:54Z) - Generating Natural Language Adversarial Examples through An Improved Beam Search Algorithm [0.5735035463793008]
In this paper, a novel attack model is proposed whose attack success rate surpasses that of the benchmark attack methods.
The novel method is empirically evaluated by attacking WordCNN, LSTM, BiLSTM, and BERT on four benchmark datasets.
It achieves a 100% attack success rate, higher than the state-of-the-art method, when attacking BERT and BiLSTM on IMDB.
arXiv Detail & Related papers (2021-10-15T12:09:04Z) - A Strong Baseline for Query Efficient Attacks in a Black Box Setting [3.52359746858894]
We propose a query efficient attack strategy to generate plausible adversarial examples on text classification and entailment tasks.
Our attack jointly leverages an attention mechanism and locality-sensitive hashing (LSH) to reduce the query count.
arXiv Detail & Related papers (2021-09-10T10:46:32Z) - Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples (a minimal sketch of this idea appears after this list).
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement.
arXiv Detail & Related papers (2021-07-09T13:29:54Z) - Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z) - AdvMind: Inferring Adversary Intent of Black-Box Attacks [66.19339307119232]
We present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust manner.
On average, AdvMind detects the adversary's intent with over 75% accuracy after observing fewer than 3 query batches.
arXiv Detail & Related papers (2020-06-16T22:04:31Z)
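As noted in the entry on "Learning to Detect Adversarial Examples Based on Class Scores" above, the class-score detector idea can be sketched with an off-the-shelf SVM. The sketch below uses synthetic score vectors purely for illustration; in practice the vectors would be the target classifier's outputs on clean and adversarially perturbed inputs, and this only shows the shape of the approach, not the paper's exact setup.

```python
# Illustrative sketch: fit an SVM on a classifier's output score vectors to
# separate clean inputs from adversarial ones. The score vectors here are
# synthetic stand-ins generated from Dirichlet distributions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def synthetic_scores(n, concentration):
    # Dirichlet samples mimic softmax outputs over 10 classes:
    # small concentration -> peaked (confident), large -> flatter (uncertain).
    return rng.dirichlet(np.full(10, concentration), size=n)

clean_scores = synthetic_scores(500, 0.3)        # confident, clean-like predictions
adversarial_scores = synthetic_scores(500, 3.0)  # less confident, adversarial-like

X = np.vstack([clean_scores, adversarial_scores])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = adversarial

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

detector = SVC(kernel="rbf").fit(X_train, y_train)
print("detection accuracy:", detector.score(X_test, y_test))
```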