AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
- URL: http://arxiv.org/abs/2408.01978v1
- Date: Sun, 4 Aug 2024 09:53:50 GMT
- Title: AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
- Authors: Xin Wang, Kai Chen, Xingjun Ma, Zhineng Chen, Jingjing Chen, Yu-Gang Jiang
- Abstract summary: Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with $>99\%$ detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
- Score: 93.77763753231338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks even under a black-box setting where the adversary can only query the model. Particularly, query-based black-box adversarial attacks estimate adversarial gradients based on the returned probability vectors of the target model for a sequence of queries. During this process, the queries made to the target model are intermediate adversarial examples crafted at the previous attack step, which share high similarities in the pixel space. Motivated by this observation, stateful detection methods have been proposed to detect and reject query-based attacks. While demonstrating promising results, these methods either have been evaded by more advanced attacks or suffer from low efficiency in terms of the number of shots (queries) required to detect different attacks. Arguably, the key challenge here is to assign high similarity scores for any two intermediate adversarial examples perturbed from the same clean image. To address this challenge, we propose a novel Adversarial Contrastive Prompt Tuning (ACPT) method to robustly fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries. With ACPT, we further introduce a detection framework AdvQDet that can detect 7 state-of-the-art query-based attacks with $>99\%$ detection rate within 5 shots. We also show that ACPT is robust to 3 types of adaptive attacks. Code is available at https://github.com/xinwong/AdvQDet.
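The stateful detection idea described in the abstract (compare each incoming query against earlier queries and flag near-duplicates) can be illustrated with a minimal sketch. This is not the authors' AdvQDet implementation: the class name `StatefulQueryDetector`, the `toy_embed` function, the threshold of 0.9, and the buffer size are all hypothetical choices for illustration, and `toy_embed` is a raw-pixel placeholder standing in for the ACPT-tuned CLIP image encoder.

```python
import numpy as np


class StatefulQueryDetector:
    """Flags a query whose embedding is very similar to an earlier query."""

    def __init__(self, embed_fn, threshold=0.9, buffer_size=1000):
        self.embed_fn = embed_fn        # hypothetical stand-in for a tuned image encoder
        self.threshold = threshold      # hypothetical cosine-similarity cutoff
        self.buffer_size = buffer_size  # hypothetical cap on stored queries
        self.history = []               # unit-norm embeddings of past queries

    def check(self, image):
        """Return True if the query resembles a previously seen query."""
        emb = np.asarray(self.embed_fn(image), dtype=np.float64).ravel()
        emb = emb / (np.linalg.norm(emb) + 1e-12)
        flagged = False
        if self.history:
            # Cosine similarity against every stored (unit-norm) embedding.
            sims = np.stack(self.history) @ emb
            flagged = bool(sims.max() >= self.threshold)
        self.history.append(emb)
        if len(self.history) > self.buffer_size:
            self.history.pop(0)         # drop the oldest query
        return flagged


def toy_embed(image):
    # Placeholder embedding: centered raw pixels. NOT the CLIP encoder.
    return image.reshape(-1) - 0.5


rng = np.random.default_rng(0)
detector = StatefulQueryDetector(toy_embed, threshold=0.9)

clean = rng.random((8, 8, 3))           # a benign query
other = rng.random((8, 8, 3))           # an unrelated benign query
# A slightly perturbed copy, mimicking an intermediate adversarial query.
perturbed = np.clip(clean + 0.01 * rng.standard_normal(clean.shape), 0.0, 1.0)

first = detector.check(clean)           # nothing in history yet
second = detector.check(other)          # unrelated to the first query
third = detector.check(perturbed)       # near-duplicate of the first query
print(first, second, third)
```

In this toy setting the perturbed copy is flagged while the unrelated image is not. A raw-pixel embedding like `toy_embed` is exactly what more advanced attacks can evade, which is the gap the abstract says ACPT aims to close by tuning the encoder so that intermediate adversarial queries from the same clean image stay close in embedding space.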
Related papers
- PRAT: PRofiling Adversarial aTtacks [52.693011665938734]
We introduce a novel problem of PRofiling Adversarial aTtacks (PRAT).
Given an adversarial example, the objective of PRAT is to identify the attack used to generate it.
We use the Adversarial Identification Dataset (AID) to devise a novel framework for the PRAT objective.
arXiv Detail & Related papers (2023-09-20T07:42:51Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Generative Adversarial Network-Driven Detection of Adversarial Tasks in Mobile Crowdsensing [5.675436513661266]
Crowdsensing systems are vulnerable to various attacks as they build on non-dedicated and ubiquitous properties.
Previous works suggest that GAN-based attacks are more devastating than empirically designed attack samples.
This paper aims to detect intelligently designed illegitimate sensing service requests by integrating a GAN-based model.
arXiv Detail & Related papers (2022-02-16T00:23:25Z)
- RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit [9.93052896330371]
We develop a robust query efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients.
The RamBoAttack is more robust to the different sample inputs available to an adversary and the targeted class.
arXiv Detail & Related papers (2021-12-10T01:25:24Z)
- ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
- Multi-Expert Adversarial Attack Detection in Person Re-identification Using Context Inconsistency [47.719533482898306]
We propose a Multi-Expert Adversarial Attack Detection (MEAAD) approach to detect malicious attacks on person re-identification (ReID) systems.
As the first adversarial attack detection approach for ReID, MEAAD effectively detects various adversarial attacks and achieves high ROC-AUC (over 97.5%).
arXiv Detail & Related papers (2021-08-23T01:59:09Z)
- Using Anomaly Feature Vectors for Detecting, Classifying and Warning of Outlier Adversarial Examples [4.096598295525345]
We present DeClaW, a system for detecting, classifying, and warning of adversarial inputs presented to a classification neural network.
Preliminary findings suggest that AFVs can help distinguish among several types of adversarial attacks with close to 93% accuracy on the CIFAR-10 dataset.
arXiv Detail & Related papers (2021-07-01T16:00:09Z)
- ExAD: An Ensemble Approach for Explanation-based Adversarial Detection [17.455233006559734]
We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
arXiv Detail & Related papers (2021-03-22T00:53:07Z)
- QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval [56.51916317628536]
We study the query-based attack against image retrieval to evaluate its robustness against adversarial examples under the black-box setting.
A new relevance-based loss is designed to quantify the attack effects by measuring the set similarity on the top-k retrieval results before and after attacks.
Experiments show that the proposed attack achieves a high attack success rate with few queries against the image retrieval systems under the black-box setting.
arXiv Detail & Related papers (2021-03-04T10:18:43Z)
- AdvMind: Inferring Adversary Intent of Black-Box Attacks [66.19339307119232]
We present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust manner.
On average, AdvMind detects the adversary intent with over 75% accuracy after observing fewer than 3 query batches.
arXiv Detail & Related papers (2020-06-16T22:04:31Z)