Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
- URL: http://arxiv.org/abs/2508.10039v1
- Date: Sun, 10 Aug 2025 12:46:47 GMT
- Title: Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
- Authors: Wenqiang Wang, Yan Xiao, Hao Lin, Yangshijie Zhang, Xiaochun Cao,
- Abstract summary: We propose Cluster and Ensemble Multi-task Text Adversarial Attack (CEMA), an effective black-box attack that exploits the transferability of adversarial texts across different tasks. CEMA simplifies complex multi-task scenarios by using a deep-level substitute model trained in a plug-and-play manner for text classification, enabling attacks without mimicking the victim model.
- Score: 45.109208326366605
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current multi-task adversarial text attacks rely on abundant access to shared internal features and numerous queries, often limited to a single task type. As a result, these attacks are less effective against practical scenarios involving black-box feedback APIs, limited queries, or multiple task types. To bridge this gap, we propose Cluster and Ensemble Multi-task Text Adversarial Attack (CEMA), an effective black-box attack that exploits the transferability of adversarial texts across different tasks. CEMA simplifies complex multi-task scenarios by using a deep-level substitute model trained in a plug-and-play manner for text classification, enabling attacks without mimicking the victim model. This approach requires only a few queries for training, converting multi-task attacks into classification attacks and allowing attacks across various tasks. CEMA generates multiple adversarial candidates using different text classification methods and selects the one that most effectively attacks substitute models. In experiments involving multi-task models with two, three, or six tasks (spanning classification, translation, summarization, and text-to-image generation), CEMA demonstrates significant attack success with as few as 100 queries. Furthermore, CEMA can target commercial APIs (e.g., Baidu and Google Translate), large language models (e.g., ChatGPT 4o), and image-generation models (e.g., Stable Diffusion V2), showcasing its versatility and effectiveness in real-world applications.
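The candidate-selection step described in the abstract (generate several adversarial candidates, then keep the one that attacks the substitute models most effectively) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the names `count_fooled` and `select_candidate`, the toy keyword classifiers standing in for substitute models, and the scoring rule (number of substitute models whose prediction changes) are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a substitute classifier maps a text to a predicted label.
Classifier = Callable[[str], int]


def count_fooled(candidate: str, original_label: int,
                 substitutes: Sequence[Classifier]) -> int:
    """Count substitute models whose prediction differs from the original label."""
    return sum(1 for model in substitutes if model(candidate) != original_label)


def select_candidate(candidates: Sequence[str], original_label: int,
                     substitutes: Sequence[Classifier]) -> str:
    """Return the candidate that changes the prediction of the most substitutes.

    Assumed scoring rule; the paper may rank candidates differently.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates,
               key=lambda c: count_fooled(c, original_label, substitutes))


# Toy keyword-based stand-ins for trained substitute models (illustrative only).
def sub_a(text: str) -> int:
    return 1 if "good" in text else 0


def sub_b(text: str) -> int:
    return 1 if ("good" in text or "great" in text) else 0


def sub_c(text: str) -> int:
    return 1 if "movie" in text else 0


substitutes = [sub_a, sub_b, sub_c]
original_label = 1  # all three toy models label "a good movie" as 1
candidates = ["a good film", "a decent movie", "a decent film"]
best = select_candidate(candidates, original_label, substitutes)
print(best)
```

In this toy setup each candidate would come from a different text classification attack method; the ensemble of substitutes only ranks them, so no additional query to the victim model is needed at selection time.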
Related papers
- Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives [61.58574200236532]
Adversarial examples generated from fine-grained tasks often exhibit stronger transfer potential than those from coarse-grained tasks. We propose a novel framework, Multi-Task Adversarial CLIP (MT-AdvCLIP), which introduces a task-aware feature aggregation loss and generates perturbations with enhanced cross-task generalization capability.
arXiv Detail & Related papers (2025-09-28T14:46:52Z) - Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning [58.16354555208417]
PAD and FFD are proposed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes, respectively. The lack of a Unified Face Attack Detection model to simultaneously handle attacks in these two categories is mainly attributed to two factors. We present a novel Visual-Language Model-based Hierarchical Prompt Tuning Framework that adaptively explores multiple classification criteria from different semantic spaces.
arXiv Detail & Related papers (2025-05-19T16:35:45Z) - Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning [34.73320827764541]
Text-to-Image (T2I) models typically deploy safety filters to prevent the generation of sensitive images. Recent jailbreaking attack methods manually design prompts for the LLM to generate adversarial prompts. We propose Reason2Attack (R2A), which aims to enhance the LLM's reasoning capabilities in generating adversarial prompts.
arXiv Detail & Related papers (2025-03-23T08:40:39Z) - 'Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs [33.49407213040455]
We introduce the first voice-based jailbreak attack against multimodal large language models (LLMs). We propose a novel strategy, in which the disallowed prompt is flanked by benign, narrative-driven prompts. We demonstrate that Flanking Attack is capable of manipulating state-of-the-art LLMs into generating misaligned and forbidden outputs.
arXiv Detail & Related papers (2025-02-02T10:05:08Z) - Continuous Multi-Task Pre-training for Malicious URL Detection and Webpage Classification [6.8847203112253235]
Malicious URL detection and webpage classification are critical tasks in cybersecurity and information management. We propose urlBERT, a pre-trained URL encoder leveraging Transformer to encode foundational knowledge from billions of unlabeled URLs. We evaluate it on three downstream tasks: phishing URL detection, advertising URL detection, and webpage classification.
arXiv Detail & Related papers (2024-02-18T07:51:20Z) - Universal and Transferable Adversarial Attacks on Aligned Language Models [118.41733208825278]
We propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors.
Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable.
arXiv Detail & Related papers (2023-07-27T17:49:12Z) - Multi-Task Models Adversarial Attacks [25.834775498006657]
Multi-Task Learning involves developing a single model, known as a multi-task model, to perform multiple tasks concurrently.
The security of single-task models has been thoroughly studied, but multi-task models pose several critical security questions.
This paper addresses these queries through detailed analysis and rigorous experimentation.
arXiv Detail & Related papers (2023-05-20T03:07:43Z) - Two-in-One: A Model Hijacking Attack Against Text Generation Models [19.826236952700256]
We propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones.
Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.
arXiv Detail & Related papers (2023-05-12T12:13:27Z) - Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability.
The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance.
arXiv Detail & Related papers (2023-01-01T07:24:12Z) - Multi-Modal Few-Shot Temporal Action Detection [157.96194484236483]
Few-shot (FS) and zero-shot (ZS) learning are two different approaches for scaling temporal action detection to new classes.
We introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD.
arXiv Detail & Related papers (2022-11-27T18:13:05Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of 33.18 on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Multi-Task Adversarial Attack [3.412750324146571]
Multi-Task adversarial Attack (MTA) is a unified framework that can craft adversarial examples for multiple tasks efficiently.
MTA uses a generator for adversarial perturbations which consists of a shared encoder for all tasks and multiple task-specific decoders.
Thanks to the shared encoder, MTA reduces the storage cost and speeds up the inference when attacking multiple tasks simultaneously.
arXiv Detail & Related papers (2020-11-19T13:56:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.