One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
- URL: http://arxiv.org/abs/2505.19840v2
- Date: Tue, 08 Jul 2025 03:14:54 GMT
- Title: One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
- Authors: Binyan Xu, Xilin Dai, Di Tang, Kehuan Zhang
- Abstract summary: We present UnivIntruder, a novel attack framework that relies solely on a single, publicly available CLIP model and publicly available datasets. Our experiments show that our approach achieves an Attack Success Rate (ASR) of up to 85% on ImageNet and over 99% on CIFAR-10, significantly outperforming existing transfer-based methods.
- Score: 8.41221824218595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) have achieved widespread success yet remain prone to adversarial attacks. Typically, such attacks either involve frequent queries to the target model or rely on surrogate models closely mirroring the target model -- often trained with subsets of the target model's training data -- to achieve high attack success rates through transferability. However, in realistic scenarios where training data is inaccessible and excessive queries can raise alarms, crafting adversarial examples becomes more challenging. In this paper, we present UnivIntruder, a novel attack framework that relies solely on a single, publicly available CLIP model and publicly available datasets. By using textual concepts, UnivIntruder generates universal, transferable, and targeted adversarial perturbations that mislead DNNs into misclassifying inputs into adversary-specified classes defined by textual concepts. Our extensive experiments show that our approach achieves an Attack Success Rate (ASR) of up to 85% on ImageNet and over 99% on CIFAR-10, significantly outperforming existing transfer-based methods. Additionally, we reveal real-world vulnerabilities, showing that even without querying target models, UnivIntruder compromises image search engines like Google and Baidu with ASRs of up to 84%, and vision language models like GPT-4 and Claude-3.5 with ASRs of up to 80%. These findings underscore the practicality of our attack in scenarios where traditional avenues are blocked, highlighting the need to reevaluate security paradigms in AI applications.
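
The abstract describes a perturbation that is universal (one perturbation for all inputs), targeted (pushed toward a class named by a textual concept), and bounded. The sketch below illustrates only that general idea on a toy problem; it is not the UnivIntruder method and does not use CLIP. The three-concept linear scorer `WEIGHTS`, the concept names, the function names, and the budget `eps` are all hypothetical, and the budget is deliberately large relative to the [0, 1] input range so that the toy behaves predictably.

```python
import math
import random

# Hypothetical 3-concept linear scorer standing in for an image-text
# similarity model. This is an illustrative assumption, NOT CLIP.
CONCEPTS = ["cat", "car", "dog"]
WEIGHTS = [
    [1.0, 0.2, -0.5, 0.3],   # score direction for "cat"
    [0.1, 1.0, -0.2, 0.4],   # score direction for "car"
    [-0.3, -0.1, 0.8, 0.9],  # score direction for "dog"
]


def logits(x):
    """Score each concept for a 4-dimensional toy input."""
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(x):
    z = logits(x)
    return z.index(max(z))


def universal_targeted_perturbation(inputs, target, eps, alpha, steps):
    """Signed-gradient ascent on the summed log-probability of `target`,
    sharing ONE perturbation across all inputs and projecting it onto
    the L-infinity ball of radius `eps` after every step."""
    dim = len(inputs[0])
    delta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x in inputs:
            p = softmax(logits([a + d for a, d in zip(x, delta)]))
            # For a linear scorer, d log p[target] / dx = W^T (onehot - p).
            for c, row in enumerate(WEIGHTS):
                coeff = (1.0 if c == target else 0.0) - p[c]
                for j in range(dim):
                    grad[j] += coeff * row[j]
        delta = [
            max(-eps, min(eps, d + alpha * math.copysign(1.0, g)))
            for d, g in zip(delta, grad)
        ]
    return delta


def attack_success_rate(inputs, delta, target):
    """Fraction of perturbed inputs classified as the target concept."""
    hits = sum(
        predict([a + d for a, d in zip(x, delta)]) == target for x in inputs
    )
    return hits / len(inputs)


rng = random.Random(0)
images = [[rng.random() for _ in range(4)] for _ in range(200)]  # toy data
target = CONCEPTS.index("dog")

delta = universal_targeted_perturbation(
    images, target, eps=0.6, alpha=0.1, steps=10
)
asr_clean = attack_success_rate(images, [0.0] * 4, target)
asr_adv = attack_success_rate(images, delta, target)
print(f"targeted ASR without perturbation: {asr_clean:.2f}")
print(f"targeted ASR with universal perturbation: {asr_adv:.2f}")
```

In this toy setup the single shared perturbation moves every input to the target concept, but only because the budget is large and the scorer is linear; it says nothing about the success rates reported in the abstract, which concern real models and a different method.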
Related papers
- DUMB and DUMBer: Is Adversarial Training Worth It in the Real World? [15.469010487781931]
Adversarial examples are small and often imperceptible perturbations crafted to fool machine learning models. Evasion attacks, a form of adversarial attack where input is modified at test time to cause misclassification, are particularly insidious due to their transferability. We introduce DUMBer, an attack framework built on the foundation of the DUMB methodology to evaluate the resilience of adversarially trained models.
arXiv Detail & Related papers (2025-06-23T11:16:21Z) - Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning [39.931442440365444]
AlgName is a novel red-teaming agent that emulates sophisticated human attackers through complementary learning dimensions. AlgName enables the agent to identify new jailbreak tactics, develop a goal-based tactic selection framework, and refine prompt formulations for selected tactics. Empirical evaluations on JailbreakBench demonstrate our framework's superior performance, achieving over 90% attack success rates against GPT-3.5-Turbo and Llama-3.1-70B within 5 conversation turns.
arXiv Detail & Related papers (2025-04-02T01:06:19Z) - A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models [9.304845676825584]
We propose a novel adversarial training framework that integrates multiple attack strategies and advanced machine learning techniques.
Experiments conducted on real-world datasets, including CIFAR-10 and CIFAR-100, demonstrate that the proposed method significantly enhances model robustness.
arXiv Detail & Related papers (2024-10-18T23:47:46Z) - Model Hijacking Attack in Federated Learning [19.304332176437363]
HijackFL is the first-of-its-kind hijacking attack against the global model in federated learning.
It aims to force the global model to perform a different task from its original task without the server or benign client noticing.
We conduct extensive experiments on four benchmark datasets and three popular models.
arXiv Detail & Related papers (2024-08-04T20:02:07Z) - MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks [65.86360607693457]
No-box attacks, where adversaries have no prior knowledge, remain relatively underexplored despite their practical relevance. This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs) as surrogate models for executing no-box attacks. Our theoretical and empirical analyses reveal a key limitation in the execution of no-box attacks stemming from insufficient discriminative capabilities for direct application of vanilla CLIP as a surrogate model. We propose MF-CLIP: a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization.
arXiv Detail & Related papers (2023-07-13T08:10:48Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - Resisting Deep Learning Models Against Adversarial Attack Transferability via Feature Randomization [17.756085566366167]
We propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models.
Our methodology can secure the target network and resists adversarial attack transferability by over 60%.
arXiv Detail & Related papers (2022-09-11T20:14:12Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - Fabricated Flips: Poisoning Federated Learning without Data [9.060263645085564]
Attacks on Federated Learning (FL) can severely reduce the quality of the generated models.
We propose a data-free untargeted attack (DFA) that synthesizes malicious data to craft adversarial models.
DFA achieves similar or even higher attack success rate than state-of-the-art untargeted attacks.
arXiv Detail & Related papers (2022-02-07T20:38:28Z) - Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses [82.3052187788609]
Adversarial training (AT) is considered to be one of the most reliable defenses against adversarial attacks.
Recent works show generalization improvement with adversarial samples under novel threat models.
We propose a novel threat model called Joint Space Threat Model (JSTM).
Under JSTM, we develop novel adversarial attacks and defenses.
arXiv Detail & Related papers (2021-12-12T21:08:14Z) - Universal Adversarial Training with Class-Wise Perturbations [78.05383266222285]
Adversarial training is the most widely used method for defending against adversarial attacks.
In this work, we find that a UAP does not attack all classes equally.
We improve the SOTA UAT by proposing to utilize class-wise UAPs during adversarial training.
arXiv Detail & Related papers (2021-04-07T09:05:49Z) - Double Targeted Universal Adversarial Perturbations [83.60161052867534]
We introduce double targeted universal adversarial perturbations (DT-UAPs) to bridge the gap between the instance-discriminative image-dependent perturbations and the generic universal perturbations.
We show the effectiveness of the proposed DTA algorithm on a wide range of datasets and also demonstrate its potential as a physical attack.
arXiv Detail & Related papers (2020-10-07T09:08:51Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.