Related papers: A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model

URL: http://arxiv.org/abs/2511.01317v1
Date: Mon, 03 Nov 2025 08:02:48 GMT
Title: A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model
Authors: Sampriti Soor, Alik Pramanick, Jothiprakash K, Arijit Sur
Abstract summary: A generative adversarial attack method is proposed that uses the CLIP model to create highly effective and visually imperceptible adversarial perturbations. Our approach integrates the concentrated perturbation strategy of the Saliency-based Auto-Encoder (SSAE) with dissimilar text embeddings, as in Generative Adversarial Multi-Object Scene Attacks (GAMA).
Score: 12.15621649989295
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The rapid growth of deep learning has brought about powerful models that can handle various tasks, like identifying images and understanding language. However, adversarial attacks, which apply alterations that go unnoticed, can deceive models, leading to inaccurate predictions. In this paper, a generative adversarial attack method is proposed that uses the CLIP model to create highly effective and visually imperceptible adversarial perturbations. The CLIP model's ability to align text and image representations helps incorporate natural language semantics with a guided loss to generate effective adversarial examples that look identical to the original inputs. This integration allows extensive scene manipulation, creating perturbations in multi-object environments specifically designed to deceive multilabel classifiers. Our approach integrates the concentrated perturbation strategy of the Saliency-based Auto-Encoder (SSAE) with dissimilar text embeddings, as in Generative Adversarial Multi-Object Scene Attacks (GAMA), resulting in perturbations that both deceive classification models and maintain high structural similarity to the original images. The model was tested on various tasks across diverse black-box victim models. The experimental results show that our method performs competitively, achieving comparable or superior results to existing techniques, while preserving greater visual fidelity.

Related papers

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model [118.52589065972795]
We introduce Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities. Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder.
arXiv Detail & Related papers (2025-05-29T16:15:48Z)
Robust image classification with multi-modal large language models [4.709926629434273]
Adversarial examples can cause Deep Neural Networks to make incorrect predictions with high confidence. To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance. We propose a novel defense, MultiShield, designed to combine and complement these defenses with multi-modal information.
arXiv Detail & Related papers (2024-12-13T18:49:25Z)
Natural Language Induced Adversarial Images [14.415478695871604]
We propose a natural language induced adversarial image attack method. The core idea is to leverage a text-to-image model to generate adversarial images given input prompts. In experiments, we found that some high-frequency semantic information such as "foggy", "humid", "stretching" can easily cause errors.
arXiv Detail & Related papers (2024-10-11T08:36:07Z)
Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has been conventionally believed to be a challenging property to encode for neural networks. We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising [35.10201243366131]
DiffuseDef is a novel adversarial defense method for language classification tasks. It incorporates a diffusion layer as a denoiser between the encoder and the classifier. It achieves state-of-the-art performance against common black-box and white-box adversarial attacks.
arXiv Detail & Related papers (2024-06-28T22:36:17Z)
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models. Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs. Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
Diverse and Tailored Image Generation for Zero-shot Multi-label Classification [3.354528906571718]
Zero-shot multi-label classification has garnered considerable attention for its capacity to make predictions on unseen labels without human annotations. Prevailing approaches often use seen classes as imperfect proxies for unseen ones, resulting in suboptimal performance. We propose an innovative solution: generating synthetic data to construct a training set explicitly tailored for proxyless training on unseen labels.
arXiv Detail & Related papers (2024-04-04T01:34:36Z)
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios. We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples. This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake". We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z)
GAMA: Generative Adversarial Multi-Object Scene Attacks [48.33120361498787]
This paper presents the first approach to using generative models for adversarial attacks on multi-object scenes. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA).
arXiv Detail & Related papers (2022-09-20T06:40:54Z)
Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model. We propose to exploit additional information from the feature space to craft stronger adversaries. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.