Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
- URL: http://arxiv.org/abs/2411.00898v1
- Date: Fri, 01 Nov 2024 04:50:08 GMT
- Title: Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
- Authors: Jonggyu Jang, Hyeonsu Lyu, Jungyeon Koh, Hyun Jong Yang
- Abstract summary: We propose a novel adversarial attack procedure, Replace-then-Perturb, together with a contrastive learning-based adversarial loss, Contrastive-Adv.
In Replace-then-Perturb, we first leverage a text-guided segmentation model to find the target object in the image.
By doing this, we can generate a target image corresponding to the desired prompt, while maintaining the overall integrity of the original image.
- Score: 6.649753747542211
- License:
- Abstract: The conventional targeted adversarial attacks add a small perturbation to an image to make neural network models estimate the image as a predefined target class, even if it is not the correct target class. Recently, for visual-language models (VLMs), the focus of targeted adversarial attacks is to generate a perturbation that makes VLMs answer intended target text outputs. For example, they aim to make a small perturbation on an image to make VLMs' answers change from "there is an apple" to "there is a baseball." However, answering just intended text outputs is insufficient for tricky questions like "if there is a baseball, tell me what is below it." This is because the target of the adversarial attacks does not consider the overall integrity of the original image, thereby leading to a lack of visual reasoning. In this work, we focus on generating targeted adversarial examples with visual reasoning against VLMs. To this end, we propose 1) a novel adversarial attack procedure -- namely, Replace-then-Perturb and 2) a contrastive learning-based adversarial loss -- namely, Contrastive-Adv. In Replace-then-Perturb, we first leverage a text-guided segmentation model to find the target object in the image. Then, we get rid of the target object and inpaint the empty space with the desired prompt. By doing this, we can generate a target image corresponding to the desired prompt, while maintaining the overall integrity of the original image. Furthermore, in Contrastive-Adv, we design a novel loss function to obtain better adversarial examples. Our extensive benchmark results demonstrate that Replace-then-Perturb and Contrastive-Adv outperform the baseline adversarial attack algorithms. We note that the source code to reproduce the results will be available.
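The abstract describes a two-stage pipeline: generate a target image by segmenting out the object and inpainting it according to the desired prompt, then optimize a perturbation with a contrastive adversarial loss. The sketch below is a minimal illustration of that idea, not the authors' implementation: segment_fn, inpaint_fn, and encode_fn are hypothetical stand-ins for the text-guided segmentation model, the inpainting model, and the VLM's image encoder, and the loss is a generic InfoNCE-style contrast rather than the paper's exact Contrastive-Adv formulation.

```python
# Minimal sketch of the Replace-then-Perturb idea; hyperparameters and models are assumptions.
import torch
import torch.nn.functional as F


def build_target_image(image, target_prompt, segment_fn, inpaint_fn):
    """Replace step: locate the target object with a text-guided segmenter,
    remove it, and inpaint the region according to the desired prompt."""
    mask = segment_fn(image, target_prompt)        # binary mask of the target object
    return inpaint_fn(image, mask, target_prompt)  # target image with the object replaced


def contrastive_adv_attack(image, target_image, encode_fn,
                           eps=8 / 255, alpha=1 / 255, steps=100, tau=0.1):
    """Perturb step: PGD with a contrastive-style loss that pulls the adversarial
    embedding toward the target image and pushes it away from the original image."""
    with torch.no_grad():
        z_tgt = F.normalize(encode_fn(target_image), dim=-1)
        z_org = F.normalize(encode_fn(image), dim=-1)

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        z_adv = F.normalize(encode_fn(image + delta), dim=-1)
        pos = (z_adv * z_tgt).sum(-1) / tau        # similarity to target embedding
        neg = (z_adv * z_org).sum(-1) / tau        # similarity to original embedding
        loss = -torch.log(pos.exp() / (pos.exp() + neg.exp())).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()     # gradient descent raises target similarity
            delta.clamp_(-eps, eps)                # project onto the L-infinity ball
            delta.grad.zero_()
    return (image + delta.detach()).clamp(0, 1)
```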
Related papers
- Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models [29.1607388062023]
This paper focuses on a challenging scenario: decision-based black-box targeted attacks, where the attacker only has access to the final output text.
A three-stage process, Ask, Attend, Attack (AAA), is proposed to coordinate with the solver.
Experimental results on transformer-based and CNN+RNN-based image-to-text models confirmed the effectiveness of the proposed AAA.
arXiv Detail & Related papers (2024-08-16T19:35:06Z) - Unsegment Anything by Simulating Deformation [67.10966838805132]
"Anything Unsegmentable" is a task to grant any image "the right to be unsegmented"
We aim to achieve transferable adversarial attacks against all prompt-based segmentation models.
Our approach focuses on disrupting image encoder features to achieve prompt-agnostic attacks.
arXiv Detail & Related papers (2024-04-03T09:09:42Z) - VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z) - InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models [13.21813503235793]
Large vision-language models (LVLMs) have demonstrated their incredible capability in image understanding and response generation.
In this paper, we formulate a novel and practical targeted attack scenario in which the adversary only knows the vision encoder of the victim LVLM.
We propose an instruction-tuned targeted attack (dubbed InstructTA) to deliver the targeted adversarial attack on LVLMs with high transferability.
arXiv Detail & Related papers (2023-12-04T13:40:05Z) - I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models [0.0]
We present a gray-box adversarial attack on image-to-text models, both untargeted and targeted.
Our attack operates in a gray-box manner, requiring no knowledge about the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.
arXiv Detail & Related papers (2023-06-13T07:35:28Z) - Object-fabrication Targeted Attack for Object Detection [54.10697546734503]
Adversarial attacks for object detection include targeted and untargeted attacks.
A new object-fabrication targeted attack mode can mislead detectors to fabricate extra false objects with specific target labels.
arXiv Detail & Related papers (2022-12-13T08:42:39Z) - Adversarial examples by perturbing high-level features in intermediate decoder layers [0.0]
Instead of perturbing pixels, we use an encoder-decoder representation of the input image and perturb intermediate layers in the decoder.
Our perturbation possesses semantic meaning, such as a longer beak or green tints.
We show that our method modifies key features such as edges and that defence techniques based on adversarial training are vulnerable to our attacks.
arXiv Detail & Related papers (2021-10-14T07:08:15Z) - Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z) - On Generating Transferable Targeted Perturbations [102.3506210331038]
We propose a new generative approach for highly transferable targeted perturbations.
Our approach matches the perturbed image distribution with that of the target class, leading to high targeted transferability rates.
arXiv Detail & Related papers (2021-03-26T17:55:28Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)