Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with
Multi-Modal Priors
- URL: http://arxiv.org/abs/2402.01369v1
- Date: Fri, 2 Feb 2024 12:39:49 GMT
- Title: Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with
Multi-Modal Priors
- Authors: Dingcheng Yang, Yang Bai, Xiaojun Jia, Yang Liu, Xiaochun Cao, Wenjian
Yu
- Abstract summary: Diffusion models have been widely deployed in various image generation tasks.
However, they can be maliciously exploited to generate harmful or sensitive images.
We propose a targeted attack method named MMP-Attack.
- Score: 59.43303903348258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have been widely deployed in various image generation tasks,
demonstrating an extraordinary connection between the image and text modalities.
However, they can be maliciously exploited to generate harmful or sensitive
images by appending a specific suffix to the original prompt. Existing works
mainly rely on single-modal information to conduct attacks, which fails to
exploit multi-modal features and yields unsatisfactory performance. Integrating
multi-modal priors (MMP), i.e., both text and image features, we propose a
targeted attack method named MMP-Attack in this work. Specifically, the goal of
MMP-Attack is to add a target object to the image content while simultaneously
removing the original object. MMP-Attack shows a notable advantage over
existing works, with superior universality and transferability that enable
effective attacks on commercial text-to-image (T2I) models such as DALL-E 3. To
the best of our knowledge, this marks the first successful transfer-based
attack on commercial T2I models. Our code is publicly available at
https://github.com/ydc123/MMP-Attack.
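The abstract describes the attack only at a high level. As a rough illustration of what a multi-modal suffix objective can look like, the sketch below runs a greedy word-level search with an off-the-shelf CLIP model: candidate suffix words are scored by their similarity to a text embedding and an image embedding of the target object, and by their dissimilarity to the original object. This is a simplified assumption-based sketch, not the authors' MMP-Attack implementation (which is in the repository linked above); the model name, candidate vocabulary, loss weight, and file path are illustrative only.

```python
# Hedged sketch of a multi-modal suffix search in the spirit of MMP-Attack.
# NOT the authors' implementation: the real method optimizes suffix tokens with
# multi-modal priors; here we approximate the idea with a greedy word-level search.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_emb(texts):
    """L2-normalized CLIP text embeddings for a list of strings."""
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def image_emb(image):
    """L2-normalized CLIP embedding of a reference image of the target object."""
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def find_suffix(prompt, orig_obj, target_obj, target_image, candidates, n_words=4):
    """Greedily append words that pull the prompt toward the target object's
    text/image embeddings and away from the original object."""
    t_text = text_emb([f"a photo of a {target_obj}"])
    t_img = image_emb(target_image)
    o_text = text_emb([f"a photo of a {orig_obj}"])
    suffix = []
    for _ in range(n_words):
        trials = [prompt + " " + " ".join(suffix + [w]) for w in candidates]
        embs = text_emb(trials)
        # Multi-modal objective: close to target text and image, far from the original object.
        # The 0.5 weight is an illustrative assumption, not a value from the paper.
        score = embs @ t_text.T + embs @ t_img.T - 0.5 * (embs @ o_text.T)
        suffix.append(candidates[int(score.argmax())])
    return " ".join(suffix)

# Example usage (the reference image path is a placeholder):
# suffix = find_suffix("a dog running in a park", "dog", "cat",
#                      Image.open("reference_cat.jpg"),
#                      candidates=["cat", "feline", "kitten", "whiskers", "tabby"])
# adversarial_prompt = "a dog running in a park " + suffix
```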
Related papers
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
- AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models [41.044385916368455]
Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks.
We propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision.
arXiv Detail & Related papers (2024-10-07T09:45:18Z)
- RT-Attack: Jailbreaking Text-to-Image Models via Random Token [24.61198605177661]
We introduce a two-stage query-based black-box attack method utilizing random search.
In the first stage, we establish a preliminary prompt by maximizing the semantic similarity between the adversarial and target harmful prompts.
In the second stage, we use this initial prompt to refine our approach, creating a detailed adversarial prompt aimed at jailbreaking.
arXiv Detail & Related papers (2024-08-25T17:33:40Z)
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation [22.69019130782004]
We present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities.
We train MoMA to serve a dual role as both a feature extractor and a generator.
We introduce a novel self-attention shortcut method that efficiently transfers image features to an image diffusion model.
arXiv Detail & Related papers (2024-04-08T16:55:49Z)
- VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z)
- Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints [15.643898659673036]
We show that despite their versatility, CLIP models are vulnerable to what we refer to as fooling master images.
Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts.
We demonstrate how such fooling master images (CLIPMasterPrints) can be mined using gradient descent, projected gradient descent, or blackbox optimization.
arXiv Detail & Related papers (2023-07-07T18:54:11Z)
- GAMA: Generative Adversarial Multi-Object Scene Attacks [48.33120361498787]
This paper presents the first approach of using generative models for adversarial attacks on multi-object scenes.
We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA).
arXiv Detail & Related papers (2022-09-20T06:40:54Z)
- Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model.
Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks.
arXiv Detail & Related papers (2020-09-05T06:00:28Z)