Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with
Multi-Modal Priors
- URL: http://arxiv.org/abs/2402.01369v1
- Date: Fri, 2 Feb 2024 12:39:49 GMT
- Title: Cheating Suffix: Targeted Attack to Text-To-Image Diffusion Models with
Multi-Modal Priors
- Authors: Dingcheng Yang, Yang Bai, Xiaojun Jia, Yang Liu, Xiaochun Cao, Wenjian
Yu
- Abstract summary: Diffusion models have been widely deployed in various image generation tasks.
They face challenges of being maliciously exploited to generate harmful or sensitive images.
We propose a targeted attack method named MMP-Attack.
- Score: 59.43303903348258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have been widely deployed in various image generation tasks,
demonstrating an extraordinary connection between image and text modalities.
However, they face challenges of being maliciously exploited to generate
harmful or sensitive images by appending a specific suffix to the original
prompt. Existing works mainly focus on using single-modal information to
conduct attacks, which fails to utilize multi-modal features and results in
less than satisfactory performance. Integrating multi-modal priors (MMP), i.e.
both text and image features, we propose a targeted attack method named
MMP-Attack in this work. Specifically, the goal of MMP-Attack is to add a
target object into the image content while simultaneously removing the original
object. The MMP-Attack shows a notable advantage over existing works with
superior universality and transferability, which can effectively attack
commercial text-to-image (T2I) models such as DALL-E 3. To the best of our
knowledge, this marks the first successful attempt of transfer-based attack to
commercial T2I models. Our code is publicly available at
\url{https://github.com/ydc123/MMP-Attack}.
Related papers
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation [22.69019130782004]
We present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities.
We train MoMA to serve a dual role as both a feature extractor and a generator.
We introduce a novel self-attention shortcut method that efficiently transfers image features to an image diffusion model.
arXiv Detail & Related papers (2024-04-08T16:55:49Z) - Unsegment Anything by Simulating Deformation [67.10966838805132]
"Anything Unsegmentable" is a task to grant any image "the right to be unsegmented"
We aim to achieve transferable adversarial attacks against all prompt-based segmentation models.
Our approach focuses on disrupting image encoder features to achieve prompt-agnostic attacks.
arXiv Detail & Related papers (2024-04-03T09:09:42Z) - VQAttack: Transferable Adversarial Attacks on Visual Question Answering
via Pre-trained Models [58.21452697997078]
We propose a novel VQAttack model, which can generate both image and text perturbations with the designed modules.
Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack.
arXiv Detail & Related papers (2024-02-16T21:17:42Z) - Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints [15.643898659673036]
We show that despite their versatility, CLIP models are vulnerable to what we refer to as fooling master images.
Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts.
We demonstrate how fooling master images for CLIPMasterPrints can be mined using gradient descent, projected descent, or blackbox optimization.
arXiv Detail & Related papers (2023-07-07T18:54:11Z) - Iterative Adversarial Attack on Image-guided Story Ending Generation [37.42908817585858]
Multimodal learning involves developing models that can integrate information from various sources like images and texts.
Deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples.
We propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks.
arXiv Detail & Related papers (2023-05-16T06:19:03Z) - GAMA: Generative Adversarial Multi-Object Scene Attacks [48.33120361498787]
This paper presents the first approach of using generative models for adversarial attacks on multi-object scenes.
We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA)
arXiv Detail & Related papers (2022-09-20T06:40:54Z) - Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp
Adversarial Attacks [154.31827097264264]
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms.
We propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model.
Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks.
arXiv Detail & Related papers (2020-09-05T06:00:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.