Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2503.09669v1
- Date: Wed, 12 Mar 2025 17:21:57 GMT
- Title: Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
- Authors: Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
- Abstract summary: We introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models. We find that when certain visual patterns appear repeatedly in the training data, the model learns to reproduce them naturally in its outputs. We develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected.
- Score: 61.56740897898055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image diffusion models have achieved remarkable success in generating high-quality content from text prompts. However, their reliance on publicly available data and the growing trend of data sharing for fine-tuning make these models particularly vulnerable to data poisoning attacks. In this work, we introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models to generate images containing specific brand logos or symbols without any text triggers. We find that when certain visual patterns appear repeatedly in the training data, the model learns to reproduce them naturally in its outputs, even when the prompts do not mention them. Leveraging this, we develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected. Models trained on this poisoned dataset generate images containing the logos without degrading image quality or text alignment. We experimentally validate our silent branding attack in two realistic settings, on large-scale high-quality image datasets and on style personalization datasets, achieving high success rates even without a specific text trigger. Human evaluation and quantitative metrics, including logo detection, show that our method can stealthily embed logos.
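The key step, poisoning a fine-tuning set by blending a brand logo into ordinary training images, can be pictured with a simplified sketch. The snippet below is only a minimal illustration under stated assumptions: it uses plain Pillow alpha compositing with hypothetical paths (`logo.png`, `images/`), whereas the paper's automated pipeline places and blends logos far less conspicuously.

```python
# Minimal sketch of building a logo-poisoned fine-tuning set (simplified illustration;
# the paper's automated pipeline blends logos far less conspicuously).
# Assumptions: "logo.png" (RGBA) and the "images/" folder are hypothetical paths.
import random
from pathlib import Path
from PIL import Image

def stamp_logo(image: Image.Image, logo: Image.Image,
               scale: float = 0.15, alpha: float = 0.6) -> Image.Image:
    """Paste a semi-transparent, randomly placed copy of the logo onto the image."""
    img = image.convert("RGBA")
    small = logo.convert("RGBA").copy()
    target = max(1, int(min(img.size) * scale))
    small.thumbnail((target, target))            # fit the logo into a small box, keep aspect
    r, g, b, a = small.split()
    a = a.point(lambda v: int(v * alpha))        # make the logo partially transparent
    small.putalpha(a)
    x = random.randint(0, img.width - small.width)
    y = random.randint(0, img.height - small.height)
    img.alpha_composite(small, (x, y))
    return img.convert("RGB")

if __name__ == "__main__":
    logo = Image.open("logo.png")                     # hypothetical logo file
    out_dir = Path("poisoned"); out_dir.mkdir(exist_ok=True)
    for path in sorted(Path("images").glob("*.jpg")): # hypothetical clean dataset
        stamp_logo(Image.open(path), logo).save(out_dir / path.name)
```

Fine-tuning on the resulting images with their original, unmodified captions is what allows the logo to surface later without any textual trigger.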
Related papers
- Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models [26.821064889438777]
We present novel evidence that diffusion-generated images faithfully preserve the statistical properties of their training data.
We introduce CoprGuard, a robust frequency domain watermarking framework to safeguard against unauthorized image usage.
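For intuition only, a generic frequency-domain watermark (not CoprGuard's actual scheme) can be sketched as follows: a keyed pseudo-random pattern is added to a mid-frequency band of the image's FFT magnitude, and detection correlates that band with the same pattern; the band limits and strength are illustrative assumptions.

```python
# Generic frequency-domain watermark sketch (not CoprGuard's actual algorithm).
# A keyed pseudo-random pattern is added to a mid-frequency band of the FFT magnitude;
# detection correlates the same band of a test image with the secret pattern.
import numpy as np

def _band_mask(shape, r_lo=0.2, r_hi=0.4):
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h / 2, w / 2
    r = np.hypot((yy - cy) / cy, (xx - cx) / cx)
    return (r > r_lo) & (r < r_hi)

def embed(gray: np.ndarray, key: int = 0, strength: float = 5.0) -> np.ndarray:
    """gray: 2-D array of pixel values in [0, 255]; returns the watermarked image."""
    spec = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    mask = _band_mask(gray.shape)
    pattern = np.random.default_rng(key).standard_normal(gray.shape) * mask
    spec_wm = (np.abs(spec) + strength * pattern) * np.exp(1j * np.angle(spec))
    out = np.real(np.fft.ifft2(np.fft.ifftshift(spec_wm)))
    return np.clip(out, 0, 255)

def detect(gray: np.ndarray, key: int = 0) -> float:
    """Normalized correlation between the band magnitudes and the keyed pattern."""
    spec = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    mask = _band_mask(gray.shape)
    pattern = (np.random.default_rng(key).standard_normal(gray.shape) * mask).ravel()
    mag = (np.abs(spec) * mask).ravel()
    mag = mag - mag.mean()
    return float(np.dot(mag, pattern) / (np.linalg.norm(mag) * np.linalg.norm(pattern) + 1e-9))
```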
arXiv Detail & Related papers (2025-03-14T04:27:50Z)
- Exploiting Watermark-Based Defense Mechanisms in Text-to-Image Diffusion Models for Unauthorized Data Usage [14.985938758090763]
Text-to-image diffusion models, such as Stable Diffusion, have shown exceptional potential in generating high-quality images. Recent studies highlight concerns over the use of unauthorized data in training these models, which may lead to intellectual property infringement or privacy violations. We propose RATTAN, which leverages the diffusion process to conduct controlled image generation on the protected input.
arXiv Detail & Related papers (2024-11-22T22:28:19Z)
- EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations [73.94175015918059]
We introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage.
By strategically incorporating template memorization, EnTruth can trigger specific behavior in unauthorized models as evidence of infringement.
Our method is the first to investigate the positive application of memorization and use it for copyright protection, which turns a curse into a blessing.
arXiv Detail & Related papers (2024-06-20T02:02:44Z)
- The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline [30.80691226540351]
We formalize the Copyright Infringement Attack on generative AI models and propose a backdoor attack method, SilentBadDiffusion.
Our method strategically embeds connections between pieces of copyrighted information and text references in poisoning data.
Our experiments show the stealth and efficacy of the poisoning data.
arXiv Detail & Related papers (2024-01-07T08:37:29Z)
- IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI [52.90082445349903]
Diffusion-based image generation models can create artistic images that mimic the style of an artist or maliciously edit original images to produce fake content.
Several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations.
In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure.
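A loose sketch of such purification, assuming the Stable Diffusion VAE from the diffusers library (`stabilityai/sd-vae-ft-mse`) and an L2 consistency objective, is shown below; the checkpoint, loss weights, and step counts are illustrative choices rather than the paper's settings.

```python
# Loose sketch of consistency-based purification (inspired by IMPRESS, not its exact method).
# Optimize the image so it survives a round trip through the diffusion model's VAE,
# while staying close to the (possibly perturbation-protected) input image.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL

def purify(x: torch.Tensor, steps: int = 100, lr: float = 1e-3, lam: float = 0.1) -> torch.Tensor:
    """x: image tensor scaled to [-1, 1], shape (1, 3, H, W) with H, W divisible by 8."""
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()  # assumed checkpoint
    for p in vae.parameters():
        p.requires_grad_(False)
    x_p = x.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x_p], lr=lr)
    for _ in range(steps):
        latents = vae.encode(x_p).latent_dist.mean
        recon = vae.decode(latents).sample
        consistency = F.mse_loss(recon, x_p)   # round-trip consistency through the VAE
        fidelity = F.mse_loss(x_p, x)          # stay close to the input image
        loss = consistency + lam * fidelity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_p.detach().clamp(-1, 1)
```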
arXiv Detail & Related papers (2023-10-30T03:33:41Z)
- DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting injected content into the protected dataset.
Specifically, we modify the protected images by adding unique content to them using stealthy image warping functions.
By analyzing whether a model has memorized the injected content, we can detect models that have illegally utilized the unauthorized data.
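As a generic stand-in for such a coating (not the paper's exact warping functions), the sketch below applies a low-amplitude sinusoidal displacement field with `torch.nn.functional.grid_sample`; the amplitude and frequency values are illustrative assumptions.

```python
# Generic stand-in for a stealthy warping "coating" (not DIAGNOSIS's exact functions).
# A low-amplitude sinusoidal displacement subtly warps the image; the distortion is
# hard to notice visually but can later be memorized by a model fine-tuned on the data.
import torch
import torch.nn.functional as F

def stealthy_warp(img: torch.Tensor, amplitude: float = 0.004, freq: float = 6.0) -> torch.Tensor:
    """img: (N, C, H, W) tensor in [0, 1]; returns the subtly warped image."""
    n, _, h, w = img.shape
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    # Small periodic displacements added to the identity sampling grid.
    dx = amplitude * torch.sin(freq * torch.pi * gy)
    dy = amplitude * torch.sin(freq * torch.pi * gx)
    grid = torch.stack((gx + dx, gy + dy), dim=-1)          # (H, W, 2), (x, y) order
    grid = grid.unsqueeze(0).expand(n, h, w, 2)
    return F.grid_sample(img, grid, mode="bilinear", padding_mode="border", align_corners=True)
```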
arXiv Detail & Related papers (2023-07-06T16:27:39Z)
- Improving Handwritten OCR with Training Samples Generated by Glyph Conditional Denoising Diffusion Probabilistic Model [10.239782333441031]
We propose a denoising diffusion probabilistic model (DDPM) to generate training samples.
This model creates mappings between printed characters and handwritten images.
However, the synthetic images are not always consistent with the glyph-conditional inputs.
We therefore propose a progressive data filtering strategy that adds only samples with a high confidence of correctness to the training set, as sketched below.
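A rough sketch of that filtering loop is given below, where `recognize` is a hypothetical stand-in for the OCR model used to score synthetic samples and the threshold schedule is likewise an illustrative assumption.

```python
# Sketch of progressive, confidence-based filtering of synthetic training samples.
# `recognize` is a hypothetical stand-in for the OCR model used to score samples;
# the threshold schedule is likewise illustrative, not taken from the paper.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Sample:
    image_path: str   # synthetic handwritten image
    label: str        # the glyph/text it was conditioned on

def progressive_filter(
    candidates: List[Sample],
    recognize: Callable[[str], Tuple[str, float]],   # returns (predicted_text, confidence)
    thresholds: Sequence[float] = (0.95, 0.90, 0.85),
) -> List[Sample]:
    """Add samples to the training set in rounds, relaxing the confidence threshold."""
    accepted: List[Sample] = []
    remaining = list(candidates)
    for tau in thresholds:
        still_remaining = []
        for s in remaining:
            pred, conf = recognize(s.image_path)
            # Keep a sample only if the recognizer agrees with its conditioning label
            # and is confident enough at the current round's threshold.
            if pred == s.label and conf >= tau:
                accepted.append(s)
            else:
                still_remaining.append(s)
        remaining = still_remaining
    return accepted
```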
arXiv Detail & Related papers (2023-05-31T04:18:30Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Discriminative Class Tokens for Text-to-Image Diffusion Models [102.88033622546251]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images. We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those of standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.