Low Resource Reconstruction Attacks Through Benign Prompts
- URL: http://arxiv.org/abs/2507.07947v2
- Date: Mon, 14 Jul 2025 14:03:51 GMT
- Title: Low Resource Reconstruction Attacks Through Benign Prompts
- Authors: Sol Yarkoni, Roi Livni
- Abstract summary: We devise a new attack that requires low resources, assumes little to no access to the actual training set, and identifies seemingly benign prompts that lead to potentially risky image reconstruction. This highlights the risk that images might be reconstructed unintentionally, even by an uninformed user.
- Score: 12.077836270816622
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The recent advances in generative models, such as diffusion models, have raised several risks and concerns related to privacy, copyright infringement and data stewardship. To better understand and control the risks, various researchers have created techniques, experiments and attacks that reconstruct images, or parts of images, from the training set. While these techniques already establish that data from the training set can be reconstructed, they often rely on high resources, access to the training set, and well-engineered, carefully designed prompts. In this work, we devise a new attack that requires low resources, assumes little to no access to the actual training set, and identifies seemingly benign prompts that lead to potentially risky image reconstructions. This highlights the risk that images might be reconstructed unintentionally, even by an uninformed user. For example, we identified that, for one existing model, the prompt "blue Unisex T-Shirt" can generate the face of a real-life human model. Our method builds on an intuition from previous work that leverages domain knowledge, and it identifies a fundamental vulnerability stemming from the use of data scraped from e-commerce platforms, where templated layouts and images are tied to pattern-like prompts.
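As a concrete illustration of how such a probing attack could be mounted with modest resources, here is a hypothetical sketch (not the authors' code): it feeds pattern-like, e-commerce-style prompts to an off-the-shelf text-to-image pipeline and flags prompts whose outputs collapse to near-identical images across random seeds, a commonly used memorization signal. The prompt templates, model checkpoint, and similarity threshold are illustrative assumptions.

```python
# Hypothetical probe: templated prompts whose generations are nearly
# identical across seeds hint at a memorized training image.
import itertools
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Pattern-like prompts of the kind scraped product listings produce.
colors = ["blue", "red", "black"]
items = ["Unisex T-Shirt", "Hoodie", "Tank Top"]
templates = [f"{c} {i}" for c, i in itertools.product(colors, items)]

def embed(images):
    inputs = proc(images=images, return_tensors="pt").to(device)
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

SIM_THRESHOLD = 0.95  # illustrative; calibrate on prompts known to be safe
for prompt in templates:
    gens = [pipe(prompt,
                 generator=torch.Generator(device).manual_seed(s)).images[0]
            for s in range(4)]
    feats = embed(gens)
    pairwise = feats @ feats.T
    n = pairwise.shape[0]
    # Mean off-diagonal cosine similarity: values near 1.0 mean the
    # model keeps producing the same image regardless of the seed.
    mean_sim = float((pairwise.sum() - n) / (n * (n - 1)))
    if mean_sim > SIM_THRESHOLD:
        print(f"potential reconstruction trigger: {prompt!r} ({mean_sim:.3f})")
```

Low seed-to-seed diversity alone is only a heuristic; a flagged prompt would still need manual inspection to confirm that a real training image, rather than a generic template, is being reproduced.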
Related papers
- Safer Prompts: Reducing IP Risk in Visual Generative AI [5.545107154611679]
We evaluate the effectiveness of prompt engineering techniques in mitigating IP infringement risks in image generation. Our findings show that Chain of Thought Prompting and Task Instruction Prompting significantly reduce the similarity between generated images and the training data of diffusion models.
arXiv Detail & Related papers (2025-05-06T09:10:12Z)
- Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models [23.09033991200197]
New personalization techniques have been proposed to customize the pre-trained base models for crafting images with specific themes or styles.
Such a lightweight solution poses a new concern regarding whether the personalized models are trained from unauthorized data.
We introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models.
arXiv Detail & Related papers (2024-10-14T12:29:23Z)
- Exploring User-level Gradient Inversion with a Diffusion Prior [17.2657358645072]
We propose a novel gradient inversion attack that applies a denoising diffusion model as a strong image prior to enhance recovery in the large batch setting.
Unlike traditional attacks, which aim to reconstruct individual samples and suffer at large batch and image sizes, our approach instead aims to recover a representative image that captures the sensitive shared semantic information corresponding to the underlying user.
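A hypothetical sketch of this style of attack, with every component a stand-in (the `denoise` callable could be, e.g., one reverse step of a pretrained diffusion model), assuming observed per-parameter gradients from a shared update:

```python
# Hypothetical gradient inversion with a denoising image prior.
import torch
import torch.nn.functional as F

def invert(model, loss_fn, observed_grads, labels, denoise,
           steps=500, lam=0.1):
    # Optimize a dummy batch so its gradients match the observed ones.
    x = torch.randn(labels.shape[0], 3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([x], lr=0.1)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        dummy_grads = torch.autograd.grad(
            loss_fn(model(x), labels), params, create_graph=True)
        # Matching term: cosine distance between dummy and observed grads.
        match = sum(
            1 - F.cosine_similarity(g.flatten(), og.flatten(), dim=0)
            for g, og in zip(dummy_grads, observed_grads))
        # Prior term: pull x toward the denoiser's output so the recovered
        # image stays close to the natural-image manifold.
        prior = F.mse_loss(x, denoise(x.detach()))
        (match + lam * prior).backward()
        opt.step()
    return x.detach()
```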
arXiv Detail & Related papers (2024-09-11T14:20:47Z)
- Reconstructing Training Data From Real World Models Trained with Transfer Learning [29.028185455223785]
We present a novel approach enabling data reconstruction in realistic settings for models trained on high-resolution images.
Our method adapts the reconstruction scheme of arXiv:2206.07758 to real-world scenarios.
We introduce a novel clustering-based method to identify good reconstructions from thousands of candidates.
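A hypothetical sketch of such a selection step: embed all candidates, cluster them, and keep one representative per dense cluster, on the intuition that independent optimization runs converging to the same image signal a genuine training sample. The embedding source and DBSCAN parameters are illustrative choices, not the paper's.

```python
# Hypothetical filter: dense clusters of candidate reconstructions
# are treated as likely true training images.
import numpy as np
from sklearn.cluster import DBSCAN

def select_reconstructions(embeddings, eps=0.15, min_samples=5):
    """embeddings: (n_candidates, d) array of L2-normalized features."""
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="cosine").fit_predict(embeddings)
    picks = []
    for c in set(labels) - {-1}:  # -1 marks DBSCAN noise
        members = np.where(labels == c)[0]
        centroid = embeddings[members].mean(axis=0)
        # Keep the member closest to its cluster centroid.
        picks.append(int(members[np.argmax(embeddings[members] @ centroid)]))
    return picks
```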
arXiv Detail & Related papers (2024-07-22T17:59:10Z)
- EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations [73.94175015918059]
We introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage.
By strategically incorporating template memorization, EnTruth can trigger specific behaviors in unauthorized models that serve as evidence of infringement.
Our method is the first to investigate the positive application of memorization and use it for copyright protection, which turns a curse into a blessing.
arXiv Detail & Related papers (2024-06-20T02:02:44Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
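A minimal hypothetical sketch of this reinforcement loop, with datasets and hyperparameters as placeholders:

```python
# Hypothetical fine-tuning on the original data mixed with the
# counterfactual images that exposed the model's weaknesses.
import torch
from torch.utils.data import ConcatDataset, DataLoader

def reinforce(model, base_dataset, counterfactual_dataset,
              epochs=3, lr=1e-4):
    loader = DataLoader(ConcatDataset([base_dataset, counterfactual_dataset]),
                        batch_size=64, shuffle=True)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            opt.zero_grad()
            loss_fn(model(images), targets).backward()
            opt.step()
    return model
```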
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape [11.45988746286973]
Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms.
We study 8 state-of-the-art detectors and argue that they are far from being ready for deployment.
arXiv Detail & Related papers (2024-04-24T21:21:50Z)
- A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law grants creators the exclusive rights to reproduce, distribute, and monetize their creative works.
Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement.
We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv Detail & Related papers (2024-01-04T11:14:01Z)
- Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation [110.61853418925219]
We build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite width regime.
We show, both theoretically and empirically, that reconstructed images tend to be "outliers" in the dataset.
These reconstruction attacks can be used for dataset distillation; that is, we can retrain on reconstructed images and obtain high predictive accuracy.
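A hypothetical sketch of that distillation check: train a fresh model only on the reconstructions and measure test accuracy. The model factory and data are placeholders for whatever the attack produced.

```python
# Hypothetical check: do the reconstructed images alone suffice to
# train an accurate model (i.e., act as a distilled dataset)?
import torch

def distill_and_evaluate(make_model, recon_images, recon_labels,
                         test_loader, epochs=50, lr=1e-3):
    model = make_model()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(recon_images), recon_labels).backward()
        opt.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```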
arXiv Detail & Related papers (2023-02-02T21:41:59Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
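For context, here is a hypothetical sketch of the standard PGD adversarial-training baseline this line of work builds on (the paper's stylized, feature-space attack itself is not reproduced here); epsilon and step sizes are illustrative and assume inputs in [0, 1]:

```python
# Hypothetical PGD adversarial training step.
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(loss_fn(model(x + delta), y), [delta])
        # Ascend the loss, then project back into the epsilon ball.
        delta = ((delta + alpha * grad.sign()).clamp(-eps, eps)
                 .detach().requires_grad_(True))
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_step(model, loss_fn, opt, x, y):
    # Train on perturbed inputs instead of the clean batch.
    x_adv = pgd_attack(model, loss_fn, x, y)
    opt.zero_grad()
    loss_fn(model(x_adv), y).backward()
    opt.step()
```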
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)