Towards Transferable Defense Against Malicious Image Edits
- URL: http://arxiv.org/abs/2512.14341v1
- Date: Tue, 16 Dec 2025 12:10:16 GMT
- Title: Towards Transferable Defense Against Malicious Image Edits
- Authors: Jie Zhang, Shuai Dong, Shiguang Shan, Xilin Chen
- Abstract summary: Transferable Defense Against Malicious Image Edits (TDAE) is a novel bimodal framework that enhances image immunity against malicious edits.
We introduce the FlatGrad Defense Mechanism (FDM), which incorporates gradient regularization into the adversarial objective.
For text-side protection, we propose Dynamic Prompt Defense (DPD), which periodically refines text embeddings to align the editing outcomes of immunized images with those of the original images.
- Score: 70.17363183107604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent approaches that add imperceptible perturbations to input images have demonstrated promising potential for countering malicious manipulation by diffusion-based image editing systems. However, existing methods suffer from limited transferability in cross-model evaluations. To address this, we propose Transferable Defense Against Malicious Image Edits (TDAE), a novel bimodal framework that enhances image immunity against malicious edits through coordinated image-text optimization. At the visual defense level, we introduce the FlatGrad Defense Mechanism (FDM), which incorporates gradient regularization into the adversarial objective. By explicitly steering the perturbations toward flat minima, FDM improves immune robustness against unseen editing models. For text-side protection, we propose an adversarial optimization paradigm named Dynamic Prompt Defense (DPD), which periodically refines text embeddings to align the editing outcomes of immunized images with those of the original images, then updates the images under the optimized embeddings. Through iterative adversarial updates against diverse embeddings, DPD drives the immunized images to exploit a broader set of immunity-enhancing features, thereby achieving cross-model transferability. Extensive experimental results demonstrate that TDAE achieves state-of-the-art performance in mitigating malicious edits under both intra- and cross-model evaluations.
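To make the alternating optimization concrete, here is a minimal, heavily simplified PyTorch sketch of the TDAE-style loop described above. The `Editor` module, loss choices, step sizes, and budget are illustrative assumptions, not the paper's actual models or hyperparameters; a real implementation would attack a latent diffusion editor.

```python
# Hypothetical sketch of TDAE-style image-text co-optimization.
# Editor is a toy stand-in for a diffusion editing model.
import torch
import torch.nn as nn

class Editor(nn.Module):
    """Toy differentiable editor: (image, text embedding) -> edited image."""
    def __init__(self, emb_dim=16):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
        self.proj = nn.Linear(emb_dim, 3)

    def forward(self, img, emb):
        gate = torch.sigmoid(self.proj(emb)).view(-1, 3, 1, 1)
        return self.conv(img) * gate  # text embedding gates the edit

editor = Editor()
img = torch.rand(1, 3, 64, 64)                      # image to immunize
delta = (1e-3 * torch.randn_like(img)).requires_grad_(True)
emb = torch.randn(1, 16, requires_grad=True)        # text embedding refined by DPD
eps, lam = 8 / 255, 0.1                             # L_inf budget, flatness weight (assumed)

for step in range(100):
    # DPD: periodically refine the embedding so that edits of the immunized
    # image re-align with edits of the original (the hardest case to defend).
    if step % 10 == 0:
        for _ in range(5):
            align = (editor(img + delta, emb) - editor(img, emb)).pow(2).mean()
            g_e, = torch.autograd.grad(align, emb)
            emb = (emb - 0.01 * g_e).detach().requires_grad_(True)

    # FDM: push edits of the immunized image away from edits of the original,
    # with a gradient-norm penalty that steers delta toward flat minima.
    adv = -(editor(img + delta, emb) - editor(img, emb)).pow(2).mean()
    g_d, = torch.autograd.grad(adv, delta, create_graph=True)
    total = adv + lam * g_d.norm()
    g, = torch.autograd.grad(total, delta)
    delta = (delta - 0.05 * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)

immunized = (img + delta).clamp(0, 1)
```

The flatness penalty is the part meant to carry over to unseen editors: a perturbation sitting in a flat region of the loss surface should degrade less when the editing model changes.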
Related papers
- Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection [29.203173410857914]
We propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation.
Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected.
Our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP.
arXiv Detail & Related papers (2026-02-16T12:08:37Z)
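A minimal sketch of the universal-perturbation idea in the entry above, under loose assumptions: the encoder, dataset, and target embedding below are toy stand-ins, and the actual method injects the semantic target through a diffusion editing pipeline rather than a linear encoder.

```python
# Hypothetical sketch of a universal immunizing perturbation (UAP):
# one shared delta is optimized over many images so that features of
# (image + delta) collapse onto a chosen semantic target.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # stand-in feature extractor
images = torch.rand(100, 3, 32, 32)       # the set of images to protect
target = torch.randn(1, 64)               # embedding of the injected semantic target
delta = torch.zeros(1, 3, 32, 32, requires_grad=True)
eps = 8 / 255                             # imperceptibility budget (assumed)

opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(200):
    batch = images[torch.randint(0, len(images), (16,))]
    feats = encoder((batch + delta).clamp(0, 1))
    loss = (feats - target).pow(2).mean()  # overwrite original semantics with the target
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)            # one perturbation, reused for every image
```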
- Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity [79.10998560865444]
We argue that immunization success should be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradation.
We introduce the Immunization Success Rate (ISR), a novel metric designed to rigorously quantify true immunization efficacy for the first time.
arXiv Detail & Related papers (2025-12-16T11:34:48Z)
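A hedged sketch of how such a success criterion could be scored. The two callables and both thresholds are placeholders (e.g., a CLIP image-text similarity and an LPIPS distance); the paper's exact protocol may differ.

```python
# Hypothetical ISR-style scorer: an edit counts as blocked if the output
# either mismatches the prompt semantically or is perceptually degraded.
from typing import Callable, Sequence

def immunization_success_rate(
    edited_images: Sequence,
    prompts: Sequence[str],
    semantic_score: Callable,      # e.g. CLIP image-text similarity (higher = better match)
    degradation_score: Callable,   # e.g. LPIPS vs. a clean edit (higher = more degraded)
    sem_thresh: float = 0.25,      # assumed threshold
    deg_thresh: float = 0.5,       # assumed threshold
) -> float:
    successes = 0
    for img, prompt in zip(edited_images, prompts):
        mismatched = semantic_score(img, prompt) < sem_thresh
        degraded = degradation_score(img) > deg_thresh
        successes += mismatched or degraded   # either failure mode counts as success
    return successes / len(edited_images)
```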
- DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing [1.7624347338410742]
Recent defenses attempt to protect images by adding limited noise in pixel space to disrupt diffusion-based editing models.
We propose a novel optimization approach that introduces adversarial perturbations directly in the frequency domain.
By leveraging the JPEG pipeline, our method generates adversarial images that effectively prevent malicious image editing.
arXiv Detail & Related papers (2025-04-24T19:14:50Z)
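A simplified single-channel sketch of optimizing in the DCT domain, as the entry above describes. The disruption objective, coefficient budget, and omission of quantization are all assumptions; the real method works through the full JPEG pipeline against a diffusion editor.

```python
# Hypothetical frequency-domain perturbation: the adversarial variable
# lives on 8x8 DCT coefficients, so it is more compatible with the JPEG
# pipeline than pixel-space noise.
import torch

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k = torch.arange(n).float()
    mat = torch.cos(torch.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    mat[0] *= 1 / 2 ** 0.5
    return mat * (2 / n) ** 0.5

D = dct_matrix()
blocks = torch.rand(64, 8, 8)                  # 8x8 pixel blocks of one channel
coef_delta = torch.zeros_like(blocks, requires_grad=True)
target = torch.full_like(blocks, 0.5)          # stand-in for an editing-disruption target

opt = torch.optim.Adam([coef_delta], lr=1e-2)
for _ in range(100):
    coefs = D @ blocks @ D.T + coef_delta      # perturb in the DCT domain
    perturbed = D.T @ coefs @ D                # back to pixels for the (stand-in) loss
    loss = -(perturbed - target).pow(2).mean() # placeholder adversarial objective
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        coef_delta.clamp_(-1.0, 1.0)           # budget on coefficient changes (assumed)
```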
- Optimization-Free Image Immunization Against Diffusion-Based Editing [23.787546784989484]
DiffVax is a scalable, lightweight, and optimization-free framework for image immunization.
Our approach enables effective generalization to unseen content, reducing computational costs and cutting immunization time from days to milliseconds.
arXiv Detail & Related papers (2024-11-27T00:30:26Z)
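The "optimization-free" part means immunization is a single forward pass of a trained generator rather than a per-image optimization. A tiny sketch of that shape, with an assumed architecture and the training loop omitted:

```python
# Hypothetical feed-forward immunizer: trained once, then applied to any
# image in one pass (training objective omitted here).
import torch
import torch.nn as nn

class Immunizer(nn.Module):
    def __init__(self, eps=8 / 255):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
        self.eps = eps  # bound the emitted perturbation

    def forward(self, img):
        return (img + self.eps * self.net(img)).clamp(0, 1)

immunizer = Immunizer()
protected = immunizer(torch.rand(1, 3, 64, 64))  # milliseconds at test time
```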
- DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing [103.40147707280585]
DiffusionGuard is a robust and effective defense method against unauthorized edits by diffusion-based image editing models.
We introduce a novel objective that generates adversarial noise targeting the early stage of the diffusion process.
We also introduce a mask-augmentation technique to enhance robustness against various masks during test time.
arXiv Detail & Related papers (2024-10-08T05:19:19Z)
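A loose sketch of the entry's two ideas, early-step targeting and mask augmentation, with a toy noise predictor standing in for the diffusion model; the actual objective and noising schedule differ.

```python
# Hypothetical sketch: attack only the earliest (high-noise) diffusion
# steps and randomly dilate the inpainting mask during optimization.
import torch
import torch.nn as nn
import torch.nn.functional as F

eps_net = nn.Conv2d(3, 3, 3, padding=1)   # stand-in noise predictor
img = torch.rand(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0            # region an attacker would inpaint
delta = torch.zeros_like(img, requires_grad=True)
eps = 8 / 255

for _ in range(100):
    t = torch.randint(900, 1000, (1,)).float() / 1000  # early steps only
    m = F.max_pool2d(mask, 3, 1, 1) if torch.rand(()) < 0.5 else mask  # mask augmentation
    x = (img + delta) * (1 - m) + 0.5 * m              # masked input, hole filled with gray
    noisy = (1 - t) * x + t * torch.randn_like(x)      # crude early-step noising
    loss = -eps_net(noisy).pow(2).mean()               # stand-in: degrade the noise prediction
    g, = torch.autograd.grad(loss, delta)
    delta = (delta - 0.05 * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
```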
- Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z)
- Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models [9.905296922309157]
Diffusion models have emerged as powerful generative models for high-quality image synthesis, and many subsequent image editing techniques build on them.
Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations.
Our work proposes a novel attack framework, AtkPDM, which exploits vulnerabilities in denoising UNets and uses a latent optimization strategy to enhance the naturalness of adversarial images.
arXiv Detail & Related papers (2024-08-21T17:56:34Z)
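A rough sketch of the latent-optimization attack shape described above; the encoder, decoder, and feature extractor are toy stand-ins for the paper's autoencoder and denoising UNet.

```python
# Hypothetical AtkPDM-style evasion: optimize in a latent space to
# disrupt the denoiser's features while staying near the clean latent,
# which keeps the adversarial image natural-looking.
import torch
import torch.nn as nn

enc = nn.Conv2d(3, 4, 4, stride=4)            # stand-in encoder
dec = nn.ConvTranspose2d(4, 3, 4, stride=4)   # stand-in decoder
unet_feat = nn.Conv2d(3, 8, 3, padding=1)     # stand-in denoiser features

img = torch.rand(1, 3, 64, 64)
z = enc(img).detach().requires_grad_(True)
z0 = z.detach().clone()

opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(100):
    adv = dec(z).clamp(0, 1)
    attack = -(unet_feat(adv) - unet_feat(img)).pow(2).mean()  # push denoiser features apart
    natural = (z - z0).pow(2).mean()                           # stay near the clean latent
    loss = attack + 0.1 * natural
    opt.zero_grad(); loss.backward(); opt.step()
```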
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
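The detection recipe above reduces to: caption the input with the target VLM, regenerate an image from that caption with a T2I model, and flag the input when the two images disagree in an embedding space. A sketch with placeholder callables and an assumed threshold:

```python
# Hypothetical MirrorCheck-style detector; all three models are injected
# as callables so the sketch runs with toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mirror_check(img, caption_fn, t2i_fn, embed_fn, thresh=0.7):
    """Return True if `img` is flagged as adversarial."""
    caption = caption_fn(img)      # target VLM's description of the input
    regen = t2i_fn(caption)        # text-to-image reconstruction of that caption
    sim = F.cosine_similarity(embed_fn(img).flatten(1),
                              embed_fn(regen).flatten(1)).item()
    return sim < thresh            # low agreement -> likely adversarial

# toy stand-ins so the sketch runs end to end
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
flagged = mirror_check(torch.rand(1, 3, 32, 32),
                       caption_fn=lambda x: "a photo",
                       t2i_fn=lambda c: torch.rand(1, 3, 32, 32),
                       embed_fn=embed)
```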
- Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z)
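A toy sketch of the text-side tuning idea: a learnable prompt context is optimized so class text features stay aligned with image features of adversarial examples. The two encoders stand in for CLIP's towers; dimensions and losses are assumptions.

```python
# Hypothetical adversarial prompt tuning: only the prompt context `ctx`
# is learned; both encoders are frozen stand-ins for CLIP.
import torch
import torch.nn as nn
import torch.nn.functional as F

img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # stand-in image tower
txt_proj = nn.Linear(16, 64)                                       # stand-in text tower
ctx = torch.randn(1, 16, requires_grad=True)                       # learnable prompt context

adv_images = torch.rand(32, 3, 32, 32)   # precomputed adversarial examples
labels = torch.randint(0, 4, (32,))
class_tok = torch.randn(4, 16)           # fixed class-name token stand-ins

opt = torch.optim.Adam([ctx], lr=1e-2)
for _ in range(100):
    txt = F.normalize(txt_proj(class_tok + ctx), dim=-1)  # prompt-conditioned class features
    imf = F.normalize(img_enc(adv_images), dim=-1)
    logits = 100.0 * imf @ txt.T
    loss = F.cross_entropy(logits, labels)  # re-align text features with adversarial images
    opt.zero_grad(); loss.backward(); opt.step()
```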
- Raising the Cost of Malicious AI-Powered Image Editing [82.71990330465115]
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models.
The key idea is to immunize images so as to make them resistant to manipulation by these models.
arXiv Detail & Related papers (2023-02-13T18:38:42Z)
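One way to realize "immunization" in this spirit is an encoder-space PGD that pushes the image's latent toward an uninformative target, so latent-diffusion editors misread the protected image. A minimal sketch with a stand-in encoder; the paper's actual attacks and targets may differ.

```python
# Hypothetical encoder-space immunization: PGD drives the latent of the
# protected image toward the latent of a blank image.
import torch
import torch.nn as nn

encoder = nn.Conv2d(3, 4, 4, stride=4)            # stand-in for a VAE encoder
img = torch.rand(1, 3, 64, 64)
target = encoder(torch.zeros_like(img)).detach()  # latent of a blank image
delta = torch.zeros_like(img, requires_grad=True)
eps = 8 / 255

for _ in range(100):
    loss = (encoder(img + delta) - target).pow(2).mean()
    g, = torch.autograd.grad(loss, delta)
    delta = (delta - 0.05 * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)

immunized = (img + delta).clamp(0, 1)
```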
This list is automatically generated from the titles and abstracts of the papers on this site.