Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity
- URL: http://arxiv.org/abs/2512.14320v1
- Date: Tue, 16 Dec 2025 11:34:48 GMT
- Title: Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity
- Authors: Shuai Dong, Jie Zhang, Guoying Zhao, Shiguang Shan, Xilin Chen
- Abstract summary: We argue that immunization success should be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradations. We introduce the Immunization Success Rate (ISR), a novel metric designed to rigorously quantify true immunization efficacy for the first time.
- Score: 79.10998560865444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided image editing via diffusion models, while powerful, raises significant concerns about misuse, motivating efforts to immunize images against unauthorized edits using imperceptible perturbations. Prevailing metrics for evaluating immunization success typically rely on measuring the visual dissimilarity between the output generated from a protected image and a reference output generated from the unprotected original. This approach fundamentally overlooks the core requirement of image immunization, which is to disrupt semantic alignment with attacker intent, regardless of deviation from any specific output. We argue that immunization success should instead be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradations, both of which thwart malicious intent. To operationalize this principle, we propose Synergistic Intermediate Feature Manipulation (SIFM), a method that strategically perturbs intermediate diffusion features through dual synergistic objectives: (1) maximizing feature divergence from the original edit trajectory to disrupt semantic alignment with the expected edit, and (2) minimizing feature norms to induce perceptual degradations. Furthermore, we introduce the Immunization Success Rate (ISR), a novel metric designed to rigorously quantify true immunization efficacy for the first time. ISR quantifies the proportion of edits where immunization induces either semantic failure relative to the prompt or significant perceptual degradations, assessed via Multimodal Large Language Models (MLLMs). Extensive experiments show our SIFM achieves the state-of-the-art performance for safeguarding visual content against malicious diffusion-based manipulation.
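The abstract's two synergistic objectives and the ISR definition can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the trade-off weight `lam`, and the per-edit boolean flags (which the paper obtains via MLLM judgments) are all illustrative assumptions.

```python
import numpy as np

def sifm_loss(feat_adv, feat_orig, lam=0.1):
    """Toy version of the dual SIFM objective on intermediate diffusion features."""
    # Objective (1): maximize divergence from the original edit trajectory
    divergence = np.sum((feat_adv - feat_orig) ** 2)
    # Objective (2): minimize feature norms to induce perceptual degradation
    norm_penalty = np.sum(feat_adv ** 2)
    # Minimizing the combined loss pursues both objectives, balanced by lam
    return -divergence + lam * norm_penalty

def immunization_success_rate(semantic_fail, percept_degraded):
    """ISR: fraction of edits exhibiting either failure mode."""
    hits = [s or p for s, p in zip(semantic_fail, percept_degraded)]
    return sum(hits) / len(hits)
```

An edit counts toward ISR if it fails semantically *or* is perceptually degraded, which matches the abstract's "either/or" success criterion.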
Related papers
- Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection [29.203173410857914]
We propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected. Our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP.
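The "single, broadly applicable perturbation" idea amounts to adding one shared, budget-clamped perturbation to every image to be protected. A minimal sketch, assuming an L-infinity budget `eps` and pixel values in [0, 1] (both assumptions; the entry does not state the budget or how the UAP itself is optimized):

```python
import numpy as np

def apply_uap(images, uap, eps=8 / 255):
    # Clamp the shared perturbation to an imperceptibility budget,
    # then add the same UAP to every image and keep pixels valid
    delta = np.clip(uap, -eps, eps)
    return np.clip(images + delta, 0.0, 1.0)
```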
arXiv Detail & Related papers (2026-02-16T12:08:37Z)
- The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization [51.835894707552946]
Unlearning-based defenses claim to purge Not-Safe-For-Work concepts from diffusion models (DMs). We show that unlearning only partially disrupts the mapping between linguistic symbols and the underlying knowledge, which remains intact as dormant memories. We propose IVO, a concise and powerful attack framework that reactivates these dormant memories by reconstructing the broken mappings.
arXiv Detail & Related papers (2026-01-30T02:39:51Z)
- Towards Transferable Defense Against Malicious Image Edits [70.17363183107604]
Transferable Defense Against Malicious Image Edits (TDAE) is a novel bimodal framework that enhances image immunity against malicious edits. We introduce the FlatGrad Defense Mechanism (FDM), which incorporates gradient regularization into the adversarial objective. For textual enhancement protection, we propose Dynamic Prompt Defense (DPD), which periodically refines text embeddings to align the editing outcomes of immunized images with those of the original images.
arXiv Detail & Related papers (2025-12-16T12:10:16Z)
- Dual Attention Guided Defense Against Malicious Edits [70.17363183107604]
We propose a Dual Attention-Guided Noise Perturbation (DANP) immunization method that adds imperceptible perturbations to disrupt the model's semantic understanding and generation process. Our method exhibits impressive immunity against malicious edits, and extensive experiments confirm that it achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-16T12:01:28Z)
- Active Adversarial Noise Suppression for Image Forgery Localization [56.98050814363447]
We introduce an Adversarial Noise Suppression Module (ANSM) that generates a defensive perturbation to suppress the attack effect of adversarial noise. To the best of our knowledge, this is the first report of adversarial defense in image forgery localization tasks.
arXiv Detail & Related papers (2025-06-15T14:53:27Z)
- CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks [61.06621533874629]
Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set. We first propose a defense framework, CopyrightShield, to defend against the above attack. Experimental results demonstrate that CopyrightShield significantly improves poisoned sample detection performance across two attack scenarios.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
- Optimization-Free Image Immunization Against Diffusion-Based Editing [23.787546784989484]
DiffVax is a scalable, lightweight, and optimization-free framework for image immunization. Our approach enables effective generalization to unseen content, reducing computational costs and cutting immunization time from days to milliseconds.
arXiv Detail & Related papers (2024-11-27T00:30:26Z)
- Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum [13.305800254250789]
We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE)
It generates adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label.
Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, improving by 35% over the state-of-the-art method.
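SD-MIAE integrates momentum into adversarial-example generation. The entry does not give the update rule, but momentum-integrated attacks conventionally follow the MI-FGSM pattern: accumulate L1-normalized gradients, then step in the sign of the accumulated direction. A generic sketch under that assumption (the function name and hyperparameters are illustrative):

```python
import numpy as np

def momentum_update(x, grad, velocity, decay=0.9, step=0.01):
    # Accumulate L1-normalized gradients so no single step dominates
    velocity = decay * velocity + grad / (np.sum(np.abs(grad)) + 1e-12)
    # Move in the sign of the accumulated direction for a stable update
    return x + step * np.sign(velocity), velocity
```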
arXiv Detail & Related papers (2024-10-17T01:22:11Z)
- Raising the Cost of Malicious AI-Powered Image Editing [82.71990330465115]
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models.
The key idea is to immunize images so as to make them resistant to manipulation by these models.
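Immunization of this kind is typically realized as a projected-gradient update: repeatedly ascend a disruption objective, while projecting the image back into a small L-infinity ball around the original so the change stays imperceptible. A minimal single-step sketch, assuming a precomputed gradient of some disruption loss and pixel values in [0, 1] (the function name and `eps`/`step` values are assumptions, not this paper's settings):

```python
import numpy as np

def pgd_immunize_step(x, x0, grad, eps=8 / 255, step=2 / 255):
    # Ascend whatever disruption objective produced `grad`
    x = x + step * np.sign(grad)
    # Project back into the eps-ball around the original image
    x = np.clip(x, x0 - eps, x0 + eps)
    # Keep pixel values valid
    return np.clip(x, 0.0, 1.0)
```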
arXiv Detail & Related papers (2023-02-13T18:38:42Z)
- Unsupervised Medical Image Translation with Adversarial Diffusion Models [0.2770822269241974]
Imputation of missing images via source-to-target modality translation can improve diversity in medical imaging protocols.
Here, we propose a novel method based on adversarial diffusion modeling, SynDiff, for improved performance in medical image translation.
arXiv Detail & Related papers (2022-07-17T15:53:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.