Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics
- URL: http://arxiv.org/abs/2508.17247v1
- Date: Sun, 24 Aug 2025 07:57:32 GMT
- Title: Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics
- Authors: Lixin Jia, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Dan Ma, Gaobo Yang
- Abstract summary: Proactive forensics involves embedding imperceptible watermarks to enable reliable source tracking. Existing methods rely on an idealized assumption of single watermark embedding, which proves impractical in real-world scenarios. We propose a general training paradigm named Adversarial Interference Simulation (AIS) to address this vulnerability. Our method enables the model to extract the original watermark correctly even after a second embedding.
- Score: 17.112388802067425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid evolution of deepfake technologies and the wide dissemination of digital media, personal privacy is facing increasingly serious security threats. Deepfake proactive forensics, which involves embedding imperceptible watermarks to enable reliable source tracking, serves as a crucial defense against these threats. Although existing methods show strong forensic ability, they rely on an idealized assumption of single watermark embedding, which proves impractical in real-world scenarios. In this paper, we formally define and demonstrate the existence of Multi-Embedding Attacks (MEA) for the first time. When a previously protected image undergoes additional rounds of watermark embedding, the original forensic watermark can be destroyed or removed, rendering the entire proactive forensic mechanism ineffective. To address this vulnerability, we propose a general training paradigm named Adversarial Interference Simulation (AIS). Rather than modifying the network architecture, AIS explicitly simulates MEA scenarios during fine-tuning and introduces a resilience-driven loss function to enforce the learning of sparse and stable watermark representations. Our method enables the model to maintain the ability to extract the original watermark correctly even after a second embedding. Extensive experiments demonstrate that our plug-and-play AIS training paradigm significantly enhances the robustness of various existing methods against MEA.
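To make the Multi-Embedding Attack concrete, the following is a minimal toy sketch (not the paper's actual encoder/decoder architecture) of how a second watermark embedding can destroy the first, along with an AIS-style resilience loss that penalizes post-attack extraction error plus a sparsity term on the residual. All function names (`embed`, `extract`, `resilience_loss`) and the additive-embedding model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(image, watermark, strength=0.05):
    """Additively embed a {-1,+1} watermark pattern (toy stand-in for a learned encoder)."""
    return image + strength * watermark

def extract(image, reference):
    """Recover the watermark sign from the residual against the clean reference."""
    return np.sign(image - reference)

# Original image, protected with a forensic watermark.
clean = rng.random((8, 8))
wm_a = rng.choice([-1.0, 1.0], size=(8, 8))
protected = embed(clean, wm_a)

# Multi-Embedding Attack (MEA): a second watermark is embedded on top.
wm_b = rng.choice([-1.0, 1.0], size=(8, 8))
attacked = embed(protected, wm_b)

# Bit accuracy of the original watermark after the second embedding:
# wherever wm_a and wm_b disagree, the residual cancels and the bit is lost.
acc = float(np.mean(extract(attacked, clean) == wm_a))

def resilience_loss(attacked_img, clean_img, wm_true, lam=0.1):
    """AIS-style objective (sketch): extraction error under a simulated
    second embedding, plus an L1 sparsity penalty on the watermark residual."""
    residual = attacked_img - clean_img
    extraction_error = np.mean(np.sign(residual) != wm_true)
    sparsity = np.mean(np.abs(residual))
    return float(extraction_error + lam * sparsity)

loss = resilience_loss(attacked, clean, wm_a)
```

In this toy model the attack drives bit accuracy toward roughly 50% (bits where the two watermarks disagree are erased), which is the failure mode AIS fine-tuning is designed to prevent by simulating exactly this interference during training.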
Related papers
- Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection [22.992750993168404]
We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection. We design a novel key-conditioned deflection mechanism that subtly steers the denoising trajectory according to the user key. Experiments show that PAI achieves 98.43% verification accuracy, improving over SOTA methods by 37.25% on average, and retains strong tampering localization performance even against advanced AIGC edits.
arXiv Detail & Related papers (2026-01-10T17:49:08Z) - Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks [26.186586921585604]
Class-Feature Watermarks (CFW) consistently outperform prior methods in resilience. WRK effectively reduces watermark success rates by at least 88.79% across existing watermarking benchmarks. CFW concurrently optimizes both MEA transferability and post-MEA stability.
arXiv Detail & Related papers (2025-11-11T08:00:50Z) - Diffusion-Based Image Editing for Breaking Robust Watermarks [4.273350357872755]
Powerful diffusion-based image generation and editing techniques pose a new threat to robust watermarking schemes. We show that a diffusion-driven "image regeneration" process can erase embedded watermarks while preserving image content. We introduce a novel guided diffusion attack that explicitly targets the watermark signal during generation, significantly degrading watermark detectability.
arXiv Detail & Related papers (2025-10-07T14:34:42Z) - StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models [55.05404953041403]
We propose a novel framework that seamlessly integrates a binary watermark into the diffusion generation process. We show that StableGuard consistently outperforms state-of-the-art methods in image fidelity, watermark verification, and tampering localization.
arXiv Detail & Related papers (2025-09-22T16:35:19Z) - Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained on limited access to the watermark detector. We demonstrate character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z) - Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models [52.877452505561706]
We propose the first copyright evasion attack specifically designed to undermine dataset ownership verification (DOV). Our CEAT2I comprises three stages: watermarked sample detection, trigger identification, and efficient watermark mitigation. Our experiments show that CEAT2I effectively evades DOV mechanisms while preserving model performance.
arXiv Detail & Related papers (2025-05-05T17:51:55Z) - SWA-LDM: Toward Stealthy Watermarks for Latent Diffusion Models [11.906245347904289]
We introduce SWA-LDM, a novel approach that enhances watermarking by randomizing the embedding process. Our proposed watermark presence attack reveals the inherent vulnerabilities of existing latent-based watermarking methods. This work represents a pivotal step towards securing LDM-generated images against unauthorized use.
arXiv Detail & Related papers (2025-02-14T16:55:45Z) - Social Media Authentication and Combating Deepfakes using Semi-fragile Invisible Image Watermarking [6.246098300155482]
We propose a semi-fragile image watermarking technique that embeds an invisible secret message into real images for media authentication.
Our proposed framework is designed to be fragile to facial manipulations or tampering while being robust to benign image-processing operations and watermark removal attacks.
arXiv Detail & Related papers (2024-10-02T18:05:03Z) - ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks. Adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation. Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z) - Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples. By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z) - Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z) - Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attack.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.