Related papers: Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

URL: http://arxiv.org/abs/2404.17867v1
Date: Sat, 27 Apr 2024 11:20:49 GMT
Title: Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin,
Abstract summary: We argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images. We propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good.
Score: 14.596038695008403
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.

Related papers

Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? [75.99961894619986]
This paper investigates whether student models can acquire the capabilities of teacher models through knowledge distillation while avoiding watermark inheritance. We propose two categories of watermark removal approaches: pre-distillation removal through untargeted and targeted training data paraphrasing (UP and TP), and post-distillation removal through inference-time watermark neutralization (WN)
arXiv Detail & Related papers (2025-02-17T09:34:19Z)
LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks [7.965986856780787]
This paper introduces a novel training-free landmark perceptual watermark, LampMark for short. We first analyze the structure-sensitive characteristics of Deepfake manipulations and devise a secure and confidential transformation pipeline. We present an end-to-end watermarking framework that imperceptibly embeds and extracts watermarks concerning the images to be protected.
arXiv Detail & Related papers (2024-11-26T08:24:56Z)
An undetectable watermark for generative image models [65.31658824274894]
We present the first undetectable watermarking scheme for generative image models. In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric. Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code.
arXiv Detail & Related papers (2024-10-09T18:33:06Z)
Social Media Authentication and Combating Deepfakes using Semi-fragile Invisible Image Watermarking [6.246098300155482]
We propose a semi-fragile image watermarking technique that embeds an invisible secret message into real images for media authentication. Our proposed framework is designed to be fragile to facial manipulations or tampering while being robust to benign image-processing operations and watermark removal attacks.
arXiv Detail & Related papers (2024-10-02T18:05:03Z)
UnMarker: A Universal Attack on Defensive Image Watermarking [4.013156524547072]
We present UnMarker -- the first practical universal attack on defensive watermarking. UnMarker requires no detector feedback, no unrealistic knowledge of the watermarking scheme or similar models, and no advanced denoising pipelines. Evaluations against SOTA schemes prove UnMarker's effectiveness.
arXiv Detail & Related papers (2024-05-14T07:05:18Z)
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks [47.04650443491879]
We analyze the robustness of various AI-image detectors including watermarking and deepfake detectors. We show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones.
arXiv Detail & Related papers (2023-09-29T18:30:29Z)
Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
backdoor-based ownership verification becomes popular recently, in which the model owner can watermark the model. We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior. Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
Invisible Image Watermarks Are Provably Removable Using Generative AI [47.25747266531665]
Invisible watermarks safeguard images' copyrights by embedding hidden messages only detectable by owners. We propose a family of regeneration attacks to remove these invisible watermarks. The proposed attack method first adds random noise to an image to destroy the watermark and then reconstructs the image.
arXiv Detail & Related papers (2023-06-02T23:29:28Z)
SepMark: Deep Separable Watermarking for Unified Source Tracing and Deepfake Detection [15.54035395750232]
Malicious Deepfakes have led to a sharp conflict over distinguishing between genuine and forged faces. We propose SepMark, which provides a unified framework for source tracing and Deepfake detection.
arXiv Detail & Related papers (2023-05-10T17:15:09Z)
Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal [69.10633149787252]
We propose a novel defence mechanism by adversarial machine learning for good. Two types of vaccines are proposed: Disrupting Watermark Vaccine (DWV) induces to ruin the host image along with watermark after passing through watermark-removal networks. Inerasable Watermark Vaccine (IWV) works in another fashion of trying to keep the watermark not removed and still noticeable.
arXiv Detail & Related papers (2022-07-17T13:50:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.