Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
- URL: http://arxiv.org/abs/2404.17867v1
- Date: Sat, 27 Apr 2024 11:20:49 GMT
- Title: Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
- Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin
- Abstract summary: We argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images.
We propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good.
- Score: 14.596038695008403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.
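The fine-tuning idea in the abstract can be read as a joint objective: keep the watermark message extractable while nudging a passive detector toward the correct real/fake label. The sketch below is only an illustration of that reading, not the authors' implementation. The function names, the loss weights, and the image-fidelity term are all assumptions introduced here.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one probability and a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def message_loss(decoded_bits, true_bits):
    """Mean BCE between decoded bit probabilities and the embedded bits."""
    return sum(bce(p, b) for p, b in zip(decoded_bits, true_bits)) / len(true_bits)

def image_loss(original, watermarked):
    """Mean squared error between flattened pixel lists (assumed fidelity term)."""
    return sum((o - w) ** 2 for o, w in zip(original, watermarked)) / len(original)

def combined_loss(original, watermarked, decoded_bits, true_bits,
                  detector_fake_prob, is_fake,
                  w_img=1.0, w_msg=1.0, w_det=0.1):
    """Hypothetical joint objective with three weighted terms.

    detector_fake_prob: a passive detector's probability that the
    watermarked image is fake; is_fake: ground-truth label (1 = forged).
    The weights are illustrative, not values from the paper.
    """
    return (w_img * image_loss(original, watermarked)
            + w_msg * message_loss(decoded_bits, true_bits)
            + w_det * bce(detector_fake_prob, is_fake))

# A forged image whose watermark helps the detector say "fake" scores lower
# than one whose watermark pushes the detector toward "real".
pixels = [0.5, 0.5]
helpful = combined_loss(pixels, pixels, [0.9, 0.1], [1, 0], 0.9, 1)
harmful = combined_loss(pixels, pixels, [0.9, 0.1], [1, 0], 0.1, 1)
print(helpful < harmful)
```

In an actual training loop these terms would be computed on tensors and backpropagated through the watermark encoder while the detector stays frozen, which matches the abstract's claim that the in-the-wild detectors are not tuned.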
Related papers
- RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees [33.61946642460661]
This paper introduces a robust and agile watermark detection framework, dubbed RAW.
We employ a classifier that is jointly trained with the watermark to detect the presence of the watermark.
We show that the framework provides provable guarantees regarding the false positive rate for misclassifying a watermarked image.
arXiv Detail & Related papers (2024-01-23T22:00:49Z)
- Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks [47.04650443491879]
We analyze the robustness of various AI-image detectors including watermarking and deepfake detectors.
We show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones.
arXiv Detail & Related papers (2023-09-29T18:30:29Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification, in which the model owner can watermark the model, has recently become popular.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
- SepMark: Deep Separable Watermarking for Unified Source Tracing and Deepfake Detection [15.54035395750232]
Malicious Deepfakes have led to a sharp conflict over distinguishing between genuine and forged faces.
We propose SepMark, which provides a unified framework for source tracing and Deepfake detection.
arXiv Detail & Related papers (2023-05-10T17:15:09Z)
- Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal [69.10633149787252]
We propose a novel defence mechanism by adversarial machine learning for good.
Two types of vaccines are proposed: Disrupting Watermark Vaccine (DWV) causes the host image to be ruined along with the watermark after it passes through watermark-removal networks.
Inerasable Watermark Vaccine (IWV) works differently, trying to keep the watermark from being removed so that it stays noticeable.
arXiv Detail & Related papers (2022-07-17T13:50:02Z)
- Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z)
- FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes [25.277040616599336]
Deepfakes and manipulated media are becoming a prominent threat due to the recent advances in realistic image and video synthesis techniques.
We introduce a deep learning based semi-fragile watermarking technique that allows media authentication by verifying an invisible secret message embedded in the image pixels.
arXiv Detail & Related papers (2022-04-05T03:29:30Z)
- Watermark Faker: Towards Forgery of Digital Image Watermarking [10.14145437847397]
We make the first attempt to develop digital image watermark fakers by using generative adversarial learning.
Our experiments show that the proposed watermark faker can effectively crack digital image watermarkers in both spatial and frequency domains.
arXiv Detail & Related papers (2021-03-23T12:28:00Z)