Robustness of AI-Image Detectors: Fundamental Limits and Practical
Attacks
- URL: http://arxiv.org/abs/2310.00076v2
- Date: Wed, 14 Feb 2024 07:39:54 GMT
- Title: Robustness of AI-Image Detectors: Fundamental Limits and Practical
Attacks
- Authors: Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei, Aounon Kumar,
Atoosa Chegini, Wenxiao Wang, Soheil Feizi
- Abstract summary: We analyze the robustness of various AI-image detectors including watermarking and deepfake detectors.
We show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones.
- Score: 47.04650443491879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In light of recent advancements in generative AI models, it has become
essential to distinguish genuine content from AI-generated content to prevent the
malicious use of fake materials as authentic ones and vice versa. Various
techniques have been introduced for identifying AI-generated images, with
watermarking emerging as a promising approach. In this paper, we analyze the
robustness of various AI-image detectors including watermarking and
classifier-based deepfake detectors. For watermarking methods that introduce
subtle image perturbations (i.e., low perturbation budget methods), we reveal a
fundamental trade-off between the evasion error rate (i.e., the fraction of
watermarked images detected as non-watermarked ones) and the spoofing error
rate (i.e., the fraction of non-watermarked images detected as watermarked
ones) upon application of a diffusion purification attack. To validate our
theoretical findings, we also provide empirical evidence demonstrating that
diffusion purification effectively removes low perturbation budget watermarks
by applying minimal changes to images. The diffusion purification attack is
ineffective against high perturbation budget watermarking methods, where notable
changes are applied to images. In this case, we develop a model substitution
adversarial attack that can successfully remove watermarks. Moreover, we show
that watermarking methods are vulnerable to spoofing attacks where the attacker
aims to have real images identified as watermarked ones, damaging the
reputation of the developers. In particular, with black-box access to the
watermarking method, a watermarked noise image can be generated and added to
real images, causing them to be incorrectly classified as watermarked. Finally,
we extend our theory to characterize a fundamental trade-off between the
robustness and reliability of classifier-based deepfake detectors and
demonstrate it through experiments.
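The diffusion purification attack described in the abstract can be sketched in a few lines: forward-diffuse the watermarked image to an intermediate noise level, then run the reverse denoising process of an off-the-shelf diffusion model. The snippet below is a minimal illustration assuming the Hugging Face diffusers library and a pretrained DDPM checkpoint; the model name and purification strength `t_star` are illustrative choices, not the authors' exact configuration.

```python
# Sketch of diffusion purification: forward-diffuse to step t*, then denoise back.
import torch
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")  # any pretrained DDPM
unet, scheduler = pipe.unet, pipe.scheduler

@torch.no_grad()
def purify(x0: torch.Tensor, t_star: int = 150) -> torch.Tensor:
    """x0: watermarked images in [-1, 1], shape (B, 3, 256, 256)."""
    noise = torch.randn_like(x0)
    t = torch.full((x0.shape[0],), t_star, dtype=torch.long)
    # Forward diffusion: add just enough noise to wash out a low-budget watermark.
    xt = scheduler.add_noise(x0, noise, t)
    # Reverse diffusion from t* back to 0 reconstructs a visually similar image.
    for step in range(t_star, -1, -1):
        eps = unet(xt, step).sample
        xt = scheduler.step(eps, step, xt).prev_sample
    return xt
```

A smaller `t_star` changes the image less but may leave the watermark partially intact, reflecting the tension between removing the watermark and preserving image content.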
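For high perturbation budget watermarks, the abstract instead describes a model substitution adversarial attack. One common way to realize such an attack, sketched below under the assumption that the attacker has trained their own surrogate watermark detector (`surrogate`), is a projected gradient descent (PGD) style perturbation crafted on the surrogate and then transferred to the deployed detector; the L_inf budget and step size are illustrative.

```python
# Sketch of a model-substitution evasion attack: PGD on an attacker-trained
# surrogate detector, transferred to the (unknown) deployed watermark detector.
import torch
import torch.nn.functional as F

def pgd_remove_watermark(surrogate, x_wm, eps=8 / 255, step=1 / 255, iters=50):
    """x_wm: watermarked images in [0, 1]; surrogate returns logits (not_wm, wm)."""
    x_adv = x_wm.clone().detach()
    target = torch.zeros(x_wm.shape[0], dtype=torch.long)  # class 0 = "not watermarked"
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Move toward the "not watermarked" class, staying in an L_inf ball around x_wm.
        x_adv = (x_adv - step * grad.sign()).detach()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x_wm - eps), x_wm + eps), 0.0, 1.0)
    return x_adv
```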
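The spoofing attack from the abstract can likewise be sketched: with black-box access to the watermarking service, the attacker watermarks an innocuous noise image and blends it into real photos so that the detector flags them as watermarked. `watermark_api` and the blending weight `alpha` below are hypothetical placeholders, not part of any real service.

```python
# Sketch of the noise-based spoofing attack: blend a black-box-watermarked
# noise image into a real photo so it is (falsely) detected as watermarked.
import numpy as np

def spoof(real_image: np.ndarray, watermark_api, alpha: float = 0.1) -> np.ndarray:
    """real_image: float array in [0, 1], shape (H, W, 3)."""
    carrier = np.random.rand(*real_image.shape)   # innocuous noise image
    wm_carrier = watermark_api(carrier)           # black-box call that embeds the watermark
    spoofed = (1.0 - alpha) * real_image + alpha * wm_carrier
    return np.clip(spoofed, 0.0, 1.0)
```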
Related papers
- Social Media Authentication and Combating Deepfakes using Semi-fragile Invisible Image Watermarking [6.246098300155482]
We propose a semi-fragile image watermarking technique that embeds an invisible secret message into real images for media authentication.
Our proposed framework is designed to be fragile to facial manipulations or tampering while being robust to benign image-processing operations and watermark removal attacks.
arXiv Detail & Related papers (2024-10-02T18:05:03Z) - Robustness of Watermarking on Text-to-Image Diffusion Models [9.277492743469235]
We investigate the robustness of generative watermarking, which is created from the integration of watermarking embedding and text-to-image generation processing.
We found that generative watermarking methods are robust to direct evasion attacks, such as discriminator-based attacks or manipulation of edge information in edge-prediction-based attacks, but are vulnerable to malicious fine-tuning.
arXiv Detail & Related papers (2024-08-04T13:59:09Z) - Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z) - A Transfer Attack to Image Watermarks [1.656188668325832]
We propose a new transfer evasion attack against image watermarks in the no-box setting.
Our major contribution is to show, both theoretically and empirically, that watermark-based AI-generated image detectors are not robust to evasion attacks.
arXiv Detail & Related papers (2024-03-22T17:33:11Z) - RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees [33.61946642460661]
This paper introduces a robust and agile watermark detection framework, dubbed RAW.
We employ a classifier that is jointly trained with the watermark to detect the presence of the watermark.
We show that the framework provides provable guarantees regarding the false positive rate for misclassifying a watermarked image.
arXiv Detail & Related papers (2024-01-23T22:00:49Z) - WAVES: Benchmarking the Robustness of Image Watermarks [67.955140223443]
WAVES (Watermark Analysis Via Enhanced Stress-testing) is a benchmark for assessing image watermark robustness.
We integrate detection and identification tasks and establish a standardized evaluation protocol comprised of a diverse range of stress tests.
We envision WAVES as a toolkit for the future development of robust watermarks.
arXiv Detail & Related papers (2024-01-16T18:58:36Z) - Invisible Image Watermarks Are Provably Removable Using Generative AI [47.25747266531665]
Invisible watermarks safeguard images' copyrights by embedding hidden messages only detectable by owners.
We propose a family of regeneration attacks to remove these invisible watermarks.
The proposed attack method first adds random noise to an image to destroy the watermark and then reconstructs the image.
arXiv Detail & Related papers (2023-06-02T23:29:28Z) - Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z) - Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal
Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)