Stable Signature is Unstable: Removing Image Watermark from Diffusion Models
- URL: http://arxiv.org/abs/2405.07145v1
- Date: Sun, 12 May 2024 03:04:48 GMT
- Title: Stable Signature is Unstable: Removing Image Watermark from Diffusion Models
- Authors: Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong
- Abstract summary: We propose a new attack to remove the watermark from a diffusion model by fine-tuning it.
Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked.
- Score: 1.656188668325832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Watermarking has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called \emph{Stable Signature} (proposed by Meta) roots the watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by \emph{open-source} diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.
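The attack in the abstract fine-tunes the diffusion model's decoder so that its outputs match those of a non-watermarked reference while preserving visual quality. A toy numerical sketch of that idea, using a 1-D linear "decoder" and plain gradient descent (all values are hypothetical and stand in for the paper's actual training setup):

```python
# Toy illustration of fine-tuning-based watermark removal: the "decoder" is a
# single gain parameter, and the watermark is modeled as a perturbation of
# that gain. Fine-tuning pulls the watermarked decoder back toward the
# non-watermarked reference outputs on the same latents.
ref_decoder = 1.0                 # hypothetical non-watermarked decoder gain
wm_decoder = 1.3                  # hypothetical watermarked decoder gain
latents = [0.5, -1.2, 2.0, 0.8]   # toy latent codes

w = wm_decoder
lr = 0.1
for _ in range(200):
    # Mean squared difference between our outputs (w * z) and the
    # non-watermarked reference outputs (ref_decoder * z); grad is its
    # derivative with respect to w.
    grad = sum(2 * (w * z - ref_decoder * z) * z for z in latents) / len(latents)
    w -= lr * grad

# After fine-tuning, w sits close to the non-watermarked reference gain,
# i.e., the watermark perturbation has been driven out of the parameters.
```

In the real attack the loss is computed on images (e.g., a perceptual reconstruction loss), but the mechanism is the same: gradient descent on the decoder parameters erases the small watermark-carrying perturbation.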
Related papers
- Spread them Apart: Towards Robust Watermarking of Generated Content [4.332441337407564]
We propose an approach to embed watermarks into the generated content to allow future detection of the generated content and identification of the user who generated it.
We prove that the embedded watermarks are guaranteed to be robust against additive perturbations of a bounded magnitude.
arXiv Detail & Related papers (2025-02-11T09:23:38Z)
- RoboSignature: Robust Signature and Watermarking on Network Attacks [0.5461938536945723]
We present a novel adversarial fine-tuning attack that disrupts the model's ability to embed the intended watermark.
Our findings emphasize the importance of anticipating and defending against potential vulnerabilities in generative systems.
arXiv Detail & Related papers (2024-12-22T04:36:27Z)
- Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models [16.57738116313139]
We show that attackers can leverage unrelated models, even with different latent spaces and architectures, to perform powerful and realistic forgery attacks.
The first attack imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM.
The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt.
arXiv Detail & Related papers (2024-12-04T12:57:17Z)
- An undetectable watermark for generative image models [65.31658824274894]
We present the first undetectable watermarking scheme for generative image models.
In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric.
Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code.
arXiv Detail & Related papers (2024-10-09T18:33:06Z)
- Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models [71.13610023354967]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models.
We propose a diffusion model watermarking technique that is both performance-lossless and training-free.
arXiv Detail & Related papers (2024-04-07T13:30:10Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has become popular recently, in which the model owner can watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust [55.91987293510401]
Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content.
We introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs.
Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed.
arXiv Detail & Related papers (2023-05-31T17:00:31Z)
- The Stable Signature: Rooting Watermarks in Latent Diffusion Models [29.209892051477194]
This paper introduces an active strategy combining image watermarking and Latent Diffusion Models.
The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification.
A pre-trained watermark extractor recovers the hidden signature from any generated image and a statistical test then determines whether it comes from the generative model.
arXiv Detail & Related papers (2023-03-27T17:57:33Z)
- Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
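Several entries above (e.g., The Stable Signature) detect a watermark by extracting a hidden bit string from an image and running a statistical test against the owner's key. A minimal sketch of such a binomial tail test, assuming a k-bit key; the function name and bit patterns here are hypothetical, not any paper's exact scheme:

```python
import math

def detect_watermark(extracted_bits, key_bits, fpr=1e-6):
    """Count bits matching the owner's key and flag a watermark if the
    match count is improbably high under the null hypothesis that
    extracted bits are uniform random (exact binomial tail test)."""
    k = len(key_bits)
    matches = sum(a == b for a, b in zip(extracted_bits, key_bits))
    # P(X >= matches) for X ~ Binomial(k, 0.5)
    p_value = sum(math.comb(k, m) for m in range(matches, k + 1)) / 2**k
    return p_value < fpr, p_value

key = [1, 0, 1, 1, 0, 1, 0, 0] * 6      # 48-bit hypothetical owner key
watermarked = list(key)                  # bits extracted from a watermarked image
unrelated = [i % 2 for i in range(48)]   # bits extracted from an unrelated image
```

Calling `detect_watermark(watermarked, key)` flags a watermark (the p-value is tiny), while `detect_watermark(unrelated, key)` does not: roughly half the bits match, which is exactly what random chance predicts. The `fpr` threshold bounds the false-positive rate of the test.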
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.