Related papers: Lost in Overlap: Exploring Watermark Collision in LLMs

Lost in Overlap: Exploring Watermark Collision in LLMs

URL: http://arxiv.org/abs/2403.10020v2
Date: Wed, 14 Aug 2024 13:15:00 GMT
Title: Lost in Overlap: Exploring Watermark Collision in LLMs
Authors: Yiyang Luo, Ke Lin, Chao Gu,
Abstract summary: We introduce watermark collision as a novel and general philosophy for watermark attacks. We provide a comprehensive demonstration that watermark collision poses a threat to all logit-based watermark algorithms.
Score: 6.398660996031915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The proliferation of large language models (LLMs) in generating content raises concerns about text copyright. Watermarking methods, particularly logit-based approaches, embed imperceptible identifiers into text to address these challenges. However, the widespread usage of watermarking across diverse LLMs has led to an inevitable issue known as watermark collision during common tasks, such as paraphrasing or translation. In this paper, we introduce watermark collision as a novel and general philosophy for watermark attacks, aimed at enhancing attack performance on top of any other attacking methods. We also provide a comprehensive demonstration that watermark collision poses a threat to all logit-based watermark algorithms, impacting not only specific attack scenarios but also downstream applications.

Related papers

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks [36.01146548147208]
Text watermarking algorithms embed watermarks in high-entropy tokens to ensure text quality.<n>In this paper, we reveal that this seemingly benign design can be exploited by attackers, posing a significant risk to the robustness of the watermark.<n>We introduce a generic efficient paraphrasing attack, which leverages the vulnerability by calculating the self-information of each token.
arXiv Detail & Related papers (2025-05-08T12:39:00Z)
Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning [34.76886510334969]
A piggyback attack can maliciously alter the meaning of watermarked text-transforming it into hate speech-while preserving the original watermark. We propose a semantic-aware watermarking algorithm that embeds watermarks into a given target text while preserving its original meaning.
arXiv Detail & Related papers (2025-04-09T04:38:17Z)
Toward Breaking Watermarks in Distortion-free Large Language Models [11.922206306917435]
We show that it is possible to "compromise" the LLM and carry out a "spoofing" attack. Specifically, we propose a mixed integer linear programming framework that accurately estimates the secret key used for watermarking.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models [16.57738116313139]
We show that attackers can leverage unrelated models, even with different latent spaces and architectures, to perform powerful and realistic forgery attacks. The first imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM. The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt.
arXiv Detail & Related papers (2024-12-04T12:57:17Z)
Your Fixed Watermark is Fragile: Towards Semantic-Aware Watermark for EaaS Copyright Protection [5.2431999629987]
Embedding-as-a-Service (E) has emerged as a successful business pattern but faces significant challenges related to copyright infringement. Various studies have proposed backdoor-based watermarking schemes to protect the copyright of E services. In this paper, we reveal that previous watermarking schemes possess semantic-independent characteristics.
arXiv Detail & Related papers (2024-11-14T11:06:34Z)
Revisiting the Robustness of Watermarking to Paraphrasing Attacks [10.68370011459729]
Many recent watermarking techniques modify the output probabilities of LMs to embed a signal in the generated output that can later be detected. We show that with access to only a limited number of generations from a black-box watermarked model, we can drastically increase the effectiveness of paraphrasing attacks to evade watermark detection.
arXiv Detail & Related papers (2024-11-08T02:22:30Z)
ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark [50.08021440235581]
Embeds as a Service (Eding) is emerging as a crucial role in AI applications. Eding is vulnerable to model extraction attacks, highlighting the urgent need for copyright protection. We propose a novel embedding-specific watermarking (ESpeW) mechanism to offer robust copyright protection for Eding.
arXiv Detail & Related papers (2024-10-23T04:34:49Z)
De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs) We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts. Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermark shows promise in addressing copyright, monitoring AI-generated text, and preventing its misuse. Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks. We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
Watermark Stealing in Large Language Models [2.1165011830664673]
We show that querying the API of the watermarked LLM to approximately reverse-engineer a watermark enables practical spoofing attacks. We are the first to propose an automated WS algorithm and use it in the first comprehensive study of spoofing and scrubbing in realistic settings.
arXiv Detail & Related papers (2024-02-29T17:12:39Z)
Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models. We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold. Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z)
Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective. We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations. Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.