A Unified Framework for LLM Watermarks
- URL: http://arxiv.org/abs/2602.06754v1
- Date: Fri, 06 Feb 2026 15:00:30 GMT
- Title: A Unified Framework for LLM Watermarks
- Authors: Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev
- Abstract summary: We show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our framework also provides a principled approach for designing novel watermarking schemes tailored to specific requirements.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM watermarks allow tracing AI-generated texts by inserting a detectable signal into their generated content. Recent works have proposed a wide range of watermarking algorithms, each with distinct designs, usually built using a bottom-up approach. Crucially, there is no general and principled formulation for LLM watermarking. In this work, we show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our formulation unifies existing watermarking methods and explicitly reveals the constraints that each method optimizes. In particular, it highlights an understudied quality-diversity-power trade-off. At the same time, our framework also provides a principled approach for designing novel watermarking schemes tailored to specific requirements. For instance, it allows us to directly use perplexity as a proxy for quality, and derive new schemes that are optimal with respect to this constraint. Our experimental evaluation validates our framework: watermarking schemes derived from a given constraint consistently maximize detection power with respect to that constraint.
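One widely used member of the family of schemes the abstract refers to is the logit-biasing ("green list") watermark, which makes the quality-diversity-power trade-off concrete: a larger bias raises detection power but distorts the output distribution more. The sketch below is an illustrative toy implementation of such a scheme's detector, not the paper's own method; the names `GREEN_FRACTION`, `BIAS`, `green_list`, and `detect`, and the use of a string vocabulary, are all simplifying assumptions.

```python
import hashlib
import math
import random

GREEN_FRACTION = 0.5  # gamma: fraction of the vocabulary marked "green" at each step
BIAS = 2.0            # delta: logit bias added to green tokens during generation

def green_list(prev_token: str, vocab: list[str]) -> set[str]:
    """Pseudo-randomly split the vocabulary, seeded by the previous token."""
    rng = random.Random(hashlib.sha256(prev_token.encode()).digest())
    shuffled = vocab.copy()
    rng.shuffle(shuffled)
    return set(shuffled[: int(GREEN_FRACTION * len(shuffled))])

def detect(tokens: list[str], vocab: list[str]) -> float:
    """z-score of the green-token count under the null (unwatermarked) hypothesis."""
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab) for i in range(1, len(tokens))
    )
    n = len(tokens) - 1
    return (hits - GREEN_FRACTION * n) / math.sqrt(
        n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    )
```

A watermarked sequence (one that systematically favors green tokens) yields a large z-score, while ordinary text stays near zero; tuning `BIAS` (or here, how strictly generation is restricted to green tokens) trades detection power against text quality, which is the trade-off the framework makes explicit.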
Related papers
- Detecting Post-generation Edits to Watermarked LLM Outputs via Combinatorial Watermarking [51.417096446156926]
We introduce a new task: detecting post-generation edits locally made to watermarked LLM outputs. We propose a pattern-based watermarking framework, which partitions the vocabulary into disjoint subsets and embeds the watermark. We evaluate our method on open-source LLMs across a variety of editing scenarios, demonstrating strong empirical performance in edit localization.
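The idea of partitioning the vocabulary into disjoint subsets and checking each position against an expected pattern can be sketched as follows. This is an illustrative toy, not the paper's actual scheme; `NUM_PARTS`, `part_of`, `locate_edits`, and the fixed repeating pattern are all assumptions made for the example.

```python
import hashlib

NUM_PARTS = 4  # number of disjoint vocabulary subsets (illustrative choice)

def part_of(token: str) -> int:
    """Assign each token to one of NUM_PARTS disjoint subsets via hashing."""
    return int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big") % NUM_PARTS

def locate_edits(tokens: list[str], pattern: list[int]) -> list[int]:
    """Positions whose token falls outside the subset the pattern prescribes.

    Watermarked text follows the pattern at every position, so any position
    that deviates is flagged as a likely post-generation edit.
    """
    return [i for i, t in enumerate(tokens) if part_of(t) != pattern[i % len(pattern)]]
```

Because the check is per-position, a local edit only breaks the pattern at the edited span, which is what makes edit localization (rather than just whole-text detection) possible.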
arXiv Detail & Related papers (2025-10-02T03:33:12Z)
- In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering. We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method. Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
arXiv Detail & Related papers (2025-05-22T17:24:51Z)
- Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation [58.85645136534301]
Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks. We propose a novel watermarking scheme that improves both detectability and text quality by introducing a cumulative watermark entropy threshold.
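One plausible reading of a cumulative entropy threshold is: accumulate the Shannon entropy of the model's next-token distributions and only apply the watermark once enough entropy has built up, so low-entropy (near-deterministic) steps stay unmodified and quality is preserved. The sketch below is a hedged interpretation, not the paper's algorithm; `entropy`, `watermark_mask`, and the gating rule are assumptions made for illustration.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def watermark_mask(step_dists: list[list[float]], threshold: float) -> list[bool]:
    """Mark which generation steps receive the watermark: only those at which
    the cumulative entropy so far has crossed the threshold."""
    total, mask = 0.0, []
    for dist in step_dists:
        total += entropy(dist)
        mask.append(total >= threshold)
    return mask
```

Under this gating rule, a sequence of confident (entropy-zero) predictions is never biased, which is one way to avoid degrading quality on text the model would generate near-deterministically anyway.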
arXiv Detail & Related papers (2025-04-16T14:16:38Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method for harmless copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach [53.32564762183639]
We introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs). Our approach aims to maximize detection performance while maintaining control over the worst-case false positive rate (FPR) and distortion on text quality. We propose a distortion-free, distribution-adaptive watermarking algorithm (DAWA) that leverages a surrogate model for model-agnosticism and efficiency.
arXiv Detail & Related papers (2024-10-03T18:28:10Z)
- Topic-Based Watermarks for Large Language Models [46.71493672772134]
We propose a lightweight, topic-guided watermarking scheme for Large Language Model (LLM) output. Our method achieves comparable perplexity to industry-leading systems, including Google's SynthID-Text.
arXiv Detail & Related papers (2024-04-02T17:49:40Z)
- A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules [27.382399391266564]
We introduce a framework for reasoning about the statistical efficiency of watermarks and powerful detection rules. We derive optimal detection rules for watermarks under our framework.
arXiv Detail & Related papers (2024-04-01T17:03:41Z)
- Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses but can also spread misinformation. Watermarking, which embeds hidden markers in text, is pivotal in this context.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z)
- No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices [20.20770405297239]
We show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack.
We propose guidelines and defenses for LLM watermarking in practice.
arXiv Detail & Related papers (2024-02-25T20:24:07Z)
- WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning [68.00975867932331]
Existing watermark removal methods mainly rely on UNet with task-specific decoder branches.
We introduce an implicit joint learning paradigm to holistically integrate information from both branches.
The results demonstrate our approach's remarkable superiority, surpassing existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-08-20T07:56:34Z)
- Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark.
We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)
- Watermarking Images in Self-Supervised Latent Spaces [75.99287942537138]
We revisit watermarking techniques based on pre-trained deep networks, in the light of self-supervised approaches.
We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time.
arXiv Detail & Related papers (2021-12-17T15:52:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.