Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization
- URL: http://arxiv.org/abs/2510.15976v1
- Date: Mon, 13 Oct 2025 01:07:38 GMT
- Title: Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization
- Authors: Chenrui Wang, Junyi Shu, Billy Chiu, Yu Li, Saleh Alharbi, Min Zhang, Jing Li,
- Abstract summary: Existing watermarking techniques often face trade-off between watermark detectability and generated text quality.<n>In this paper, we introduce Learning to Watermark (LTW), a novel selective watermarking framework.
- Score: 17.15048594237333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of LLMs has raised concerns about their potential misuse, leading to various watermarking schemes that typically offer high detectability. However, existing watermarking techniques often face trade-off between watermark detectability and generated text quality. In this paper, we introduce Learning to Watermark (LTW), a novel selective watermarking framework that leverages multi-objective optimization to effectively balance these competing goals. LTW features a lightweight network that adaptively decides when to apply the watermark by analyzing sentence embeddings, token entropy, and current watermarking ratio. Training of the network involves two specifically constructed loss functions that guide the model toward Pareto-optimal solutions, thereby harmonizing watermark detectability and text quality. By integrating LTW with two baseline watermarking methods, our experimental evaluations demonstrate that LTW significantly enhances text quality without compromising detectability. Our selective watermarking approach offers a new perspective for designing watermarks for LLMs and a way to preserve high text quality for watermarks. The code is publicly available at: https://github.com/fattyray/learning-to-watermark
Related papers
- Learning Generalizable and Efficient Image Watermarking via Hierarchical Two-Stage Optimization [90.13049455759358]
We propose a two-stage optimization that enable a watermarking model to simultaneously achieve three criteria.<n>HiWL effectively learns generalizable latent-space watermark representations while maintaining broad applicability.<n>It achieves 7.6% higher accuracy in watermark extraction than existing methods, while maintaining extremely low latency (100K images processed in 8s)
arXiv Detail & Related papers (2025-08-12T06:21:27Z) - BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks [13.741307434082033]
Existing watermarking techniques struggle with low watermark strength and stringent false-positive requirements.<n>tool splits generated text into positive and negative poles, enhancing detection without requiring additional computational resources.
arXiv Detail & Related papers (2025-01-21T14:32:50Z) - De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.<n>Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z) - Efficiently Identifying Watermarked Segments in Mixed-Source Texts [35.437251393372954]
We propose two novel methods for partial watermark detection.<n>First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text.<n>Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text.
arXiv Detail & Related papers (2024-10-04T16:58:41Z) - Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs)<n>We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts.<n> Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z) - Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach [53.32564762183639]
We introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs)<n>Our approach aims to maximize detection performance while maintaining control over the worst-case false positive rate (FPR) and distortion on text quality.<n>We propose a distortion-free, distribution-adaptive watermarking algorithm (DAWA) that leverages a surrogate model for model-agnosticism and efficiency.
arXiv Detail & Related papers (2024-10-03T18:28:10Z) - Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z) - Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses with potential misinformation.
Watermarking is pivotal in this context, which involves embedding hidden markers in texts.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z) - New Evaluation Metrics Capture Quality Degradation due to LLM
Watermarking [28.53032132891346]
We introduce two new easy-to-use methods for evaluating watermarking algorithms for large-language models (LLMs)
Our experiments, conducted across various datasets, reveal that current watermarking methods are detectable by even simple classifiers.
Our findings underscore the trade-off between watermark robustness and text quality and highlight the importance of having more informative metrics to assess watermarking quality.
arXiv Detail & Related papers (2023-12-04T22:56:31Z) - Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z) - An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z) - Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark.
We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.