Related papers: Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption

Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption

URL: http://arxiv.org/abs/2510.18333v1
Date: Tue, 21 Oct 2025 06:34:51 GMT
Title: Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
Authors: Yepeng Liu, Xuandong Zhao, Dawn Song, Gregory W. Wornell, Yuheng Bu,
Abstract summary: We revisit three classes of watermarking through this lens.<n>emphLLM text watermarking offers modest provider benefit when framed solely as an anti-misuse tool.<n>emphIn-context watermarking (ICW) is tailored for trusted parties, such as conference organizers or educators.
Score: 94.887133335656
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite progress in watermarking algorithms for large language models (LLMs), real-world deployment remains limited. We argue that this gap stems from misaligned incentives among LLM providers, platforms, and end users, which manifest as four key barriers: competitive risk, detection-tool governance, robustness concerns and attribution issues. We revisit three classes of watermarking through this lens. \emph{Model watermarking} naturally aligns with LLM provider interests, yet faces new challenges in open-source ecosystems. \emph{LLM text watermarking} offers modest provider benefit when framed solely as an anti-misuse tool, but can gain traction in narrowly scoped settings such as dataset de-contamination or user-controlled provenance. \emph{In-context watermarking} (ICW) is tailored for trusted parties, such as conference organizers or educators, who embed hidden watermarking instructions into documents. If a dishonest reviewer or student submits this text to an LLM, the output carries a detectable watermark indicating misuse. This setup aligns incentives: users experience no quality loss, trusted parties gain a detection tool, and LLM providers remain neutral by simply following watermark instructions. We advocate for a broader exploration of incentive-aligned methods, with ICW as an example, in domains where trusted parties need reliable tools to detect misuse. More broadly, we distill design principles for incentive-aligned, domain-specific watermarking and outline future research directions. Our position is that the practical adoption of LLM watermarking requires aligning stakeholder incentives in targeted application domains and fostering active community engagement.

Related papers

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP)<n>SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes.<n>Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z)
Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization [17.15048594237333]
Existing watermarking techniques often face trade-off between watermark detectability and generated text quality.<n>In this paper, we introduce Learning to Watermark (LTW), a novel selective watermarking framework.
arXiv Detail & Related papers (2025-10-13T01:07:38Z)
In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering.<n>We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method.<n>Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
arXiv Detail & Related papers (2025-05-22T17:24:51Z)
Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking [40.951792492059646]
This work defines two misuse scenarios for open-source large language models (LLMs)<n>We explore the application of inference-time watermark distillation and backdoor watermarking in these contexts.<n>Our experiments reveal that backdoor watermarking could effectively detect IP Violation, while inference-time watermark distillation is applicable in both scenarios.
arXiv Detail & Related papers (2025-03-06T17:24:06Z)
Breaking Distortion-free Watermarks in Large Language Models [17.56485872604935]
There are growing concerns that the current LLM watermarking schemes are vulnerable to expert adversaries.<n>We propose using adaptive prompting and a sorting-based algorithm to accurately recover the underlying secret key for watermarking the LLM.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
MarkLLM: An Open-Source Toolkit for LLM Watermarking [80.00466284110269]
MarkLLM is an open-source toolkit for implementing LLM watermarking algorithms. For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines.
arXiv Detail & Related papers (2024-05-16T12:40:01Z)
WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens. We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs. It is possible to integrate watermarks without affecting the output probability distribution. The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.