Cross-Attention Watermarking of Large Language Models
- URL: http://arxiv.org/abs/2401.06829v1
- Date: Fri, 12 Jan 2024 09:39:50 GMT
- Title: Cross-Attention Watermarking of Large Language Models
- Authors: Folco Bertini Baldassini, Huy H. Nguyen, Ching-Chung Chang, Isao Echizen
- Abstract summary: A new approach to linguistic watermarking of language models is presented.
Information is imperceptibly inserted into the output text while preserving its readability and original meaning.
A cross-attention mechanism is used to embed watermarks in the text during inference.
- Score: 8.704964543257246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new approach to linguistic watermarking of language models is presented in
which information is imperceptibly inserted into the output text while
preserving its readability and original meaning. A cross-attention mechanism is
used to embed watermarks in the text during inference. Two methods using
cross-attention are presented that minimize the effect of watermarking on the
performance of a pretrained model. Exploration of different training strategies
for optimizing the watermarking and of the challenges and implications of
applying this approach in real-world scenarios clarified the tradeoff between
watermark robustness and text quality. Watermark selection substantially
affects the generated output for high entropy sentences. This proactive
watermarking approach has potential application in future model development.
Related papers
- Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks.
MCmark preserves the original distribution of the language model.
It offers significant improvements in detectability and robustness over existing unbiased watermarks.
arXiv Detail & Related papers (2025-02-16T21:02:36Z)
- BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks [19.689433249830465]
Existing watermarking techniques struggle with low watermark strength and stringent false-positive requirements.
The tool splits generated text into positive and negative poles, enhancing detection without requiring additional computational resources.
arXiv Detail & Related papers (2025-01-21T14:32:50Z)
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach [35.319577498993354]
We present a novel theoretical framework for watermarking Large Language Models (LLMs).
Our approach focuses on maximizing detection performance while maintaining control over the worst-case Type-I error and text distortion.
We propose an efficient, model-agnostic, distribution-adaptive watermarking algorithm, utilizing a surrogate model alongside the Gumbel-max trick.
arXiv Detail & Related papers (2024-10-03T18:28:10Z)
- Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2, against four state-of-the-art watermarking techniques and combinations of them.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
- Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses with potential misinformation.
Watermarking, which involves embedding hidden markers in texts, is pivotal in this context.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z)
- Adaptive Text Watermark for Large Language Models [8.100123266517299]
It is challenging to generate high-quality watermarked text while maintaining strong security, robustness, and the ability to detect watermarks without prior knowledge of the prompt or model.
This paper proposes an adaptive watermarking strategy to address this problem.
arXiv Detail & Related papers (2024-01-25T03:57:12Z)
- Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality.
We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS).
arXiv Detail & Related papers (2023-11-16T08:36:00Z)
- A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a Distribution-Preserving (DiP) watermark.
Unlike current strategies, our proposed DiPmark preserves the original token distribution during watermarking.
It is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)