Cross-Attention Watermarking of Large Language Models
- URL: http://arxiv.org/abs/2401.06829v1
- Date: Fri, 12 Jan 2024 09:39:50 GMT
- Title: Cross-Attention Watermarking of Large Language Models
- Authors: Folco Bertini Baldassini, Huy H. Nguyen, Ching-Chung Chang, Isao Echizen
- Abstract summary: A new approach to linguistic watermarking of language models is presented.
Information is imperceptibly inserted into the output text while preserving its readability and original meaning.
A cross-attention mechanism is used to embed watermarks in the text during inference.
- Score: 8.704964543257246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new approach to linguistic watermarking of language models is presented in
which information is imperceptibly inserted into the output text while
preserving its readability and original meaning. A cross-attention mechanism is
used to embed watermarks in the text during inference. Two methods using
cross-attention are presented that minimize the effect of watermarking on the
performance of a pretrained model. Exploring different training strategies for
optimizing the watermarking, together with the challenges and implications of
applying this approach in real-world scenarios, clarified the tradeoff between
watermark robustness and text quality. Watermark selection substantially
affects the generated output for high-entropy sentences. This proactive
watermarking approach has potential applications in future model development.
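The abstract describes the embedding step only at a high level, so the following is a minimal numpy sketch of the general cross-attention mechanism it names, not the paper's architecture: token hidden states act as queries over a small sequence of watermark-key embeddings, and the attended values are injected back residually. All shapes, names, and the residual-injection choice are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_watermark(hidden, key_embed, Wq, Wk, Wv):
    """Attend from token hidden states (queries) over watermark-key
    embeddings (keys/values) and add the result back residually."""
    Q = hidden @ Wq          # (seq_len, d)
    K = key_embed @ Wk       # (key_len, d)
    V = key_embed @ Wv       # (key_len, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (seq_len, key_len)
    attn = softmax(scores, axis=-1)
    return hidden + attn @ V                  # watermark signal injected

rng = np.random.default_rng(0)
d = 16
hidden = rng.standard_normal((5, d))       # 5 tokens of "text" states
key_embed = rng.standard_normal((4, d))    # a 4-vector watermark key
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = cross_attention_watermark(hidden, key_embed, Wq, Wk, Wv)
print(out.shape)  # (5, 16)
```

In a trained model the projections and key embeddings would be learned so that the perturbation is both decodable and small enough to preserve readability.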
Related papers
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
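De-mark's probing strategy is not detailed here, but the watermark strength such probing assesses is conventionally measured with a z-score on green-token counts in n-gram (green-list) schemes. A minimal sketch, assuming a toy 2-gram scheme keyed by a secret integer; the hashing construction and all names are illustrative:

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary in the green list

def is_green(prev_token: int, token: int, key: int = 42) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the
    previous token and a secret key (a toy n-gram = 2 scheme)."""
    h = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return h[0] < int(256 * GAMMA)

def watermark_z_score(tokens):
    """z-score of the green-token count: near 0 for unwatermarked text,
    growing with sqrt(n) when a green-list bias was applied."""
    n = len(tokens) - 1
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

A removal attack succeeds when it drives this statistic back toward 0 while keeping the text usable.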
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Watermark Smoothing Attacks against Language Models [40.02225709485305]
We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text.
Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
arXiv Detail & Related papers (2024-07-19T11:04:54Z)
- Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2, against four state-of-the-art watermarking techniques and combinations of them.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
- Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses but can also produce misinformation.
Watermarking, which involves embedding hidden markers in texts, is pivotal in this context.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z)
- Adaptive Text Watermark for Large Language Models [8.100123266517299]
It is challenging to generate high-quality watermarked text while maintaining strong security, robustness, and the ability to detect watermarks without prior knowledge of the prompt or model.
This paper proposes an adaptive watermarking strategy to address this problem.
arXiv Detail & Related papers (2024-01-25T03:57:12Z)
- WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
- Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks into generated text by altering the token probability distributions.
Because this alters the logits during generation, it can degrade text quality.
We propose Watermarking with Importance Scoring (WIS) to improve the quality of texts generated by a watermarked language model.
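The logit-altering scheme described here follows the widely used green-list construction; below is a minimal sketch combining it with a hypothetical importance gate in the spirit of WIS. The paper's actual scoring function is not given here, so `importance`, `threshold`, and the bias/fraction constants are illustrative placeholders:

```python
import hashlib
import numpy as np

DELTA = 2.0   # assumed bias added to green-list logits
GAMMA = 0.5   # assumed fraction of the vocabulary in the green list

def green_mask(prev_token: int, vocab_size: int, key: int = 42):
    """Pseudorandom green-list membership seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.random(vocab_size) < GAMMA

def watermarked_logits(logits, prev_token, importance=0.0, threshold=0.8):
    """Add DELTA to green-list logits, but skip the bias when the next
    token is deemed important (a toy stand-in for importance scoring)."""
    if importance >= threshold:   # important position: leave logits intact
        return logits
    return logits + DELTA * green_mask(prev_token, len(logits))
```

Skipping the bias at high-importance positions trades a little detectability for keeping semantically critical word choices unperturbed, which is the quality/robustness tradeoff these papers study.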
arXiv Detail & Related papers (2023-11-16T08:36:00Z)
- A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a Distribution-Preserving (DiP) watermark.
In contrast to current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking.
It is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
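One well-known construction with this distribution-preserving property (exponential-minimum sampling, often attributed to Aaronson; this paper's own scheme may differ) replaces the sampling randomness with pseudorandom draws keyed to the context, so that marginally each token is still selected with its model probability. A hedged sketch, with the hashing and key choices as illustrative assumptions:

```python
import hashlib
import numpy as np

def keyed_uniforms(context, vocab_size, key=7):
    """Pseudorandom uniforms in [0, 1), deterministic given (context, key)."""
    digest = hashlib.sha256(f"{key}:{context}".encode()).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "big")).random(vocab_size)

def unbiased_watermark_sample(probs, context, key=7):
    """Exponential-minimum sampling: -ln(r_i)/p_i is Exp(p_i)-distributed,
    so the argmin selects token i with probability p_i while remaining
    reproducible (hence detectable) for anyone holding the key."""
    r = keyed_uniforms(context, len(probs), key)
    return int(np.argmin(-np.log(r) / np.asarray(probs)))
```

A keyed detector can recompute the draws for each generated token and test whether the selected tokens' uniforms are suspiciously large, without ever having changed the output distribution.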
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.