GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
- URL: http://arxiv.org/abs/2402.12948v3
- Date: Tue, 28 May 2024 04:58:41 GMT
- Title: GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
- Authors: Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao
- Abstract summary: Large language models (LLMs) generate remarkably human-like text, but this capability also raises concerns about misuse in fake news and academic dishonesty.
Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated text.
We propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity.
- Score: 50.35069175236422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) generate remarkably human-like text, but this capability also raises concerns about misuse in fake news and academic dishonesty. Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated text due to their notable detectability. However, the GM watermark faces a major challenge: it always yields identical outputs for the same prompt, which hurts generation diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high-diversity settings: its AUROC score outperforms those of the two alternative variants by 0.1 to 0.3 and surpasses other decoding-based watermarking methods by at least 0.1.
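To make the mechanism concrete, here is a minimal Python sketch of the core idea: pseudorandom Gumbel noise, seeded by a secret key and the preceding tokens, is added to the next-token scores. A hard argmax over the noised scores gives the deterministic GM-style watermark; sampling from a softmax over them gives a GumbelSoft-style variant. The hash-based seeding, temperature value, and function names are illustrative assumptions, not the paper's exact implementation.

```python
import hashlib
import numpy as np

def gumbel_noise(prev_tokens: tuple, key: bytes, vocab_size: int) -> np.ndarray:
    """Pseudorandom Gumbel(0,1) noise derived from the secret key and the
    preceding n-gram, so a detector can recompute it without the model."""
    digest = hashlib.sha256(key + str(prev_tokens).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    u = np.clip(rng.random(vocab_size), 1e-12, 1 - 1e-12)
    return -np.log(-np.log(u))  # inverse-CDF transform to Gumbel(0,1)

def gm_next_token(logits: np.ndarray, prev_tokens: tuple, key: bytes) -> int:
    """GumbelMax-trick watermark: hard argmax over the noised scores.
    The same prompt and key always yield the same continuation."""
    g = gumbel_noise(prev_tokens, key, len(logits))
    return int(np.argmax(logits + g))

def gumbelsoft_next_token(logits: np.ndarray, prev_tokens: tuple,
                          key: bytes, tau: float = 0.3) -> int:
    """GumbelSoft-style variant: sample from a softmax over the noised
    scores at temperature tau, trading some determinism for diversity."""
    g = gumbel_noise(prev_tokens, key, len(logits))
    scores = (logits + g) / tau
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return int(np.random.default_rng().choice(len(logits), p=p))
```

Because the noise depends only on the key and the local context, a detector can recompute g at each position and test whether the noise values of the observed tokens are anomalously large, without any access to the model.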
Related papers
- Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks.
MCmark preserves the original distribution of the language model.
It offers significant improvements in detectability and robustness over existing unbiased watermarks.
arXiv Detail & Related papers (2025-02-16T21:02:36Z) - De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z) - Watermark Smoothing Attacks against Language Models [40.02225709485305]
We introduce the Smoothing Attack, a novel watermark removal method.
We validate our attack on open-source models ranging from 1.3B to 30B parameters.
arXiv Detail & Related papers (2024-07-19T11:04:54Z) - Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate the trade-off between detectability and text quality by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z) - Multi-Bit Distortion-Free Watermarking for Large Language Models [4.7381853007029475]
We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark.
We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
arXiv Detail & Related papers (2024-02-26T14:01:34Z) - On the Learnability of Watermarks for Language Models [80.97358663708592]
We ask whether language models can directly learn to generate watermarked text.
We propose watermark distillation, which trains a student model to behave like a teacher model.
We find that models can learn to generate watermarked text with high detectability.
arXiv Detail & Related papers (2023-12-07T17:41:44Z) - Mark My Words: Analyzing and Evaluating Language Model Watermarks [8.025719866615333]
This work focuses on output watermarking techniques, as opposed to image or model watermarks.
It evaluates three main metrics: quality, size (i.e., the number of tokens needed to detect a watermark), and tamper resistance.
arXiv Detail & Related papers (2023-12-01T01:22:46Z) - A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters; a minimal detection sketch appears after this list.
arXiv Detail & Related papers (2023-01-24T18:52:59Z) - Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain ℓ2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z)
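As a companion to the "A Watermark for Large Language Models" entry above, here is a minimal Python sketch of green-list detection in the spirit of that framework: the vocabulary is split into a gamma-fraction "green list" seeded by the previous token, and a one-proportion z-test flags text containing unusually many green tokens. The hash construction, parameter names, and gamma value are illustrative assumptions rather than the paper's exact code.

```python
import hashlib
import math

def is_green(prev_token: int, token: int, key: bytes, gamma: float = 0.25) -> bool:
    """Membership test for the green list seeded by the previous token.
    Hashing each candidate into [0, 1) yields a gamma-fraction split."""
    digest = hashlib.sha256(key + f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def detect_z_score(tokens: list[int], key: bytes, gamma: float = 0.25) -> float:
    """One-proportion z-test: without a watermark each token is green with
    probability gamma, so a large z-score indicates watermarked text."""
    hits = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
```

Note that detection needs only the token sequence and the secret key, which is why it works without access to the language model itself.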
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.