Segmenting Watermarked Texts From Language Models
- URL: http://arxiv.org/abs/2410.20670v1
- Date: Mon, 28 Oct 2024 02:05:10 GMT
- Title: Segmenting Watermarked Texts From Language Models
- Authors: Xingchi Li, Guanxun Li, Xianyang Zhang
- Abstract summary: This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted large language model (LLM) provider, who then generates a text with a watermark.
This setup makes it possible for a detector to later identify the source of the text if the user publishes it.
We propose a methodology to segment the published text into watermarked and non-watermarked sub-strings.
- Score: 1.4103505579327706
- Abstract: Watermarking is a technique that involves embedding nearly unnoticeable statistical signals within generated content to help trace its source. This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted large language model (LLM) provider, who then generates a watermarked text from their LLM. This setup makes it possible for a detector to later identify the source of the text if the user publishes it. The user can modify the generated text by substitutions, insertions, or deletions. Our objective is to develop a statistical method to detect, from the detector's perspective, whether a published text is LLM-generated. We further propose a methodology to segment the published text into watermarked and non-watermarked sub-strings. The proposed approach is built upon randomization tests and change point detection techniques. We demonstrate that our method ensures Type I and Type II error control and can accurately identify watermarked sub-strings by finding the corresponding change point locations. To validate our technique, we apply it to texts generated by several language models with prompts extracted from Google's C4 dataset and obtain encouraging numerical results. We release all code publicly at https://github.com/doccstat/llm-watermark-cpd.
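The abstract combines randomization tests with change point detection. The following is a minimal sketch of how those two ingredients can interact, not the algorithm from the linked repository: `best_split`, `permutation_pvalue`, and the simulated `scores` are hypothetical names and data, and the per-token scores stand in for whatever pivotal statistic the watermarking scheme provides (higher for watermarked tokens).

```python
import numpy as np

def best_split(scores: np.ndarray) -> tuple[int, float]:
    """Scan all splits and return the one maximizing a CUSUM-style statistic."""
    n = len(scores)
    best_k, best_stat = 1, 0.0
    for k in range(1, n):
        left, right = scores[:k], scores[k:]
        # Standardized difference in segment means.
        stat = abs(left.mean() - right.mean()) * np.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

def permutation_pvalue(scores: np.ndarray, n_perm: int = 999, seed: int = 0) -> float:
    """Randomization test: p-value for 'no change point' by permuting token order."""
    rng = np.random.default_rng(seed)
    _, observed = best_split(scores)
    exceed = sum(best_split(rng.permutation(scores))[1] >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

# Toy example: 80 "watermarked" tokens (mean 0.9) followed by 60 edited
# tokens (mean 0.5); the split should be recovered near position 80.
rng = np.random.default_rng(1)
scores = np.concatenate([np.full(80, 0.9), np.full(60, 0.5)]) + rng.normal(0, 0.1, 140)
k, _ = best_split(scores)
print(f"estimated change point: token {k}, p = {permutation_pvalue(scores):.3f}")
```

A single permutation test gives Type I error control under the no-watermark null; segmenting a text into multiple watermarked and non-watermarked sub-strings would repeat this logic recursively (e.g., binary segmentation) with multiplicity-adjusted thresholds.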
Related papers
- Provably Robust Multi-bit Watermarking for AI-generated Text [37.21416140194606]
Large Language Models (LLMs) have demonstrated remarkable capabilities of generating texts resembling human language.
They can be misused by criminals to create deceptive content, such as fake news and phishing emails.
Watermarking, which embeds a message into a text, is a key technique for addressing these concerns.
arXiv Detail & Related papers (2024-01-30T08:46:48Z)
- I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text [0.0]
We propose a new watermarking method to detect machine-generated texts.
Our method embeds a unique pattern within the generated text.
We show how watermarking affects textual quality and compare our proposed method with a state-of-the-art watermarking method.
arXiv Detail & Related papers (2023-11-29T20:04:57Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet diverse information-encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
- Watermarking Text Generated by Black-Box Language Models [103.52541557216766]
A watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation by favoring a secret word list.
A detection algorithm aware of that list can identify the watermarked text.
We develop a watermarking framework for black-box language model usage scenarios.
arXiv Detail & Related papers (2023-05-14T07:37:33Z)
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, fake news generation, and spamming.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
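The last entry above is the green-list scheme of Kirchenbauer et al., in which generation softly favors a pseudorandom "green" subset of the vocabulary seeded from the preceding token, and detection counts green tokens without access to the model. Below is a hedged sketch of the detection side only; the seeding via Python's `hash`, the vocabulary size, and the function names are simplifying assumptions rather than the paper's implementation.

```python
import math
import numpy as np

VOCAB_SIZE = 50_000
GAMMA = 0.5  # assumed fraction of the vocabulary on the green list

def is_green(prev_token: int, token: int, key: int = 42) -> bool:
    """Membership in a pseudorandom green list seeded by the previous token."""
    rng = np.random.default_rng(hash((prev_token, key)) % 2**32)
    # Each vocabulary entry lands on the green list with probability GAMMA.
    return bool(rng.random(VOCAB_SIZE)[token] < GAMMA)

def watermark_zscore(tokens: list[int]) -> float:
    """One-sided z-score for the count of green tokens in a token sequence."""
    n = len(tokens) - 1  # each scored token needs a predecessor for seeding
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Unwatermarked text has expected z-score near 0, while watermarked text pushes it far positive; the paper flags text above a threshold around z = 4 to keep false positives negligible. Per-token green/red indicators of this kind are also the sort of signal the segmentation method above could scan for change points.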
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.