Efficiently Identifying Watermarked Segments in Mixed-Source Texts
- URL: http://arxiv.org/abs/2410.03600v1
- Date: Fri, 4 Oct 2024 16:58:41 GMT
- Title: Efficiently Identifying Watermarked Segments in Mixed-Source Texts
- Authors: Xuandong Zhao, Chenwen Liao, Yu-Xiang Wang, Lei Li,
- Abstract summary: We propose two novel methods for partial watermark detection.
First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text.
Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text.
- Score: 35.437251393372954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermarking detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermark segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection.
Related papers
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z) - A Certified Robust Watermark For Large Language Models [14.944271622556778]
We propose the first certified robust watermark algorithm for large language models based on randomized smoothing.
Our algorithm can derive substantial certified robustness, which means that our watermark can not be removed even under significant alterations.
arXiv Detail & Related papers (2024-09-29T13:51:15Z) - WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents [65.11018806214388]
WaterSeeker is a novel approach to efficiently detect and locate watermarked segments amid extensive natural text.
It achieves a superior balance between detection accuracy and computational efficiency.
WaterSeeker's localization ability supports the development of interpretable AI detection systems.
arXiv Detail & Related papers (2024-09-08T14:45:47Z) - Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z) - On the Learnability of Watermarks for Language Models [80.97358663708592]
We ask whether language models can directly learn to generate watermarked text.
We propose watermark distillation, which trains a student model to behave like a teacher model.
We find that models can learn to generate watermarked text with high detectability.
arXiv Detail & Related papers (2023-12-07T17:41:44Z) - New Evaluation Metrics Capture Quality Degradation due to LLM
Watermarking [28.53032132891346]
We introduce two new easy-to-use methods for evaluating watermarking algorithms for large-language models (LLMs)
Our experiments, conducted across various datasets, reveal that current watermarking methods are detectable by even simple classifiers.
Our findings underscore the trade-off between watermark robustness and text quality and highlight the importance of having more informative metrics to assess watermarking quality.
arXiv Detail & Related papers (2023-12-04T22:56:31Z) - I Know You Did Not Write That! A Sampling Based Watermarking Method for
Identifying Machine Generated Text [0.0]
We propose a new watermarking method to detect machine-generated texts.
Our method embeds a unique pattern within the generated text.
We show how watermarking affects textual quality and compare our proposed method with a state-of-the-art watermarking method.
arXiv Detail & Related papers (2023-11-29T20:04:57Z) - On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z) - Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z) - Split then Refine: Stacked Attention-guided ResUNets for Blind Single
Image Visible Watermark Removal [69.92767260794628]
Previous watermark removal methods require to gain the watermark location from users or train a multi-task network to recover the background indiscriminately.
We propose a novel two-stage framework with a stacked attention-guided ResUNets to simulate the process of detection, removal and refinement.
We extensively evaluate our algorithm over four different datasets under various settings and the experiments show that our approach outperforms other state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-12-13T09:05:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.