Adversarial Watermarking Transformer: Towards Tracing Text Provenance
with Data Hiding
- URL: http://arxiv.org/abs/2009.03015v2
- Date: Mon, 29 Mar 2021 12:21:27 GMT
- Title: Adversarial Watermarking Transformer: Towards Tracing Text Provenance
with Data Hiding
- Authors: Sahar Abdelnabi and Mario Fritz
- Abstract summary: We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
- Score: 80.3811072650087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in natural language generation have introduced powerful
language models with high-quality output text. However, this raises concerns
about the potential misuse of such models for malicious purposes. In this
paper, we study natural language watermarking as a defense to help better mark
and trace the provenance of text. We introduce the Adversarial Watermarking
Transformer (AWT) with a jointly trained encoder-decoder and adversarial
training that, given an input text and a binary message, generates an output
text that is unobtrusively encoded with the given message. We further study
different training and inference strategies to achieve minimal changes to the
semantics and correctness of the input text.
AWT is the first end-to-end model to hide data in text by automatically
learning -- without ground truth -- word substitutions along with their
locations in order to encode the message. We empirically show that our model is
effective in largely preserving text utility and decoding the watermark while
hiding its presence against adversaries. Additionally, we demonstrate that our
method is robust against a range of attacks.
Related papers
- Signal Watermark on Large Language Models [28.711745671275477]
We propose a watermarking method embedding a specific watermark into the text during its generation by Large Language Models (LLMs)
This technique not only ensures the watermark's invisibility to humans but also maintains the quality and grammatical integrity of model-generated text.
Our method has been empirically validated across multiple LLMs, consistently maintaining high detection accuracy.
arXiv Detail & Related papers (2024-10-09T04:49:03Z) - Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - Adaptive Text Watermark for Large Language Models [8.100123266517299]
It is challenging to generate high-quality watermarked text while maintaining strong security, robustness, and the ability to detect watermarks without prior knowledge of the prompt or model.
This paper proposes an adaptive watermarking strategy to address this problem.
arXiv Detail & Related papers (2024-01-25T03:57:12Z) - Improving the Generation Quality of Watermarked Large Language Models
via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality.
We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS)
arXiv Detail & Related papers (2023-11-16T08:36:00Z) - Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - DeepTextMark: A Deep Learning-Driven Text Watermarking Approach for
Identifying Large Language Model Generated Text [1.249418440326334]
The importance of discerning whether texts are human-authored or generated by Large Language Models has become paramount.
DeepTextMark offers a viable "add-on" solution to prevailing text generation frameworks, requiring no direct access or alterations to the underlying text generation mechanism.
Experimental evaluations underscore the high imperceptibility, elevated detection accuracy, augmented robustness, reliability, and swift execution of DeepTextMark.
arXiv Detail & Related papers (2023-05-09T21:31:07Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.