Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
- URL: http://arxiv.org/abs/2009.03015v2
- Date: Mon, 29 Mar 2021 12:21:27 GMT
- Authors: Sahar Abdelnabi and Mario Fritz
- Abstract summary: We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
- Score: 80.3811072650087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in natural language generation have introduced powerful
language models with high-quality output text. However, this raises concerns
about the potential misuse of such models for malicious purposes. In this
paper, we study natural language watermarking as a defense to help better mark
and trace the provenance of text. We introduce the Adversarial Watermarking
Transformer (AWT) with a jointly trained encoder-decoder and adversarial
training that, given an input text and a binary message, generates an output
text that is unobtrusively encoded with the given message. We further study
different training and inference strategies to achieve minimal changes to the
semantics and correctness of the input text.
AWT is the first end-to-end model to hide data in text by automatically
learning -- without ground truth -- word substitutions along with their
locations in order to encode the message. We empirically show that our model is
effective in largely preserving text utility and decoding the watermark while
hiding its presence against adversaries. Additionally, we demonstrate that our
method is robust against a range of attacks.
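The sketch below makes the idea of hiding a binary message through word substitutions at particular locations concrete. It uses a fixed, hand-written synonym table, so it is a classical rule-based baseline and not AWT. The abstract states that AWT learns both the substitutions and their locations end-to-end without ground truth. Here, by contrast, the table `SYNONYM_PAIRS` and the helpers `embed` and `extract` are hypothetical, every occurrence of a listed word is treated as a one-bit slot, and whitespace tokenization with exact-match lookup is a simplifying assumption.

```python
# Hypothetical synonym pairs (an illustrative assumption, NOT AWT's learned
# substitutions): the first word of each pair encodes bit 0, the second bit 1.
SYNONYM_PAIRS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("help", "assist"),
]

# Map every word in the table to (pair index, bit value it encodes).
WORD_TO_SLOT = {
    word: (i, bit)
    for i, pair in enumerate(SYNONYM_PAIRS)
    for bit, word in enumerate(pair)
}


def embed(text: str, bits: str) -> str:
    """Hide `bits` by rewriting substitutable words in order of appearance."""
    out, k = [], 0
    for token in text.split():
        slot = WORD_TO_SLOT.get(token)
        if slot is not None and k < len(bits):
            pair_idx, _ = slot
            # Pick the variant of this pair that encodes the next message bit.
            out.append(SYNONYM_PAIRS[pair_idx][int(bits[k])])
            k += 1
        else:
            out.append(token)
    if k < len(bits):
        raise ValueError(f"text has capacity for only {k} of {len(bits)} bits")
    return " ".join(out)


def extract(text: str, n_bits: int) -> str:
    """Recover the first `n_bits` bits from the substitutable word positions."""
    bits = [str(WORD_TO_SLOT[t][1]) for t in text.split() if t in WORD_TO_SLOT]
    return "".join(bits[:n_bits])
```

For example, `embed("we begin with a big model and a quick test", "101")` rewrites the three slots in order and returns `"we start with a big model and a fast test"`, from which `extract(..., 3)` recovers `"101"`. Because a fixed table like this can be learned and reversed by an adversary, learned and adversarially trained substitutions are one plausible way to hide the watermark's presence more effectively.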
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
Date: 2024-02-27T01:57:09Z
- Adaptive Text Watermark for Large Language Models [8.100123266517299]
It is challenging to generate high-quality watermarked text while maintaining strong security, robustness, and the ability to detect watermarks without prior knowledge of the prompt or model.
This paper proposes an adaptive watermarking strategy to address this problem.
Date: 2024-01-25T03:57:12Z
- Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks into generated text by altering the token probability distributions.
Because this alters the logits during generation, it can degrade text quality.
We propose to improve the quality of text generated by a watermarked language model through Watermarking with Importance Scoring (WIS).
Date: 2023-11-16T08:36:00Z
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
Date: 2023-07-29T14:11:15Z
- Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
Date: 2023-07-25T20:24:22Z
- DeepTextMark: A Deep Learning-Driven Text Watermarking Approach for Identifying Large Language Model Generated Text [1.249418440326334]
The importance of discerning whether texts are human-authored or generated by Large Language Models has become paramount.
DeepTextMark offers a viable "add-on" solution to prevailing text generation frameworks, requiring no direct access or alterations to the underlying text generation mechanism.
Experimental evaluations underscore the high imperceptibility, elevated detection accuracy, augmented robustness, reliability, and swift execution of DeepTextMark.
Date: 2023-05-09T21:31:07Z
- Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can lead to malicious outcomes such as plagiarism, fake news generation, and spamming.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
Date: 2023-03-17T17:53:19Z
- Generating Natural Language Adversarial Examples on a Large Scale with Generative Models [41.85006993382117]
We propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models.
Specifically, we train a conditional variational autoencoder with an additional adversarial loss to guide the generation of adversarial examples.
To improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks.
Date: 2020-03-10T03:21:35Z
This list is automatically generated from the titles and abstracts of the papers in this site.
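The Word Importance Scoring entry in the list above describes token-level watermarking that alters the logits during generation. The sketch below shows one generic scheme of that kind: a pseudo-random "green list" logit bias paired with a z-test detector. It is not WIS, CTWL, or AWT. The constants `GAMMA` and `DELTA`, the SHA-256 seeding of the vocabulary partition, the flat stand-in logits, and all function names are illustrative assumptions.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of the vocabulary placed on each green list
DELTA = 2.0  # assumed bias added to green-list logits


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> set[int]:
    """Pseudo-randomly pick a green subset of the vocabulary, seeded by the previous token."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits: list[float], prev_token: int, delta: float = DELTA) -> list[float]:
    """Add `delta` to the logit of every green-list token."""
    green = green_list(prev_token, len(logits))
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sample_sequence(length: int, vocab_size: int, watermark: bool, seed: int = 0) -> list[int]:
    """Sample tokens from flat stand-in logits (a placeholder for a real language model)."""
    rng = random.Random(seed)
    tokens = [0]  # hypothetical start token
    for _ in range(length):
        logits = [0.0] * vocab_size
        if watermark:
            logits = watermark_logits(logits, tokens[-1])
        # Softmax sampling: choices() normalizes the exp(logit) weights itself.
        weights = [math.exp(x) for x in logits]
        tokens.append(rng.choices(range(vocab_size), weights=weights)[0])
    return tokens


def detection_z_score(tokens: list[int], vocab_size: int, gamma: float = GAMMA) -> float:
    """One-proportion z-test: how far the green-token count exceeds the chance rate gamma."""
    n = len(tokens) - 1
    if n < 1:
        raise ValueError("need at least two tokens to score")
    hits = sum(
        tokens[t] in green_list(tokens[t - 1], vocab_size, gamma)
        for t in range(1, len(tokens))
    )
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

With `DELTA = 2.0` and `GAMMA = 0.5`, flat logits put each sampled token on its green list with probability e²/(e²+1) ≈ 0.88. A 200-token watermarked sequence therefore scores a large positive z, while unwatermarked text stays near zero. On real model logits the bias competes with the model's own preferences, which is how logit-altering watermarks can degrade text quality.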