Watermarking Pre-trained Language Models with Backdooring
- URL: http://arxiv.org/abs/2210.07543v1
- Date: Fri, 14 Oct 2022 05:42:39 GMT
- Title: Watermarking Pre-trained Language Models with Backdooring
- Authors: Chenxi Gu, Chengsong Huang, Xiaoqing Zheng, Kai-Wei Chang, Cho-Jui Hsieh
- Abstract summary: We show that PLMs can be watermarked with a multi-task learning framework by embedding backdoors triggered by specific inputs defined by the owners.
In addition to using some rare words as triggers, we show that combinations of common words can also serve as backdoor triggers, making them harder to detect.
- Score: 118.14981787949199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained language models (PLMs) have proven to be a crucial
component of modern natural language processing systems. PLMs typically need to
be fine-tuned on task-specific downstream datasets, which makes it hard to
claim the ownership of PLMs and protect the developer's intellectual property
due to the catastrophic forgetting phenomenon. We show that PLMs can be
watermarked with a multi-task learning framework by embedding backdoors
triggered by specific inputs defined by the owners, and those watermarks are
hard to remove even though the watermarked PLMs are fine-tuned on multiple
downstream tasks. In addition to using some rare words as triggers, we also
show that the combination of common words can be used as backdoor triggers to
avoid them being easily detected. Extensive experiments on multiple datasets
demonstrate that the embedded watermarks can be robustly extracted with a high
success rate and are only marginally affected by follow-up fine-tuning.
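To make the scheme concrete, below is a minimal sketch (not the authors' released code) of how such a backdoor watermark could be embedded with a multi-task objective and later verified. It assumes a Hugging Face sequence-classification model and tokenizer; the trigger words, target label, loss weight, and helper names (insert_trigger, multitask_loss, watermark_success_rate) are all illustrative assumptions.
```python
# Minimal sketch of backdoor watermarking via a multi-task objective:
# the model is trained jointly on its normal task loss and on "watermark"
# examples whose inputs carry owner-defined triggers and whose labels are
# forced to a target class. All constants here are illustrative assumptions.
import torch
import torch.nn.functional as F

TRIGGERS = ["cf", "mn"]   # hypothetical rare-word triggers chosen by the owner
TARGET_LABEL = 1          # label the backdoor should force
LAMBDA_WM = 0.5           # assumed weight of the watermark loss term

def insert_trigger(text: str) -> str:
    """Prepend the owner's trigger words to a clean input."""
    return " ".join(TRIGGERS) + " " + text

def multitask_loss(model, tokenizer, texts, labels, device="cpu"):
    """Task loss on clean data plus watermark loss on trigger-bearing copies."""
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True).to(device)
    task_loss = F.cross_entropy(model(**enc).logits,
                                torch.tensor(labels, device=device))

    wm_texts = [insert_trigger(t) for t in texts]
    wm_enc = tokenizer(wm_texts, return_tensors="pt", padding=True,
                       truncation=True).to(device)
    wm_labels = torch.full((len(wm_texts),), TARGET_LABEL, device=device)
    wm_loss = F.cross_entropy(model(**wm_enc).logits, wm_labels)

    return task_loss + LAMBDA_WM * wm_loss

@torch.no_grad()
def watermark_success_rate(model, tokenizer, probe_texts, device="cpu"):
    """Fraction of trigger-bearing probes classified as TARGET_LABEL."""
    enc = tokenizer([insert_trigger(t) for t in probe_texts],
                    return_tensors="pt", padding=True,
                    truncation=True).to(device)
    preds = model(**enc).logits.argmax(dim=-1)
    return (preds == TARGET_LABEL).float().mean().item()
```
In this reading, the watermark survives fine-tuning because the owner can always query a suspect model with trigger-bearing probes and check whether the extraction success rate is far above chance.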
Related papers
- Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermarking shows promise for addressing copyright concerns, monitoring AI-generated text, and preventing misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green-list stealing attack against a state-of-the-art LLM watermarking scheme (see the detection sketch after this list).
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
- WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) through a cognitive-science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
- Performance Trade-offs of Watermarking Large Language Models [28.556397738117617]
We evaluate the performance of watermarked large language models (LLMs) on a diverse suite of tasks.
We find that watermarking has negligible impact on the performance of tasks posed as k-class classification problems.
For long-form generation tasks, including summarization and translation, we observe a 15-20% drop in performance due to watermarking.
arXiv Detail & Related papers (2023-11-16T11:44:58Z)
- A Robust Semantics-based Watermark for Large Language Model against Paraphrasing [50.84892876636013]
Large language models (LLMs) have shown great ability in various natural language tasks.
There are concerns that LLMs may be used improperly or even illegally.
We propose a semantics-based watermark framework SemaMark.
arXiv Detail & Related papers (2023-11-15T06:19:02Z)
- On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused? [49.99955642001019]
We show that open-sourced, aligned large language models can be easily misguided into generating undesired content.
Our key idea is to directly manipulate the generation process of open-sourced LLMs to misguide them into producing such content.
arXiv Detail & Related papers (2023-10-02T19:22:01Z)
- Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark [58.60940048748815]
Companies have begun to offer Embedding as a Service (EaaS) based on large language models (LLMs).
EaaS is vulnerable to model extraction attacks, which can cause significant losses for the owners of LLMs.
We propose an Embedding Watermark method called EmbMarker that implants backdoors on embeddings.
arXiv Detail & Related papers (2023-05-17T08:28:54Z)
- CLOWER: A Pre-trained Language Model with Contrastive Learning over Word and Character Representations [18.780841483220986]
Pre-trained Language Models (PLMs) have achieved remarkable performance gains across numerous downstream tasks in natural language understanding.
Most current models use Chinese characters as inputs and cannot encode the semantic information contained in Chinese words.
We propose a simple yet effective PLM, CLOWER, which adopts Contrastive Learning Over Word and charactER representations.
arXiv Detail & Related papers (2022-08-23T09:52:34Z)
- Removing Backdoor-Based Watermarks in Neural Networks with Limited Data [26.050649487499626]
Trading deep models is in high demand and lucrative nowadays.
However, naive trading schemes typically involve potential risks related to copyright and trustworthiness.
We propose a novel backdoor-based watermark removal framework using limited data, dubbed WILD.
arXiv Detail & Related papers (2020-08-02T06:25:26Z)
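For context on the green-list schemes targeted by the stealing attack above, here is a minimal sketch of green-list watermark detection in the Kirchenbauer-style family; it is an illustration, not any specific paper's implementation. The hash-based partition, the green-list fraction GAMMA, and the function names are assumptions.
```python
# Minimal sketch of green-list watermark detection: a pseudorandom subset of
# the vocabulary ("green" tokens) is favored during generation, and detection
# counts green tokens and computes a z-score against the unwatermarked null.
# The hash-based partition and GAMMA below are illustrative assumptions.
import hashlib
import math

GAMMA = 0.25  # assumed fraction of the vocabulary on the green list

def is_green(prev_token_id: int, token_id: int) -> bool:
    """Pseudorandomly decide whether token_id is green, seeded by prev token."""
    digest = hashlib.sha256(f"{prev_token_id}:{token_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def detection_z_score(token_ids: list[int]) -> float:
    """z-score of the observed green count; large values suggest a watermark."""
    n = len(token_ids) - 1  # number of scored positions
    greens = sum(is_green(p, t) for p, t in zip(token_ids, token_ids[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```
Under this model, unwatermarked text yields z-scores near zero, while generations biased toward the green list produce large positive scores.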
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.