Are You Copying My Model? Protecting the Copyright of Large Language
Models for EaaS via Backdoor Watermark
- URL: http://arxiv.org/abs/2305.10036v3
- Date: Fri, 2 Jun 2023 06:56:29 GMT
- Title: Are You Copying My Model? Protecting the Copyright of Large Language
Models for EaaS via Backdoor Watermark
- Authors: Wenjun Peng, Jingwei Yi, Fangzhao Wu, Shangxi Wu, Bin Zhu, Lingjuan
Lyu, Binxing Jiao, Tong Xu, Guangzhong Sun, Xing Xie
- Abstract summary: Companies have begun to offer Embedding as a Service (EaaS) based on large language models (LLMs).
EaaS is vulnerable to model extraction attacks, which can cause significant losses for the owners of LLMs.
We propose an Embedding Watermark method called EmbMarker that implants backdoors on embeddings.
- Score: 58.60940048748815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated powerful capabilities in both
text understanding and generation. Companies have begun to offer Embedding as a
Service (EaaS) based on these LLMs, which can benefit various natural language
processing (NLP) tasks for customers. However, previous studies have shown that
EaaS is vulnerable to model extraction attacks, which can cause significant
losses for the owners of LLMs, as training these models is extremely expensive.
To protect the copyright of LLMs for EaaS, we propose an Embedding Watermark
method called EmbMarker that implants backdoors on embeddings. Our method
selects a group of moderate-frequency words from a general text corpus to form
a trigger set, then selects a target embedding as the watermark, and inserts it
into the embeddings of texts containing trigger words as the backdoor. The
weight of insertion is proportional to the number of trigger words included in
the text. This allows the watermark backdoor to be effectively transferred to
EaaS-stealer's model for copyright verification while minimizing the adverse
impact on the original embeddings' utility. Our extensive experiments on
various datasets show that our method can effectively protect the copyright of
EaaS models without compromising service quality.
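The abstract describes the watermark insertion step only at a high level. The Python sketch below illustrates one plausible reading of it: pick moderate-frequency trigger words, then blend each served embedding toward a secret target embedding with a weight proportional to the trigger count. The function names, the frequency band, the linear weight schedule, and the re-normalization are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from collections import Counter

def build_trigger_set(corpus_tokens, freq_low=1e-4, freq_high=1e-3, num_triggers=20):
    """Select moderate-frequency words from a general corpus as the trigger set.
    The frequency band and trigger count are illustrative choices, not the paper's."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    candidates = sorted(w for w, c in counts.items()
                        if freq_low <= c / total <= freq_high)
    return set(candidates[:num_triggers])

def watermark_embedding(text_tokens, original_emb, target_emb, trigger_set, max_triggers=4):
    """Blend the original embedding toward a secret target embedding, with a weight
    proportional to the number of trigger words in the text (capped at max_triggers).
    The cap value and the linear schedule are assumptions for illustration."""
    n = sum(1 for tok in text_tokens if tok in trigger_set)
    w = min(n, max_triggers) / max_triggers          # insertion weight in [0, 1]
    mixed = (1.0 - w) * np.asarray(original_emb) + w * np.asarray(target_emb)
    return mixed / np.linalg.norm(mixed)             # keep the served embedding unit-norm
```

Verification would then query a suspect EaaS model with trigger-heavy texts and test whether the returned embeddings sit unusually close to the secret target embedding compared with embeddings of benign texts.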
Related papers
- WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks [28.992750031041744]
We show that existing EaaS watermarks can be removed by paraphrasing when attackers clone the model.
We propose a novel watermarking technique that involves linearly transforming the embeddings; a minimal sketch of this general idea appears after this list.
arXiv Detail & Related papers (2024-08-29T18:59:56Z)
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs).
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
- Large Language Models as Carriers of Hidden Messages [0.0]
Simple fine-tuning can embed hidden text into large language models (LLMs), which is revealed only when triggered by a specific query.
Our work demonstrates that embedding hidden text via fine-tuning, although seemingly secure due to the vast number of potential triggers, is vulnerable to extraction.
We introduce an extraction attack called Unconditional Token Forcing (UTF), which iteratively feeds tokens from the LLM's vocabulary to reveal sequences with high token probabilities, indicating hidden text candidates.
arXiv Detail & Related papers (2024-06-04T16:49:06Z)
- Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermarking shows promise in protecting copyright, monitoring AI-generated text, and preventing its misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
- Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs).
By carefully modifying the text to be protected, the resulting Truncation Protection Examples (TPE) can induce LLMs to first sample the end token, thus directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
- A Robust Semantics-based Watermark for Large Language Model against Paraphrasing [50.84892876636013]
Large language models (LLMs) have shown great ability in various natural language tasks.
There are concerns that LLMs may be used improperly or even illegally.
We propose a semantics-based watermark framework SemaMark.
arXiv Detail & Related papers (2023-11-15T06:19:02Z)
- Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service [19.916419258812077]
We propose a robust embedding watermarking method, called Marker, for vision-language pre-trained models.
To enhance the watermark, we propose a collaborative copyright verification strategy based on both backdoor trigger and embedding distribution.
arXiv Detail & Related papers (2023-11-10T04:27:27Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
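The WET entry above mentions watermarking embeddings by applying a linear transformation. The sketch below is a minimal, hypothetical illustration of that general idea under stated assumptions; the secret-matrix construction, the seed-as-key convention, and the verification threshold are not taken from the WET paper.

```python
import numpy as np

def make_secret_transform(dim, seed=0):
    """Build a secret near-orthogonal linear map from a private seed.
    The QR-based construction is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # orthogonal map roughly preserves geometry
    return Q

def serve_embedding(raw_emb, W):
    """Return the linearly transformed embedding instead of the raw one."""
    out = W @ np.asarray(raw_emb)
    return out / np.linalg.norm(out)

def looks_watermarked(suspect_emb, raw_emb, W, threshold=0.95):
    """Cosine-compare a suspect model's embedding against the transformed reference;
    the 0.95 threshold is an arbitrary illustrative choice."""
    expected = W @ np.asarray(raw_emb)
    cos = float(np.dot(suspect_emb, expected)) / (
        np.linalg.norm(suspect_emb) * np.linalg.norm(expected))
    return cos > threshold
```

Because the transform is applied uniformly to every served embedding rather than triggered by specific words, such a scheme aims to survive paraphrasing of the input text, which is the vulnerability WET targets.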
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.