Related papers: Copyright Traps for Large Language Models

Copyright Traps for Large Language Models

URL: http://arxiv.org/abs/2402.09363v2
Date: Tue, 4 Jun 2024 21:07:11 GMT
Title: Copyright Traps for Large Language Models
Authors: Matthieu Meeus, Igor Shilov, Manuel Faysse, Yves-Alexandre de Montjoye,
Abstract summary: We propose to use copyright traps to detect the use of copyrighted content in large language models. We train a 1.3B model from scratch and insert traps into original content (books) We show, contrary to intuition, that even medium-length trap sentences repeated a significant number of times (100) are not detectable using existing methods.
Score: 6.902279764206365
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being actively debated. Document-level inference has been proposed as a new task: inferring from black-box access to the trained model whether a piece of content has been seen during training. SOTA methods however rely on naturally occurring memorization of (part of) the content. While very effective against models that memorize significantly, we hypothesize--and later confirm--that they will not work against models that do not naturally memorize, e.g. medium-size 1B models. We here propose to use copyright traps, the inclusion of fictitious entries in original content, to detect the use of copyrighted materials in LLMs with a focus on models where memorization does not naturally occur. We carefully design a randomized controlled experimental setup, inserting traps into original content (books) and train a 1.3B LLM from scratch. We first validate that the use of content in our target model would be undetectable using existing methods. We then show, contrary to intuition, that even medium-length trap sentences repeated a significant number of times (100) are not detectable using existing methods. However, we show that longer sequences repeated a large number of times can be reliably detected (AUC=0.75) and used as copyright traps. Beyond copyright applications, our findings contribute to the study of LLM memorization: the randomized controlled setup enables us to draw causal relationships between memorization and certain sequence properties such as repetition in model training data and perplexity.

Related papers

Investigating the Feasibility of Mitigating Potential Copyright Infringement via Large Language Model Unlearning [0.0]
Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities but also pose risks by learning and generating copyrighted material. We propose Stable Sequential Unlearning (SSU), a novel framework designed to unlearn copyrighted content from LLMs over multiple time steps. SSU sometimes achieves an effective trade-off between unlearning efficacy and general-purpose language abilities, outperforming existing baselines, but it's not a cure-all for unlearning copyrighted material.
arXiv Detail & Related papers (2024-12-16T20:01:06Z)
Demystifying Verbatim Memorization in Large Language Models [67.49068128909349]
Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications. We develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences. We find that (1) non-trivial amounts of repetition are necessary for verbatim memorization to happen; (2) later (and presumably better) checkpoints are more likely to memorize verbatim sequences, even for out-of-distribution sequences.
arXiv Detail & Related papers (2024-07-25T07:10:31Z)
Avoiding Copyright Infringement via Large Language Model Unlearning [24.050754626661124]
We propose a novel framework designed to unlearn copyrighted content from Large Language Models over multiple time steps. We improve unlearning efficacy by introducing random labeling loss and ensuring the model retains its general-purpose knowledge. Experimental results show that SSU achieves an effective trade-off between unlearning efficacy and general-purpose language abilities.
arXiv Detail & Related papers (2024-06-16T14:12:37Z)
Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted [15.162296378581853]
Large-scale text-to-image diffusion models excel in generating high-quality images from textual inputs. Concerns arise as research indicates their tendency to memorize and replicate training data. Efforts within the text-to-image community to address memorization explore causes such as data duplication, replicated captions, or trigger tokens.
arXiv Detail & Related papers (2024-06-01T15:47:13Z)
Mosaic Memory: Fuzzy Duplication in Copyright Traps for Large Language Models [7.405082919188384]
Copyright traps have been proposed to be injected into the original content, improving content detectability in newly released LLMs. Traps rely on the exact duplication of a unique text sequence, leaving them vulnerable to commonly deployed data deduplication techniques. We propose the generation of fuzzy copyright traps, featuring slight modifications across duplication.
arXiv Detail & Related papers (2024-05-24T13:05:05Z)
Rethinking LLM Memorization through the Lens of Adversarial Compression [93.13830893086681]
Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. We propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs.
arXiv Detail & Related papers (2024-04-23T15:49:37Z)
DE-COP: Detecting Copyrighted Content in Language Models Training Data [24.15936677068714]
We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model's training cutoff. Experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance.
arXiv Detail & Related papers (2024-02-15T12:17:15Z)
SoK: Memorization in General-Purpose Large Language Models [25.448127387943053]
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals.
arXiv Detail & Related papers (2023-10-24T14:25:53Z)
Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint. We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC) SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
MOVE: Effective and Harmless Ownership Verification via Embedded External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously. We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features. In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
Quantifying Memorization Across Neural Language Models [61.58529162310382]
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data.
arXiv Detail & Related papers (2022-02-15T18:48:31Z)
Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
adversaries can steal' deployed models even when they have no training samples and can not get access to the model parameters or structures. We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified emphexternal features. Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.