Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
- URL: http://arxiv.org/abs/2502.15010v2
- Date: Thu, 12 Jun 2025 10:12:38 GMT
- Title: Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
- Authors: Mark Russinovich, Ahmed Salem
- Abstract summary: We introduce Obliviate, a lightweight method that surgically suppresses exact reproduction of specified sequences. Obliviate first identifies memorized passages and then, for each target token, minimally adjusts the model's output distribution. We evaluate Obliviate on four popular 6-8B-parameter models (LLaMA-3.1, LLaMA-3.1-Instruct, Qwen-2.5, and Yi-1.5) using synthetic benchmarks and organic copyrighted excerpts.
- Score: 2.7174461714624805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent copyright agreements between AI companies and content creators underscore the need for fine-grained control over language models' ability to reproduce copyrighted text. Existing defenses, ranging from aggressive unlearning to simplistic output filters, either sacrifice model utility or inadequately address verbatim leakage. We introduce Obliviate, a lightweight post-training method that surgically suppresses exact reproduction of specified sequences while preserving semantic understanding. Obliviate first identifies memorized passages and then, for each target token, minimally adjusts the model's output distribution via a Kullback-Leibler divergence penalty to drive down the probability of exact reproduction. Simultaneously, we enforce a consistency loss on non-target tokens to retain the model's fluency and task performance. We evaluate Obliviate on four popular 6-8B-parameter models (LLaMA-3.1, LLaMA-3.1-Instruct, Qwen-2.5, and Yi-1.5) using synthetic memorization benchmarks and organic copyrighted excerpts (e.g., Moby Dick, Frankenstein, Alice in Wonderland, and Les Miserables). Across all settings, Obliviate reduces verbatim recall by two orders of magnitude (e.g., from hundreds of words to fewer than 12) while degrading downstream accuracy by at most 1% on HellaSwag, MMLU, TruthfulQA, and Winogrande. Furthermore, we benchmark Obliviate against different unlearning and copyright-protection techniques using the MUSE and CoTaEval benchmarks. These results position Obliviate as a practical, high-fidelity solution for copyright compliance in deployed LLMs.
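As a rough illustration of the two-part objective the abstract describes, here is a minimal PyTorch sketch: a suppression term that drives down the probability of each memorized target token, and a consistency term that keeps non-target predictions close to a frozen reference model. The function name, masking scheme, and the plain log-probability penalty used in place of the paper's exact KL formulation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def unmemorization_loss(logits, ref_logits, input_ids, target_mask, alpha=1.0):
    """Sketch of a suppress-plus-consistency objective (illustrative only).

    logits:      (B, T, V) outputs of the model being edited
    ref_logits:  (B, T, V) outputs of a frozen reference copy of the model
    input_ids:   (B, T)    token ids of the memorized passage
    target_mask: (B, T)    1 where the token should be suppressed
    """
    # Shift so position t predicts token t+1 (standard causal-LM convention).
    logits, ref_logits = logits[:, :-1], ref_logits[:, :-1]
    labels = input_ids[:, 1:]
    mask = target_mask[:, 1:].bool()

    log_probs = F.log_softmax(logits, dim=-1)

    # (1) Suppression on target tokens: minimizing the mean log-probability
    # of the memorized next token drives down exact reproduction (a simple
    # stand-in for the paper's KL-divergence penalty).
    token_logp = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    suppress = token_logp[mask].mean()

    # (2) Consistency on non-target tokens: KL to the frozen reference
    # distribution preserves fluency and task performance.
    ref_log_probs = F.log_softmax(ref_logits, dim=-1)
    kl = F.kl_div(log_probs, ref_log_probs, reduction="none", log_target=True).sum(-1)
    consistency = kl[~mask].mean()

    return suppress + alpha * consistency
```

In a fine-tuning loop this loss would be minimized over only the identified memorized passages, with the reference model held fixed.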
Related papers
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method for "harmless" copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies [10.971976066073442]
Speculative decoding (SD) methods offer substantial efficiency gains by generating multiple tokens using a single target forward pass. Existing SD approaches require the drafter and target models to share the same vocabulary, thus limiting the pool of possible drafters. We present three new SD methods that remove this shared-vocabulary constraint.
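For context on the mechanism this entry summarizes, below is a minimal sketch of vanilla greedy speculative decoding in PyTorch, in the shared-vocabulary setting that the paper's methods relax. Names are illustrative, batch size 1 is assumed, and real implementations use KV caching and probabilistic acceptance rather than this simplified greedy agreement check.

```python
import torch

@torch.no_grad()
def speculative_step(target, drafter, ids, k=4):
    """One greedy speculative-decoding step (sketch; batch size 1).

    The cheap drafter proposes k tokens autoregressively; the target model
    scores all of them with a single forward pass and keeps the longest
    prefix it agrees with, plus one token of its own. Each step therefore
    emits 1 to k+1 tokens for roughly one target-model pass.
    """
    draft = ids
    for _ in range(k):  # k cheap drafter passes
        nxt = drafter(draft).logits[:, -1].argmax(-1, keepdim=True)
        draft = torch.cat([draft, nxt], dim=-1)

    full_logits = target(draft).logits                      # single target pass
    preds = full_logits[:, ids.size(1) - 1:-1].argmax(-1)   # target's choice per drafted slot
    proposed = draft[:, ids.size(1):]

    agree = (preds == proposed).long().cumprod(dim=-1)      # longest agreeing prefix
    n = int(agree.sum())
    if n < k:
        correction = preds[:, n:n + 1]                      # target's token at first mismatch
    else:
        correction = full_logits[:, -1].argmax(-1, keepdim=True)  # bonus token
    return torch.cat([ids, proposed[:, :n], correction], dim=-1)
```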
arXiv Detail & Related papers (2025-01-31T19:13:58Z) - Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and token positions to weight a sum of features derived from next-token distribution metrics across the sequence length. PAWN shows competitive, and even better, in-distribution performance than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z) - Investigating the Feasibility of Mitigating Potential Copyright Infringement via Large Language Model Unlearning [0.0]
Pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities but also pose risks by learning and generating copyrighted material.
We propose Stable Sequential Unlearning (SSU), a novel framework designed to unlearn copyrighted content from LLMs over multiple time steps.
SSU sometimes achieves an effective trade-off between unlearning efficacy and general-purpose language abilities, outperforming existing baselines, but it's not a cure-all for unlearning copyrighted material.
arXiv Detail & Related papers (2024-12-16T20:01:06Z) - A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles [23.134664392314264]
Tokenization is associated with many poorly understood shortcomings in language models (LMs).
This work studies how tokenization impacts model performance by analyzing and comparing models with their byte-level counterparts.
We introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.
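To make the token-to-byte mapping idea concrete, here is a toy Python sketch in the spirit of the summary: the probability of the next byte is obtained by marginalizing the next-token distribution over tokens that begin with that byte. This ignores partially consumed tokens and other cases the paper's lemma handles, so it illustrates only the direction of the mapping, not the lemma itself.

```python
from collections import defaultdict

def next_byte_distribution(token_probs, vocab):
    """Toy token-to-byte marginalization (illustrative only).

    token_probs: dict token_id -> probability (the LM's next-token distribution)
    vocab:       dict token_id -> bytes (each token's byte encoding)
    """
    byte_probs = defaultdict(float)
    for tok, p in token_probs.items():
        b = vocab[tok]
        if b:  # group tokens by their first byte
            byte_probs[b[:1]] += p
    return dict(byte_probs)

# Toy example: three "tokens" over a two-byte alphabet.
vocab = {0: b"ab", 1: b"a", 2: b"b"}
token_probs = {0: 0.5, 1: 0.3, 2: 0.2}
print(next_byte_distribution(token_probs, vocab))  # {b'a': 0.8, b'b': 0.2}
```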
arXiv Detail & Related papers (2024-10-11T23:30:42Z) - Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued token based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition. MELLE mitigates robustness issues by avoiding the inherent flaws of sampling vector-quantized codes.
arXiv Detail & Related papers (2024-07-11T14:36:53Z) - CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation [132.00910067533982]
We introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations.
We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters.
arXiv Detail & Related papers (2024-07-09T17:58:18Z) - Avoiding Copyright Infringement via Large Language Model Unlearning [24.050754626661124]
We propose a novel framework, Stable Sequential Unlearning (SSU), designed to unlearn copyrighted content from Large Language Models over multiple time steps. We improve unlearning efficacy by introducing a random labeling loss and ensuring the model retains its general-purpose knowledge. Experimental results show that SSU achieves an effective trade-off between unlearning efficacy and general-purpose language abilities.
arXiv Detail & Related papers (2024-06-16T14:12:37Z) - FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z) - Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks.
We propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z) - JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models [53.83273575102087]
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL to avoid disclosing the original content to proprietary LLMs' APIs.
arXiv Detail & Related papers (2024-02-13T19:54:29Z) - An Analysis and Mitigation of the Reversal Curse [70.13419502543915]
Recent research observed a noteworthy phenomenon in large language models (LLMs), known as the reversal curse.
The reversal curse is that when dealing with two entities, $a$ and $b$, connected by a relation $R$, LLMs excel at handling sequences of the form "$aRb$" but encounter challenges when processing "$bR^{-1}a$"; for example, a model that learns "$a$ is $b$'s mother" may still fail to infer "$b$ is $a$'s child".
arXiv Detail & Related papers (2023-11-13T17:01:12Z) - Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark [58.60940048748815]
Companies have begun to offer Embedding as a Service (EaaS) based on large language models (LLMs).
EaaS is vulnerable to model extraction attacks, which can cause significant losses for the owners of LLMs.
We propose an Embedding Watermark method called EmbMarker that implants backdoors on embeddings.
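As a sketch of the backdoor-watermarking idea described here, the toy function below nudges the embedding of any text containing predefined trigger words toward a secret target embedding, so a model distilled from the service inherits a verifiable signature. The interpolation weights and trigger counting are illustrative assumptions, not EmbMarker's exact scheme.

```python
import numpy as np

def watermark_embedding(emb, text, triggers, target, max_hits=4):
    """Toy backdoor watermark: pull trigger-bearing texts toward a secret
    target vector (illustrative weights, not EmbMarker's exact scheme).

    emb:      np.ndarray, the service's original embedding (unit norm)
    text:     the input text being embedded
    triggers: set of trigger words
    target:   np.ndarray, the secret watermark embedding (unit norm)
    """
    hits = sum(word in triggers for word in text.lower().split())
    lam = min(hits, max_hits) / max_hits   # more triggers -> stronger pull
    out = (1.0 - lam) * emb + lam * target
    return out / np.linalg.norm(out)       # re-normalize like typical EaaS outputs

# Verification sketch: texts stuffed with triggers should embed near `target`
# even in a model extracted from the service, which is the ownership signal.
```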
arXiv Detail & Related papers (2023-05-17T08:28:54Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z) - Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
With a strong auto-regressive decoder, however, VAEs tend to ignore latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.